Archive for Information technology

A virus that fixes your grammar

In today's Dilbert strip, Dilbert is puzzled about why the company mission statement looks so different, and Alice diagnoses what has happened: the Elbonian virus that has been corrupting the company's computer systems has fixed all the grammar and punctuation errors the statement formerly contained.

That'll be the day. Right now, computational linguists with an unlimited budget (and unlimited help from Elbonian programmers) would be unable to develop a trustworthy program that could proactively fix grammar and punctuation errors in written English prose. We simply don't know enough. The "grammar checking" programs built into word processors like Microsoft Word are dire, even risible, catching only a limited list of shibboleths and being wrong about many of them. Flagging split infinitives, passives, and random colloquialisms as if they were all errors is not much help to you, especially when many sequences are flagged falsely. Following all of Word's suggestions for changes would create gibberish. Free-standing tools like Grammarly are similarly hopeless. They merely read and note possible "errors", leaving you to make corrections. They couldn't possibly be modified into programs that would proactively correct your prose. Take the editing error in this passage, which Rodney Huddleston recently noticed in a quality newspaper, The Australian:

There has been no glimmer of light from the Palestinian Authority since the Oslo Accords were signed, just the usual intransigence that even the wider Arab world may be tiring of. Yet the West, the EU, nor the UN, have never made the PA pay a price for its intransigence.

Read the rest of this entry »

Comments off

Is there a practical limit to how much can fit in Unicode?

A lengthy, important article by Michael Erard recently appeared in the New York Times Magazine:

"How the Appetite for Emojis Complicates the Effort to Standardize the World’s Alphabets:  Do the volunteers behind Unicode, whose mission is to bring all human languages into the digital sphere, have enough bandwidth to deal with emojis too?" (10/18/17)

The article brought back many vivid memories.  It reminded me of my old friend, Joe Becker, who was the seminal designer of the phenomenal Xerox Star's multilingual capabilities in the mid-80s and instrumental in the organization and foundation of the Unicode Consortium in the late 80s and early 90s.  Indeed, it was Becker who coined the word "Unicode" to designate the project.

Read the rest of this entry »

Comments (34)

Easy versus exact

Ever since people started inputting Chinese characters in computers, I've had an intense interest in how they do it, which systems are more efficient, and why they choose the particular ones they adopt.  For the first few decades, because all inputting systems presented significant obstacles and challenges, I remained pretty much of an onlooker because I didn't want to waste my time struggling with cumbersome methods.  It's only after I discovered how simple and fast it is to use Google Translate as my chief inputting method that I became very active in entering Chinese character texts.

Read the rest of this entry »

Comments (31)

Awesome / sugoi すごい!

Comments (7)

Information content of text in English and Chinese

Terms and concepts related to "letters" and "characters" were used at spectacularly crossed purposes in many of the comments on Victor Mair's recent post "Twitter length restrictions in English, Chinese, Japanese, and Korean". I'm not going to intervene in the tangled substance of that discussion, except to reference some long-ago LLOG posts on the relative information content of different languages/writing systems. The point of those posts was to abstract away from the varied, complex, and (here) irrelevant details of character sets, orthographic conventions, and digital encoding systems, and to look instead at the size ratios of parallel (translated) texts in compressed form. The idea is that compression schemes try precisely to get rid of those irrelevant details, leaving a better estimate of the actual information content.

My conclusions from those exercises are two:

  1. The differences among languages in information-theoretic efficiency appear to be quite small.
  2. The direction of the differences is unclear — it depends on the texts chosen, the direction of translation, and the method of compression used.

See "One world, how many bytes?", 8/5/2005; "Comparing communication efficiency across languages", 4/4/2008; "Mailbag: comparative communication efficiency", 4/5/2008; "Is English more efficient than Chinese after all?", 4/28/2008.
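For readers who want to try this kind of comparison themselves, here is a minimal sketch of the idea in Python. It is not the procedure used in the posts cited above: the function names `compressed_sizes` and `size_ratios` are hypothetical, the choice of zlib and bz2 is just an assumption about which general-purpose compressors to try, and the one-sentence sample (Article 1 of the Universal Declaration of Human Rights in English and Chinese) is far too short to say anything meaningful. A real test needs long parallel texts, since compressor overhead dominates on short inputs.

```python
import bz2
import zlib


def compressed_sizes(text: str) -> dict:
    """Return the byte length of a text, raw and under two compressors."""
    raw = text.encode("utf-8")  # the encoding is itself an assumption
    return {
        "raw_utf8": len(raw),
        "zlib": len(zlib.compress(raw, 9)),
        "bz2": len(bz2.compress(raw, 9)),
    }


def size_ratios(text_a: str, text_b: str) -> dict:
    """Size of text_a divided by size of text_b, per measurement method."""
    a = compressed_sizes(text_a)
    b = compressed_sizes(text_b)
    return {method: a[method] / b[method] for method in a}


if __name__ == "__main__":
    # Illustrative only: substitute long parallel (translated) texts here.
    english = "All human beings are born free and equal in dignity and rights."
    chinese = "人人生而自由，在尊严和权利上一律平等。"
    for method, ratio in size_ratios(english, chinese).items():
        print(f"{method}: English/Chinese size ratio = {ratio:.2f}")
```

Running this over several text pairs, in both translation directions and with more than one compressor, is one way to see for yourself how much the ratio moves around.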

 

Comments (7)

Veggies for cats and dogs

This video was passed on by Tim Leonard, who remarks, "real-time video translation at its best":

Read the rest of this entry »

Comments (8)

More Sinological suffering

[This is a guest post by Brendan O'Kane. See "Sinological suffering", 3/31/17, for background.]


I snapped this picture at the library today:

Read the rest of this entry »

Comments (28)

Siri and flatulence

An acquaintance of mine has a new iPhone, which he carries in a pocket that is (relevantly) below waist level. He has discovered something that dramatically illustrates the difference between (i) responding to speech and (ii) responding to speech as humans do, on the basis of knowing that it is speech.

Read the rest of this entry »

Comments off

The miracle of reading and writing Chinese characters

We have the testimony of a colleague whose ability to write Chinese characters has been adversely affected by her not being able to visualize them in her mind's eye.  See:

"Aphantasia — absence of the mind's eye" (3/24/17)

This prompts me to ponder:  just how do people who are literate in Chinese characters recall them?

Read the rest of this entry »

Comments (26)

Pick a word, any word

To access an article in the Financial Times yesterday I found myself confronted with a short market-research survey about laptops, tablets, and smartphones. Answer three or four layers of click-the-box questions, and I could get free access to the article I wanted to look at. A reasonable bargain: clearly some company was prepared to pay the FT for access to its online readers' opinions. And at the fourth layer down I faced a question which asked me to choose a single word that comes into my mind when I think of a certain Microsoft product.

My choice, from all the tens of thousands of words at my disposal, and the word I picked would go straight into the market research department of the one corporation, above all others, for whose products I have the greatest degree of contempt. Just choose that one evocative word and type it in, and I would be through to my article. A free choice. Which word to pick?

Read the rest of this entry »

Comments off

Kazakhstan HQ for the Buffett Foundation

I received an exciting email this afternoon from Perry Alexis, the chief accountant for the Warren Buffett Foundation. It seems I have been picked to receive a $1,500,000 donation — not a grant for research or anything, but a donation. And I notice it came from an email address in Kazakhstan.

Read the rest of this entry »

Comments off

Kindly do the needful

A phishing spam I received today from "Europe Trade" (it claims to be in Wisconsin but its address domain is in Belarus) said this:

Good Day sir/madam,

I am forwarding the attached document to you as instructed for confirmation,

Please kindly do the needful and revert

Best regards
Sarah Griffith

There were two attachments, allegedly called "BL-document.pdf" and "Invoice.pdf"; they were identical. Their icons said they were PDF files of size 21KB (everyone trusts PDF), but viewing them in Outlook caused Word Online to open them, whereupon they claimed to be password-protected PDF files of a different size, 635KB. However, the link I was supposed to click to open them actually led to a misleadingly named HTML file, which doubtless would have sucked me down to hell or sent all my savings to Belarus or whatever. I don't know what you would have done (some folks are more gullible than others), but I decided I would not kindly do the needful, or even revert. Sorry, Sarah.

Read the rest of this entry »

Comments off

Clueless Microsoft language processing

A rather poetic and imaginative abstract I received in my email this morning (it's about a talk on computational aids for composers) contains the following sentence:

We will metaphorically drop in on Wolfgang composing at home in the morning, at an orchestra rehearsal in the afternoon, and find him unwinding in the evening playing a spot of the new game Piano Hero which is (in my fictional narrative) all the rage in the Viennese coffee shops.

There's nothing wrong with the sentence. What makes me bring it to your notice is the extraordinary modification that my Microsoft mail system performed on it. I wonder if you can see the part of the message that it felt it should mess with, in a vain and unwanted effort at helping me do my job more efficiently?

Read the rest of this entry »

Comments off