Archive for Language and technology

Google Demotes Literary Stars

My post about Google's metadata problems, along with a similar piece in the Chronicle of Higher Education, got a lot of people talking about the problem in the press and the blogs. (I even ran into an allusion to it in a La Repubblica piece on the Google Book Settlement when I arrived in Rome yesterday morning.) A number of people passed along their own experiences with flaky metadata. Others criticized me on grounds that could be broadly summed up as "Don't look a gift horse in the server," "It's better than nothing," "Who needs metadata anyway?," "Just give them time," and "Why concentrate on trivialities like metadata while ignoring the real perils of corporate monopoly?" (as in "serving as a consultant for monitoring the proper temperatures of the pitchforks in hell").

This is all to the good, if it helps move up the metadata issues in Google's queue. I do think this will get a lot better as Google puts its considerable mind to it. But there was one other aspect of the metadata problem which I hadn't noticed or even thought about, but which in its own small way was the unkindest cut of all. It was noticed by the children's book author Ace Bauer, who was prompted by my account of the metadata problems to check his Google Books listing:

Turns out my review rating ranked only one star out of 5. That's dim. But see, the review upon which they based this ranking was Kirkus's. Kirkus loved the book. They gave it a star. One star. That's all they give folks. It's considered a major honor.

Indeed it is, and the falling-star glitch actually affects a number of writers, among them Roy Blount, Jr., the president of the Authors Guild, who has been an enthusiastic backer of the settlement. Google Books assigns a one-out-of-five star rating to at least two of Blount's books, Crackers and First Hubby, on the basis of their starred Kirkus reviews, and visits similar rating downgrades on books by Guild vice-president Judy Blume and Guild board members Nick Lemann, James Gleick, and Oscar Hijuelos, among others.

I don't know exactly what the Google people will say when they cotton to this one, but it's a good guess the first sentence will begin with "oy."
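For the curious, here is a minimal sketch of how a glitch like this can arise, and one way it might be avoided. Everything in it is an assumption made for illustration: the `Review` record, its `scale` field, and both functions are hypothetical and say nothing about how Google Books actually stores or displays ratings.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Review:
    """Hypothetical review record, invented for this illustration."""
    source: str
    stars: int
    scale: Optional[int]  # None means the star is an honor, not a score


def naive_rating(review: Review, out_of: int = 5) -> str:
    # Treats every star count as a score on the display scale.
    return f"{review.stars} out of {out_of}"


def scale_aware_rating(review: Review, out_of: int = 5) -> str:
    # A star with no scale is reported as a distinction, not a number.
    if review.scale is None:
        return "starred review" if review.stars else "unstarred review"
    # Otherwise rescale from the source's own scale to the display scale.
    scaled = round(review.stars * out_of / review.scale)
    return f"{scaled} out of {out_of}"


kirkus = Review(source="Kirkus", stars=1, scale=None)
print(naive_rating(kirkus))
print(scale_aware_rating(kirkus))
```

The point of the sketch is only that a star count means nothing until you know what it is counted out of.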

Read the rest of this entry »

NLTK Book on Sale Now

The NLTK book, Natural Language Processing with Python, went on sale yesterday:

Cover of Natural Language Processing with Python

"This book is here to help you get your job done." I love that line (from the preface). It captures the spirit of the book. Right from the start, readers/users get to do advanced things with large corpora, including information-rich visualizations and sophisticated theory implementation. If you've started to see that your research would benefit from some computational power, but you have limited (or no) programming experience, don't despair — install NLTK and its data sets (it's a snap), then work through this book.

Read the rest of this entry »

Chinese Typewriter

This (the machine invented by the famous Chinese author, Lin Yutang, and described on the first page [first four paragraphs] of the Wikipedia article here) is probably the closest the Chinese ever got to decomposing their script into an "alphabet" consisting of "letters" (recurrent graphemic elements that can be combined in a principled way to form all of the characters / morphemes in their writing system).  You'll note that it didn't really work during their presentation to the Remington Typewriter Company executives.  The press conference demonstration they had the next day was probably of the carefully rehearsed, staged, orchestrated sort designers of Chinese information processing / technology software and hardware often present (the kind documented by Li-ching Chang in her film made at a vocational high school in Beijing), not one prepared to respond spontaneously to tasks posed by the audience.  Judging from my own experience with Chinese software and information processing / technology developers over more than a quarter of a century, this may have been what went wrong when Lin presented his typewriter to the Remington executives:  they asked him (or his operator) to type something impromptu.  Incidentally, the development of this fatally flawed typing machine left Lin — whose books were bestsellers in America — bankrupt.

Read the rest of this entry »

Experiencing language death

Usarufa is a language of Papua New Guinea with just 1200 speakers (ISO-639 code "usa").  There are no fluent speakers under the age of 25, so the language must be considered moribund.  Before posting recordings of this language online, I needed to get informed consent, so I introduced some speakers to the World Wide Web.  We poked around for a while, finding useful sites about insecticides for dealing with the taro beetle.  Then we turned our attention to audio.

I played them a recording of the "last words" of the Jiwarli language of Western Australia.  After some questioning looks I explained that this language is now dead, and we were listening to its last speaker before he died.  As one they all looked down, shaking their heads in disbelief and saying sorry, sorry, sorry….  It was as if I had told them a mutual friend had died.  They urged me to put that recording on a cassette tape so they could take it back to their village.  That way, everyone would surely understand what will happen to the Usarufa language unless there are serious attempts to revitalize it.

I wasn't prepared for the intensity of their response.  Now I'm wondering if a collection of such recordings might be a useful tool in promoting language revitalization, and also in explaining the concept of language archiving.  (Thanks to Ima'o Ta'asata, James Warebu, Sivini Ikilele, and Waks Mark for their dedication to the preservation of Usarufa oral culture, and to Aaron Willems and SIL-PNG for facilitating this work.)

Rhymes with "black" and sounds like "Alabama"

You'd think it was the end of the world. Apparently, the Nuance Communications-powered text-to-speech system on the new Amazon Kindle mispronounces Barack Obama's name, saying something like "buh-RACK oh-BAM-uh" instead of "buh-ROCK oh-BAH-muh". Why is this little tidbit worth a piece in the business/media section of The New York Times? The answer is, it's not. It could have been an OK lead-in to a technology piece about how text-to-speech systems work, and how they can fail — often spectacularly — on unknown words, especially names. Granted, adding the (pronunciation of the) name of a political figure such as Barack Obama to the system's dictionary is a simple enough thing to do (which is how Nuance will in fact fix the problem, if it hasn't already), and it was clearly an oversight worth pointing out to the company. But then again, the version of Firefox I'm using right now (3.0.4 for the Mac) has been underlining both of the President's names in what I have been typing thus far, incorrectly guessing that I'm misspelling something, and I'll bet you won't see some NYT reporter wasting their time on such a triviality.
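To make the dictionary point concrete, here is a toy sketch of the lookup-then-guess pattern. It is not how Nuance's system works: the letter-to-sound table is a deliberately crude stand-in for a real grapheme-to-phoneme model, the function names are made up, and the lexicon entries simply reuse the respellings given above.

```python
# Toy letter-to-sound table: a crude stand-in for a real
# grapheme-to-phoneme model (an assumption for illustration only).
TOY_LETTER_SOUNDS = {"a": "æ", "c": "k"}


def guess_pronunciation(word: str) -> str:
    """Fallback for unknown words: map letters to sounds one at a time."""
    letters = word.lower().replace("ck", "k")
    return "".join(TOY_LETTER_SOUNDS.get(ch, ch) for ch in letters)


def pronounce(word: str, lexicon: dict) -> tuple:
    """Use the exception lexicon if the word is listed, else guess."""
    key = word.lower()
    if key in lexicon:
        return lexicon[key], "lexicon"
    return guess_pronunciation(key), "guessed"


lexicon = {}
print(pronounce("Barack", lexicon))  # unknown name, so the rules guess

# The fix: add the names to the exception lexicon.
lexicon["barack"] = "buh-ROCK"
lexicon["obama"] = "oh-BAH-muh"
print(pronounce("Barack", lexicon))
print(pronounce("Obama", lexicon))
```

With an empty lexicon the toy rules give the "a" in Barack the vowel of "rack", which is the same kind of failure in miniature; one dictionary entry makes it go away.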

Read the rest of this entry »

A Limitation on Names in the PRC

Anyone who looked at the front page of the New York Times today probably noticed the article by Sharon LaFraniere entitled "Your Name's Not on Our List?  Change It, Beijing Officials Say." Featured in the article is a young woman named Ma Cheng, whose surname Ma is written with the character for "horse" and whose given name Cheng is written with a very rare character composed of three horses lined up closely in a row:  馬馬馬 (the latter character is exceedingly difficult to write in a small square exactly the same size as the space allotted to one horse [and to all other characters, even if they have as many as 64 strokes]!).  The article states that this character pronounced Cheng is not to be found among the 32,252 characters in the Chinese government's computer systems, so Ms. Ma has been told peremptorily that she must change her name.
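The underlying check is easy to sketch: a name is accepted only if every one of its characters belongs to the system's repertoire. In the snippet below the repertoire is a made-up handful of characters, not the government's actual list, and the function name is hypothetical. The given name is written with 驫, the single-character form of the three-horse graph.

```python
def unsupported_characters(name: str, repertoire: set) -> list:
    """Return the characters of `name` that are missing from `repertoire`."""
    return [ch for ch in name if ch not in repertoire]


# Hypothetical repertoire for illustration only; a real system's list
# would hold tens of thousands of characters.
toy_repertoire = set("馬李王張陳")

name = "馬驫"
missing = unsupported_characters(name, toy_repertoire)
for ch in missing:
    print(f"{ch} (U+{ord(ch):04X}) is not in the repertoire")
```

Any character outside the list, however legitimate, fails the same test, which is why a fixed repertoire and an open-ended naming tradition are bound to collide.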

Read the rest of this entry »

Why you shouldn't use spell checkers

An incident yesterday at Brigham Young University, the leading academic outpost of the Church of Jesus Christ of Latter-day Saints, provides yet another example of the pitfalls of using spelling correctors. In yesterday's Daily Universe, the student newspaper, a photograph of the Quorum of the Twelve Apostles, the second-highest body in the Mormon church, was mistakenly captioned "Quorum of the Twelve Apostates". The error is attributed to a spell checker that did not recognize the word "apostle" and suggested "apostate" as a substitute, a suggestion mistakenly accepted by the editor.

Of course, if English had a decent writing system there would be no use for such software and one less source for errors.

Oh no, it's ngmoco:)

Apple previewed iPhone OS 3.0 earlier this week, and they conveniently posted a video of the event on their website. I was grateful to be able to watch the video, mostly because I wanted to hear how the folks at Apple pronounce the name of the iPhone-centric game designing firm ngmoco:).

Read the rest of this entry »

Cupertino Creep hits DC GOP

When I was interviewed for Spiegel Online earlier this week about the dastardly Cupertino effect, I was asked if I thought spellchecker-enabled miscorrections would eventually vanish as spellchecking technology becomes more accurate in predicting potential errors. I said I thought Cupertinos would continue to be with us in one form or another, in large part because of the proper name problem: a reasonably restrictive spellchecker dictionary can never encompass all the proper names that might appear in a given text, particularly unusual foreign names. Consider the old Obama/Osama tangle: after 9/11, Osama was added to Microsoft's spellchecker dictionary, but at the time no one could have predicted that Obama would also be an important name to include. Thus they had to scramble to add Obama when he rose to prominence and spellcheckers were giving Osama as the first suggestion.

Now, as if on cue, the District of Columbia Republican Committee kindly illustrates my point in a new press release.

Read the rest of this entry »

Der Cupertino-Effekt

Spiegel Online, Germany's biggest news website and a sister publication of the weekly Der Spiegel, has just run an article on one of our favorite topics: the Cupertino effect, the phenomenon whereby automated spellcheckers miscorrect words and inattentive users accept those miscorrections. (See my primer on OUPblog as well as our ongoing coverage on both the old and new Language Log.) I was interviewed for the piece, which was written by Konrad Lischka for his column on everyday things that do not work (Fehlfunktion, or 'malfunction'). Though I don't read German, the article looks pretty solid. I especially like the German Cupertinos that are provided, based on spellchecker suggestions in German Mac Word 2008. For instance, Barack Obama prompts the suggestion Barock Obama (barock means 'baroque'), while Stinger-Rakete ('Stinger missile') prompts Stinker-Rakete ('stinker missile'). Looks like a job for the intrepid Microsoft Office Natural Language Team, Teutonic division.
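For readers who want to see the mechanism in miniature, here is a rough sketch of how a Cupertino comes about. It assumes a spellchecker that flags any word missing from its dictionary and ranks suggestions by edit distance; real spellcheckers are more elaborate, and the tiny lexicon and function names here are invented for the example.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest single-letter insertions, deletions,
    or substitutions needed to turn `a` into `b`."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def suggest(word: str, lexicon: set, max_distance: int = 2) -> list:
    """Return no suggestions for known words; otherwise the nearest
    dictionary words, closest first."""
    key = word.lower()
    if key in lexicon:
        return []
    scored = sorted((edit_distance(key, w), w) for w in lexicon)
    return [w for d, w in scored if d <= max_distance]


# Invented mini-dictionary that knows one name but not the other.
lexicon = {"osama", "alabama", "drama", "baroque"}
print(suggest("Obama", lexicon))

# Adding the missing name to the dictionary removes the miscorrection.
lexicon.add("obama")
print(suggest("Obama", lexicon))
```

The sketch also shows why the problem never quite goes away: the checker can only be as good as its word list, and no word list anticipates every name.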

English and Science in China and Japan

Yesterday I had the opportunity for an eye-opening talk with a man who for 20 years has been the director of a world-renowned biochemistry and physiology research institute.  His job frequently takes him to key labs in China and Japan, and he always has scores of Chinese, Japanese, and Korean staff scientists and postdocs working in his own labs.  Here are some of the mind-boggling things the director told me:

Read the rest of this entry »

Pause, on, off, whatever: human interface design

In the lecture room where I will be giving a talk later today at the Max Planck Institute for Psycholinguistics, the audiovisual equipment is controlled by a small touch-screen unit. Right now, the part of the display that controls the ceiling-mounted projector looks like this:

ON OFF
PAUSE

That is almost exactly what it looks like. Now, you tell me: would that mean that the projector is on, or that it is off? Is the blue button the operative one, showing the name of the current state? Or is it the white button beside it that we should pay attention to? (I should make it clear that the PAUSE below them is not a button: only the ON and the OFF buttons change color when touched.) And then once we have decided whether we should see this as saying "ON" or as saying "OFF", do you think it means that the pausing function is on, which would mean that the projector is off? Or that the pausing function is off, which would mean that the projector is on?

Read the rest of this entry »

Global Voice Translator

What? You haven't heard of the Pomegranate phone? It's "[t]he ultimate all-in-one device", going "where no phone has gone before". It's amazing. I want one, even more than I want an iPhone (and I want one of those pretty bad, so you can just imagine).

The Pomegranate's niftiest feature is probably the Global Voice Translator, illustrated here:

(I say "probably" because the niftiest feature is really the coffee brewer, but this is Language Log, so I had to go with the GVT.)

[ Hat-tip: Andy Kehler. ]
