Archive for Language and technology

Hyperbolic lots

For the past couple of years, Google has provided automatic captioning for all YouTube videos, using a speech-recognition system similar to the one that creates transcriptions for Google Voice messages. It's certainly a boon to the deaf and hearing-impaired. But as with Google's other ventures in natural language processing (notably Google Translate), this is imperfect technology that is gradually becoming less imperfect. In the meantime, the imperfections can be quite entertaining.

Tasty cupertinos

A correction from The New York Times on Damon Darlin's article, "Economic Theory Plots a Course for Good Food" (4/10/12 online, p. D3 in the 4/11/12 print edition):

This article has been revised to reflect the following correction:

Correction: April 10, 2012

An earlier version of this article incorrectly referred to the Ethiopian dish doro wot as door wot. Additionally, the article referred incorrectly to awaze tibs as aware ties.

As noted on the Slate Twitter feed, these goofs are almost certainly the result of overzealous autocorrect — or, as we say in these parts, they're due to the Cupertino effect. We've documented many such cupertinos over the years (old site, new site). Foreign food terms have cropped up before — way back in 2005, before we even knew the Cupertino effect had a name, I noted that menus and recipes had fallen prey to the unfortunate spellcheck miscorrection of prostitute for prosciutto. At least prosciutto is likely to be in spellcheck dictionaries these days — the same can't be said for Ethiopian doro wot or awaze tibs, no matter how delectable those dishes may be.
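For readers who want to see the mechanism in miniature, here is a toy sketch of how a cupertino can arise: a word missing from the dictionary is silently swapped for the closest entry that is present. This is not the algorithm of any real spellchecker. The five-word lexicon, the function names, and the 0.6 similarity cutoff are all assumptions chosen for illustration, and "wot" is placed in the lexicon purely so that it passes through untouched.

```python
from difflib import get_close_matches

# Hypothetical toy lexicon; a real spellcheck dictionary is far larger.
LEXICON = ["door", "wot", "aware", "ties", "prostitute"]


def autocorrect(word, lexicon=LEXICON, cutoff=0.6):
    """Return the word unchanged if it is known; otherwise the closest entry, if any."""
    if word in lexicon:
        return word
    # get_close_matches ranks candidates by difflib's similarity ratio.
    matches = get_close_matches(word, lexicon, n=1, cutoff=cutoff)
    return matches[0] if matches else word


def autocorrect_phrase(phrase):
    """Apply the toy autocorrect to each whitespace-separated word."""
    return " ".join(autocorrect(word) for word in phrase.split())


if __name__ == "__main__":
    for term in ("doro wot", "awaze tibs", "prosciutto"):
        print(term, "->", autocorrect_phrase(term))
```

With this lexicon the sketch reproduces the miscorrections discussed above. Add "prosciutto" to the lexicon and that word survives, which is the point about dictionary coverage.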

(Craig Silverman of Poynter's Regret the Error is also on the case.)

Annals of airport Chinglish, part 3

Carley De Rosa spotted this sign in the Kunming airport on her way to Laos. She was dumbfounded by the Chinglish, not least because what it called an "elevator" was actually an "escalator", so on her way back from Laos she made sure to get a photograph of the sign and send it to me for analysis:

Puzzled in Tarragona

In the Hotel Ciutat de Tarragona, the beautiful modern hotel in Tarragona where I am currently staying, I ate breakfast in the 1st-floor restaurant (Americans: that would be the 2nd floor), and then came out to take the elevator back up to my 5th-floor room (Americans: 6 floors up). But I was baffled: there was no button to call the elevator for upward journeys. There was just a button labeled with the Down-Arrow symbol for calling the elevator to go back down to the lobby on level 0. Some sort of security, I assumed, to ensure that random restaurant patrons don't go up in the elevator to wander up and down the halls looking for unlocked doors or stealable items. But then how was I to get back up to my room? I'm ashamed to report just how long it took me to resolve the conundrum here. Perhaps you would like to solve it for yourself before you read on.

Death of a simile

Throughout my whole life it has been the standard British English simile for Sisyphean tasks, the jobs that are endless because by the time you get to the end you need to start over: It's like painting the Forth Bridge.

It is legendary that after finishing the magnificent rail bridge over the Firth of Forth north-west of Edinburgh in 1890 they started repainting it, and a hundred years later they were still at it. Every time they painted their way to the far end, which took years, the paint had worn off where they had started, and they had to go back over there and begin again immediately.

But there was a new development this week: they finally finished the job, and stopped. Now the simile's future looks bleak.

On the front lines of Twitter linguistics

I have a piece in today's New York Times Sunday Review section, "Twitterology: A New Science?" In the limited space I had, I tried to give a taste of what research is currently out there using Twitter to build various types of linguistic corpora. Obviously, there's a lot more that could be said about these projects and other fascinating ones currently underway. Herewith a few notes.

Stroke order inputting

Michael Carr writes, "While examining an iPhone dictionary app (KanjiDicPro), I got a laugh from the attached 'bǐshùn biānhào' 笔顺编号." [VHM: bǐshùn biānhào 笔顺编号 means "stroke order serial/code number"]

A few million monkeys (yawn)

Language Log readers may be wondering why there has been no coverage of the achievement of Jesse Anderson, who has managed to get millions of monkeys, as computationally simulated on Amazon servers, to reproduce 99.9 percent of the works of Shakespeare (his own account is here on his blog, and various journalistic sheep have obediently reproduced his account in the newspapers). I'll tell you why.

Sequoyah's syllabary, from parchment to iPad

In a great use of comic art, Roy Boney Jr. has created a graphic feature for the magazine Indian Country Today about the history of the Cherokee syllabary developed by Sequoyah in the early 19th century. Boney begins with the syllabary's inception and early use, and continues all the way through technological developments like the Selectric typewriter and Unicode standardization. Check it out here.

The economics of Chinese character usage

Under the above rubric, my friend Apollo Wu sent around a note (copied below) about the economic impact of the use of Chinese characters in the operation of his business. Since Apollo was for many years (from 1973 to 1998) a top translator in the Chinese Translation Service at United Nations headquarters in New York, he knows whereof he speaks. Among other interesting tidbits that I heard from Apollo over the decades was that, of the official languages of the United Nations (Arabic, Mandarin Chinese, English, French, Russian, and Castilian Spanish), Chinese was by far the least efficient and most expensive to process.

Password strength

We neglected to mention this while the relevant cartoon was the current one at xkcd, but a couple of days ago there was a nice analysis of why, through 20 years of effort, we've successfully trained everyone to use passwords that are hard for humans to remember but easy for computers to guess. Check it out. The observation seems correct: if you try it out on one of the web interfaces that assess the strength of your password as you choose it, you'll find that a word with a few letters replaced by miscellaneous digits and so on, like Ne8r@$k@, gets high marks, but grizzle snip grunt mackerel doesn't (and probably won't be accepted beyond the first 8 to 12 characters). Yet if you mutter "grizzle snip grunt mackerel" under your breath once, you'll find you remember it all day, even without using it. And length is your main security. The example the cartoon gives contrasts a 3-day brute-force cracking time (for about 28 bits of entropy) with a 550-year time (for about 44).
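The arithmetic behind those two figures is easy to check. The sketch below assumes an attacker making 1000 guesses per second and a passphrase of four words drawn at random from a 2048-word list (11 bits per word). As far as I recall, both are the cartoon's own assumptions, but they are not stated in this post, so treat them, and the function names, as illustrative.

```python
import math

GUESSES_PER_SECOND = 1000        # assumed attack rate, not stated in this post
SECONDS_PER_DAY = 86_400
SECONDS_PER_YEAR = 365.25 * SECONDS_PER_DAY


def entropy_bits(choices, picks):
    """Entropy of `picks` independent, uniform selections from `choices` options."""
    return picks * math.log2(choices)


def seconds_to_exhaust(bits, rate=GUESSES_PER_SECOND):
    """Worst-case time to try every one of the 2**bits possibilities."""
    return 2 ** bits / rate


if __name__ == "__main__":
    passphrase_bits = entropy_bits(2048, 4)   # four words from an assumed 2048-word list
    print(f"four random words: {passphrase_bits:.0f} bits")
    print(f"28 bits: {seconds_to_exhaust(28) / SECONDS_PER_DAY:.1f} days")
    print(f"44 bits: {seconds_to_exhaust(passphrase_bits) / SECONDS_PER_YEAR:.0f} years")
```

Every extra bit doubles the search space, so the 16-bit gap between the two passwords multiplies the cracking time by 65,536. That is how three days turns into five and a half centuries.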

[Comments are closed unless you have a password. If you have forgotten your password, click here.]

Microsoft tech writing noun pile blog post madness!

Fans of noun piles will enjoy the recent blog post by Mike Pope, a technical editor at Microsoft, "Fun (or not) with noun stacks." Mike shares a few of the lovely compound noun pileups he's encountered on the job:

  • data bound control table row action links
  • failed password security question answer attempts limit
  • reduced minimum OS partition space available requirement

Mike goes on to explain why he thinks these problematic constructions continue to crop up in technical writing, driven by imperatives of terseness and concision at the expense of comprehensibility. He also gives helpful advice for untangling technical noun piles into something more user-friendly. That's all well and good, but you have to wonder just how deeply enmeshed in nerdview a writer must be to produce a whopper like "failed password security question answer attempts limit."

Translationese

Looking at Geoff's post on machine-translated phishing scam messages, the message certainly does come across as very similar to the English output we in the biz frequently see coming out of statistical machine translation of Chinese. This includes Chinese-specific issues like recovering correct determiners from a language that does not express them overtly (I hope that the [not this] letter meets you in good spirits), as well as the ubiquitous phenomenon of sentences that are locally coherent — thanks to phrase-level translations and good statistical language-models for English — but globally nonsensical. I don't claim to know what makes a text poetic, but it seems to me that this combination of local coherence and larger-scale disconnectedness must be at least partly responsible for what Geoff describes as the "strange poetry" of machine translationese.
