Archive for Language and technology
January 23, 2012 @ 3:36 am· Filed by Geoffrey K. Pullum under Language and technology, Logic, Lost in translation, Nerdview, Semantics
In the Hotel Ciutat de Tarragona, the beautiful modern hotel in Tarragona where I am currently staying, I ate breakfast in the 1st-floor restaurant (Americans: that would be the 2nd floor), and then came out to take the elevator back up to my 5th-floor room (Americans: 6 floors up). But I was baffled: there was no button to call the elevator for upward journeys. There was just a button labeled with the Down-Arrow symbol for calling the elevator to go back down to the lobby on level 0. Some sort of security, I assumed, to ensure that random restaurant patrons don't go up in the elevator to wander up and down the halls looking for unlocked doors or stealable items. But then how was I to get back up to my room? I'm ashamed to report just how long it took me to resolve the conundrum here. Perhaps you would like to solve it for yourself before you read on.
December 11, 2011 @ 12:58 pm· Filed by Geoffrey K. Pullum under Idioms, Language and technology, Language change
Throughout my whole life it has been the standard British English metaphor for Sisyphean tasks, the jobs that are endless because by the time you get to the end you need to start over: It's like painting the Forth Bridge.
It is legendary that after finishing the magnificent rail bridge over the Firth of Forth north-west of Edinburgh in 1890 they started repainting it, and a hundred years later they were still at it. Every time they painted their way to the far end, which took years, the paint had worn off where they had started, and they had to go back over there and begin again immediately.
But there was a new development this week: they finally finished the job, and stopped. Now the simile's future looks bleak.
October 30, 2011 @ 6:27 pm· Filed by Ben Zimmer under Computational linguistics, Language and technology, Language on the internets
I have a piece in today's New York Times Sunday Review section, "Twitterology: A New Science?" In the limited space I had, I tried to give a taste of what research is currently out there using Twitter to build various types of linguistic corpora. Obviously, there's a lot more that could be said about these projects and other fascinating ones currently underway. Herewith a few notes.
October 30, 2011 @ 4:29 pm· Filed by Victor Mair under Language and technology, Writing systems
Michael Carr writes, "While examining an iPhone dictionary app (KanjiDicPro), I got a laugh from the attached 'bǐshùn biānhào' 笔顺编号." [VHM: bǐshùn biānhào 笔顺编号 means "stroke order serial/code number".]
October 3, 2011 @ 12:43 pm· Filed by Geoffrey K. Pullum under Language and technology, Silliness
Language Log readers may be wondering why there has been no coverage of the achievement of Jesse Anderson, who has managed to get millions of monkeys, as computationally simulated on Amazon servers, to reproduce 99.9 percent of the works of Shakespeare (his own account is here on his blog, and various journalistic sheep have obediently reproduced his account in the newspapers). I'll tell you why.
September 21, 2011 @ 1:07 pm· Filed by Ben Zimmer under Language and technology, Linguistic history, Linguistics in the comics, Writing systems
In a great use of comic art, Roy Boney Jr. has created a graphic feature for the magazine Indian Country Today about the history of the Cherokee syllabary developed by Sequoyah in the early 19th century. Boney begins with the syllabary's inception and early use, and continues all the way through technological developments like the Selectric typewriter and Unicode standardization. Check it out here.

September 2, 2011 @ 10:44 am· Filed by Victor Mair under Information technology, Language and technology, Names, Writing systems
Under the above rubric, my friend Apollo Wu sent around a note (copied below) about the economic impact of the use of Chinese characters in the operation of his business. Since Apollo was for many years (from 1973 to 1998) a top translator in the Chinese Translation Service at United Nations headquarters in New York, he knows whereof he speaks. Among other interesting tidbits that I heard from Apollo over the decades was that, of the official languages of the United Nations (Arabic, Mandarin Chinese, English, French, Russian, and Castilian Spanish), Chinese was by far the least efficient and most expensive to process.
August 12, 2011 @ 3:08 am· Filed by Geoffrey K. Pullum under Information technology, Language and technology
We neglected to mention this while the relevant cartoon was the current one at xkcd, but a couple of days ago there was a nice analysis of why through 20 years of effort, we've successfully trained everyone to use passwords that are hard for humans to remember but easy for computers to guess. Check it out. The observation seems correct: if you try it out on one of the web interfaces that assess the strength of your password as you choose it, you'll find that a word with a few letters replaced by miscellaneous digits and so on, like Ne8r@$k@, gets high marks but grizzle snip grunt mackerel doesn't (and probably won't be accepted beyond the first 8 to 12 characters). Yet if you mutter "grizzle snip grunt mackerel" under your breath once, you'll find you remember it all day, even without using it. And length is your main security. The example the cartoon gives contrasts a 3-day brute-force cracking time (for about 28 bits of entropy) with a 550-year time (for about 44).
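For anyone who wants to check the arithmetic behind those two figures, here is a rough sketch. It assumes an attacker making 1000 guesses per second, which is the rate the cartoon itself uses but is not stated above, and it treats the cracking time as the worst case of trying every possibility. The function and constant names are made up for illustration.

```python
# Rough check of the cracking times quoted from the xkcd cartoon.
# Assumption (from the cartoon, not from the text above): the attacker
# can make 1000 guesses per second.
GUESSES_PER_SECOND = 1000

SECONDS_PER_DAY = 86_400
SECONDS_PER_YEAR = 365.25 * SECONDS_PER_DAY


def worst_case_seconds(entropy_bits, guesses_per_second=GUESSES_PER_SECOND):
    """Time to try all 2**entropy_bits possibilities at the given rate."""
    return 2 ** entropy_bits / guesses_per_second


# About 28 bits: the mangled-word style of password.
short_days = worst_case_seconds(28) / SECONDS_PER_DAY

# About 44 bits: four random common words.
long_years = worst_case_seconds(44) / SECONDS_PER_YEAR

print(f"28 bits: about {short_days:.1f} days")
print(f"44 bits: about {long_years:.0f} years")

# Each extra bit doubles the search space, so 16 extra bits
# multiply the cracking time by 2**16.
print(f"ratio: {2 ** (44 - 28)}x")
```

Under those assumptions the results land close to the cartoon's rounded 3 days and 550 years. The gap comes entirely from the 16 extra bits, which is why length counts for so much.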
[Comments are closed unless you have a password. If you have forgotten your password, click here.]
August 4, 2011 @ 9:28 pm· Filed by Ben Zimmer under Headlinese, Language and technology, Nerdview, Words words words
Fans of noun piles will enjoy the recent blog post by Mike Pope, a technical editor at Microsoft, "Fun (or not) with noun stacks." Mike shares a few of the lovely compound noun pileups he's encountered on the job:
- data bound control table row action links
- failed password security question answer attempts limit
- reduced minimum OS partition space available requirement
Mike goes on to explain why he thinks these problematic constructions continue to crop up in technical writing, driven by imperatives of terseness and concision at the expense of comprehensibility. He also gives helpful advice for untangling technical noun piles into something more user-friendly. That's all well and good, but you have to wonder just how deeply enmeshed in nerdview a writer must be to produce a whopper like "failed password security question answer attempts limit."
July 8, 2011 @ 10:34 am· Filed by Philip Resnik under Language and technology, Lost in translation
Looking at Geoff's post on machine-translated phishing scam messages, the message certainly does come across as very similar to the English output we in the biz frequently see coming out of statistical machine translation of Chinese. This includes Chinese-specific issues like recovering correct determiners from a language that does not express them overtly (I hope that the [not this] letter meets you in good spirits), as well as the ubiquitous phenomenon of sentences that are locally coherent — thanks to phrase-level translations and good statistical language-models for English — but globally nonsensical. I don't claim to know what makes a text poetic, but it seems to me that this combination of local coherence and larger-scale disconnectedness must be at least partly responsible for what Geoff describes as the "strange poetry" of machine translationese.
July 7, 2011 @ 4:10 pm· Filed by Geoffrey K. Pullum under Errors, Language and technology, Lost in translation
You know what I think is happening? This is just too insane not to be true. I believe Hong Kong script kiddies wanting to try Nigerian-style thieving of bank account details are actually using Google Translate to translate their phishing messages from Chinese into English. Below the fold I quote in full (obscuring my address with x's to outwit the spam robots) a wildly, asyntactically unintelligible phishing spam which I received today. It's unintendedly hilarious — you could try reading it aloud at parties. And it's so garbled and implausible that I can't believe even poor naive Aunt Mildred will be suckered. Interestingly, it shows clear signs of being the output of very bad corpus-based translation, unsupervised and unchecked. My suspicion of Chinese provenance was based not just on the .hk (Hong Kong) address, but also on the fact that the spammer thinks an English-speaking PhD named Dr. Roller Key would refer to himself as Dr. Roller — that is, the Chinese syntax for personal names is being assumed.
June 29, 2011 @ 11:55 pm· Filed by Geoffrey K. Pullum under Language and technology
I guess I had not really foreseen how fast the advent of ebooks would lead to a gigantic, unstoppable tsunami of what can only be described as bookspam, available for sale at Amazon.com. Have a look at this article by John Naughton, about the results of Amazon making available an easy conversion to Kindle format and easy uploading for sale.
April 17, 2011 @ 4:33 pm· Filed by Victor Mair under Language and technology
On June 30, 2009, I wrote a post entitled "Chinese Typewriter". It's time now to do an update, because on March 9, 2011, I travelled to the University of Kansas to deliver the Wallace Johnson Memorial Lecture. So what do Wallace Johnson and the University of Kansas have to do with Chinese typewriters? It's simply that Wallace Johnson is the only Westerner I know who became proficient in the use of the kind of Chinese typewriter I wrote about in my 2009 post, and he happened to teach Chinese history at the University of Kansas from 1965 to 2007. I knew Wally Johnson because of his interest in Tang period law and because he received his Ph.D. from the University of Pennsylvania under Derk Bodde, who was a good friend of mine.