Archive for October, 2011

Real trends in word and sentence length

A couple of days ago, The Telegraph quoted an actor and a television producer emitting typically brainless "Kids Today" plaints about how modern modes of communication, especially Twitter, are degrading the English language, so that "the sentence with more than one clause is a problem for us", and "words are getting shortened". I spent a few minutes fact-checking this foolishness, or at least the word-length bit of it — but some readers may have misinterpreted my post as arguing against the view that there are any on-going changes in English prose style.

Read the rest of this entry »

Comments (72)

Sirte, Texas

According to Ben Zimmer, I'm writing from the front lines. But it's pretty quiet here, sitting at home in Texas, looking at tweets that have come out of Libya in the last couple of weeks. And somehow I don't think I'll be the first twitterologist to suffer from combat fatigue. Maybe that's because my students Joey Frazee and Chris Brown, together with our collaborator Xiong Liu, have been the ones doing computational battle in our little research team. That and the fact that nobody is firing mortars around here.

Yet quiet as it is where I'm sitting, it's a startling fact that today it's easy to hear far-off clamor, to listen to the online noise made by thousands of ordinary people. Ordinary people in war zones. What are those people thinking?

Read the rest of this entry »

Comments (10)

On the front lines of Twitter linguistics

I have a piece in today's New York Times Sunday Review section, "Twitterology: A New Science?" In the limited space I had, I tried to give a taste of what research is currently out there using Twitter to build various types of linguistic corpora. Obviously, there's a lot more that could be said about these projects and other fascinating ones currently underway. Herewith a few notes.

Read the rest of this entry »

Comments (14)

Stroke order inputting

Michael Carr writes, "While examining an iPhone dictionary app (KanjiDicPro), I got a laugh from the attached 'bǐshùn biānhào' 笔顺编号." [VHM: bǐshùn biānhào 笔顺编号 means "stroke order serial/code number"]

Read the rest of this entry »

Comments (11)

Where he at now?

That's the question on a t-shirt designed by John Allison, the author of the Bad Machinëry comic series:

Remember that dude? Always poppin' up in the corner? Wonder what he doin' now? Where he at now?

For those who are too young (or too old, or too fortunate in some other way) to have encountered Microsoft's Office Assistant "Clippit", nicknamed "Clippy", the Wikipedia page may be helpful.

Read the rest of this entry »

Comments (14)

Up in ur internets, shortening all the words

Lucy Jones, "Ralph Fiennes blames Twitter for 'eroding' language", The Telegraph 10/28/2011:

Speaking at the BFI London Film Festival awards in Old Street, London, the actor said that modern language "is being eroded" and blamed "a world of truncated sentences, soundbites and Twitter."

"Our expressiveness and our ease with some words is being diluted so that the sentence with more than one clause is a problem for us, and the word of more than two syllables is a problem for us," he said.

This sort of thing always makes me suspect that it's really our veracity and our ease with facts that is being diluted.

Read the rest of this entry »

Comments (84)

"Chinglish" hits Broadway

Tonight is the opening night for a new Broadway play called "Chinglish." I first heard "Chinglish" was coming to Broadway from, appropriately enough, Victor Mair, Language Log's resident expert on the tricky Mandarin-English translational divide. At first all I knew about it was the stylized logo for the show, with the title as Ch'ing·lish. (I thought the diacritic in the first syllable might be some sort of homage to the old Wade-Giles romanization of the aspirated voiceless alveopalatal affricate /t͡ɕʰ/ as ch', as in the Ch'ing Dynasty, now pinyinized as q. But I think it's also supposed to evoke the syllabic stress mark used for headwords in English dictionaries, since the syllable break has the conventional dictionary-style centered dot.) When I saw that the play was written by David Henry Hwang, who won a Tony Award for "M. Butterfly," I was hopeful. And now that I've seen the play and had a chance to interview Hwang about it, I can report that there is much about this funny, poignant play for Language Log fans to love.

Read the rest of this entry »

Comments (7)

Lightning strike crash blossom

Josh Fruhlinger sends along a sublime crash blossom from BBC News: "Dog helps lightning strike Redruth mayor." Requisite screenshot in case it changes:

Read the rest of this entry »

Comments (26)


A couple of days ago, Victor Steinbok sent to ADS-L some examples like this one, which he heard on a Canadian TV show:

I had to drive him on account of he lost his license.

Read the rest of this entry »

Comments (23)

Google Reader Salvage Ethnography

From Laine Gates and Dolly Hayde:

One sentence from your recent post on Wernicke's aphasia (" . . . we here at Language Log are committed to taxonomies of nonsense that are as elaborate as possible") made us hopeful that you might be interested in the "salvage ethnography" project we've begun with the Google Reader Lexicon at

See also "Please don't kill our last enlightenment tool", Dust and Trash 2/22/2011; Sarah Perez, "Iranians Upset Over Google Reader Changes", TechCrunch 2/24/2011:

Comments off

Referent-finding llama

Combining two things from recent postings (linguist llama and referent finding):

(via Ellen Seebacher on Google+).

Comments off

Amy was found dead in his apartment

I'm spending three days in Tampa at the kick-off meeting for DARPA's new BOLT program. Today was Language Sciences Day, and among many other events, there was a "Semantics Panel", in which half a dozen luminaries discussed ways that the analysis of meaning might play a role again in machine translation. The "again" part comes up because, as Kevin Knight observed in starting the panel off, natural language processing and artificial intelligence went through a bitter divorce 20 years ago. ("And", Gene Charniak added, "I haven't spoken to myself since.")

The various panelists had somewhat different ideas about what to do, and the question period uncovered a substantially larger range of opinions represented in the audience. But it occurred to me that there's a simple and fairly superficial kind of semantic analysis that is not used in any of the MT systems that I'm familiar with, to their considerable detriment — despite the fact that algorithms with decent performance on this task have been around for many years.

Read the rest of this entry »

Comments (15)

Linguist Llama

Comments (59)