Archive for December, 2010

Baby talk

Many parents are oblivious to the nuances of their children's paralinguistic vocalizations. But not Aubrey Chorde, from the most recent Something Positive, who is interpreting here for her friend and house-guest Davan MacIntyre.

Some pre-verbal sound-meaning correspondences are universal — crying and laughing, for example. Some more subtle differences, like empty-stomach crying and full-diaper crying, are (I think) interpretable to some extent across children.  But just like adults, very young children also develop idiosyncratic cries, laughs, grunts, giggles, and so on. And their lack of self-censorship makes these especially useful sources of information about their internal lives (and external but out-of-sight activities).

Comments (8)

Beginning a new feature: Fine writing from all over

It's a curious paradox of the arithmetic of modern life. New creations — from films and books to paintings and plates — flourish via the absurdly simple creative equation: A + B = C. But if a creator is himself a chimera, a sum of a few parts, the same math doesn’t compute. Take James Franco, whose multifarious career paths seem to puzzle the most supposedly wide-open minds.

David Coleman, "A Turquoise Link to Willie Nelson," New York Times 12/19/10

I'm at a loss for a snapper here.

Comments (33)

Inversion of scalar surprise

Comments (50)

Word lens

Competing with Culturomics for meme room today is Word Lens, which has a great YouTube ad:

Comments (26)

More on "culturomics"

The "culturomics" paper that Geoff Nunberg posted about is getting a lot of well-deserved kudos.  Jean Véronis writes

When I was a student at the end of the 1970's, I never dared imagine, even in my wildest dreams, that the scientific community would one day have the means of analyzing computerized corpuses of texts of several hundreds of billions of words.

I've contributed my voice to the chorus — Robert Lee Holtz in the Wall Street Journal ("New Google Database Puts Centuries of Cultural Trends in Reach of Linguists", WSJ 12/17/2010) quotes me this way:

"We can see patterns in space, time and cultural context, on a scale a million times greater than in the past," said Mark Liberman, a computational linguist at the University of Pennsylvania, who wasn't involved in the project. "Everywhere you focus these new instruments, you see interesting patterns."

And I meant every word of that. But there's a worm in the bouquet of roses.

Comments (31)

Humanities research with the Google Books corpus

In Science yesterday, there was an article called "Quantitative analysis of culture using millions of digitized books" [subscription required] by at least twelve authors (eleven individuals, plus "the Google Books team"), which reports on some exercises in quantitative research performed on what is by far the largest corpus ever assembled for humanities and social science research. Culled from the Google Books collection, it contains more than 5 million books published between 1800 and 2000 (at a rough estimate, 4 percent of all the books ever published); two-thirds of them are in English, and the rest are distributed among French, German, Spanish, Chinese, Russian, and Hebrew. (The English corpus alone contains some 360 billion words, dwarfing better-structured data collections like the corpora of historical and contemporary American English at BYU, which top out at a paltry 400 million words each.)
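The size comparisons in that paragraph are easy to check. Here is a minimal back-of-the-envelope sketch in Python that uses only the figures quoted there; the variable names are illustrative, and since "more than 5 million" is a lower bound and "4 percent" is a rough estimate, the implied total is only a rough floor. The arithmetic works out to about 125 million books ever published, and an English corpus roughly 900 times the size of either BYU corpus.

```python
# Figures as quoted in the paragraph above; all are approximate.
books_in_corpus = 5_000_000   # "more than 5 million books" (a lower bound)
share_of_all_books = 0.04     # "at a rough estimate, 4 percent"
english_words = 360e9         # English corpus: "some 360 billion words"
byu_words = 400e6             # each BYU corpus: about 400 million words

# If the corpus is ~4% of all books ever published, the total is corpus / share.
implied_total_books = books_in_corpus / share_of_all_books

# How many BYU-sized corpora would fit into the English Google Books corpus.
size_ratio = english_words / byu_words

print(f"Implied books ever published: about {implied_total_books:,.0f}")
print(f"English corpus vs. one BYU corpus: {size_ratio:,.0f}x")
```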

I have an article on the project appearing in today's Chronicle of Higher Education, which I'll link to here, and in later posts Ben or Mark will probably address some of the particular studies, like the estimates of English vocabulary size, as well as the wider implications of the enterprise. For now, some highlights:

Comments (58)

Obituary for Fred Jelinek at Computational Linguistics

Back on September 15, when I posted the news of Fred Jelinek's death, I promised to say more when I'd had a chance to think about it. Then, a few days later, Robert Dale asked me to write an obituary for Fred to be published in the Computational Linguistics journal. The December 2010 issue is now out, and Fred's obituary is here.

Following Robert's suggestion, I aimed at a broad assessment of Fred's impact on the field, since CL recently published Fred's own detailed account of his professional life ("The Dawn of Statistical ASR and MT", CL 35(4):483-494, 2009).

Comments (13)

Help! I'm trapped in a ???

For the past couple of weeks, I've been getting a bunch of curious email messages that start like these:

Thank you for contacting the comics and features department at The Washington Post.  Even though this is an automatic reply to inform you that we have received your comment, we still want you to know that we read every comment individually.

Thank you for contacting the Death Notices Advertising Department of the Washington Post and allowing us to serve you.  Your email has been received.  Listed below you will find general and required information that you may find useful.

Every day I get four or five similar acknowledgments from the comics department or the death notices department over at the Washington Post, although I've never sent any messages to either entity, or to any other WaPo address.

Comments (10)

'The' culture war

As we've discussed from time to time, some English proper names take a definite article ("the Times", "the Bronx") and others don't ("Language Log", "Brooklyn"). The public transport system in Boston is called "the T"; the public transport system in Philadelphia is called "SEPTA".

But sometimes, the same name for the same (in some sense) entity gets a definite article in one speech community, and not in another. Apparently people in the Los Angeles area generally use definite articles with freeway numbers ("the 101", "the 405"), although people elsewhere in the U.S. generally don't. (See Language Hat, "'The' + Freeway", 8/1/2010, for some discussion and scholarly references.)

Yesterday, JC Dill sent in the picture on the right, along with an interesting sociolinguistic commentary:

As you may know, there's a war of definite articles between San Francisco (SF Bay Area aka SFBA) and Los Angeles (SoCal).  In the SF Bay Area we talk about taking 101 to San Jose, in SoCal they talk about taking the 101 to Ventura.

So it was with some surprise that I saw the Bank of America (formerly Bank of Italy, a SF company) ad in a MUNI bus stop today.  Clearly this company has lost their SF roots.

Comments (130)

Comprehend this!

Perhaps the most illiterate phishing spam yet. Setting aside the incompetence of listing Velez Restrepo as the sender, jg_van88 (at a Chinese address) as the reply-to, and Mr(.) John Galvan as the alleged author, with X-Accept-Language set to Spanish, this message contains at least 20 linguistic errors in its text, roughly one for every four words.

From gvelez@une.net.co
Wed Dec 15 11:11:57 2010
Date: Wed, 15 Dec 2010 03:11:43 -0800
From: velez restrepo guillermo <gvelez@une.net.co>
Subject: Comprehend This Proposal
Bcc:
Reply-to: jg_van88@w.cn
X-Mailer: Sun Java(tm) System Messenger Express 7.3-11.01 64bit (built Sep 1 2009)
X-Accept-Language: es
Priority: normal

Good day,

I am Mr John Galvan a staff of a private offshore AIG Private bank united kingdom.

I have a great proposal that we interest and benefit you, this proposal of mine is worth of £15,500,000.00 Million Pounds.I intend to give Four thy Percent of the total funds as compensation for your assistance. I will notify you on the full transaction on receipt of your response if interested, and I shall send you the details.

Kind Regards,
Mr. John Galvan
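As a rough check on the "one for every four words" figure, here is a minimal Python sketch that counts whitespace-delimited tokens in the message body quoted above and divides by the 20 errors. The error count is taken from the post, not computed, and the helper name `words_per_error` is hypothetical. Splitting on whitespace is a crude notion of "word": it counts "Pounds.I" and "£15,500,000.00" as one token each.

```python
# Body of the phishing message as quoted above (headers omitted).
BODY = """Good day,

I am Mr John Galvan a staff of a private offshore AIG Private bank united kingdom.

I have a great proposal that we interest and benefit you, this proposal of mine is worth of £15,500,000.00 Million Pounds.I intend to give Four thy Percent of the total funds as compensation for your assistance. I will notify you on the full transaction on receipt of your response if interested, and I shall send you the details.

Kind Regards,
Mr. John Galvan"""

ERRORS = 20  # the post's lower-bound count of linguistic errors


def words_per_error(text: str, errors: int) -> float:
    """Whitespace-delimited token count divided by the number of errors."""
    return len(text.split()) / errors


n_words = len(BODY.split())
print(n_words, "tokens;", round(words_per_error(BODY, ERRORS), 2), "words per error")
```

The whole body comes to 81 tokens, or about 4 words per error. Dropping the greeting and sign-off leaves 74 tokens, about 3.7 words per error, so the estimate holds either way.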

Comments (72)

Words and things

Comments (46)

Language and Thought at the Economist

A new motion is open for debate today in the Economist's online series: "This house believes that the language we speak shapes how we think".  Lera Boroditsky is the designated defender of the motion, and I was recruited to be the designated opponent.

In this format, each side submits an opening statement, a rebuttal, and a closing statement. Readers get to comment, and also to vote on the motion. Our opening statements are now live.

Comments (70)

Disintermediating the dustbin

Saturday's Dilbert:

Digital media offer wonderful opportunities for the study of language, communication, and culture. So despite short-term problems, both internal and external, I'm optimistic.

Comments (4)