Archive for Computational linguistics

Justin Bieber Brings Natural Language Processing to the Masses

Forget Watson. Forget Siri. Forget even Twitterology in the New York Times (though make sure to read Ben Zimmer's article first). You know natural language processing has really hit the big time when it's featured in a story in Entertainment Weekly.

Comments (5)

Speech-based "lie detection"? I don't think so

Mike Paluska, "Investigator: Herman Cain innocent of sexual advances", CBS Atlanta, 11/10/2011:

Private investigator TJ Ward said presidential hopeful Herman Cain was not lying at a news conference on Tuesday in Phoenix.

Cain denied making any sexual actions towards Sharon Bialek and vowed to take a polygraph test if necessary to prove his innocence.

Cain has not taken a polygraph but Ward said he does have software that does something better.

Ward said the $15,000 software can detect lies in people's voices.

This amazingly breathless and credulous report doesn't even bother to tell us what the brand name of the software is, and certainly doesn't give us anything but Mr. Ward's unsupported (and in my opinion almost certainly false) assertion about how well it works:

Ward said the technology is a scientific measure that law enforcement use as a tool to tell when someone is lying and that it has a 95 percent success rate.

Comments (19)

Real trends in word and sentence length

A couple of days ago, The Telegraph quoted an actor and a television producer emitting typically brainless "Kids Today" plaints about how modern modes of communication, especially Twitter, are degrading the English language, so that "the sentence with more than one clause is a problem for us", and "words are getting shortened". I spent a few minutes fact-checking this foolishness, or at least the word-length bit of it — but some readers may have misinterpreted my post as arguing against the view that there are any on-going changes in English prose style.
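The fact-check in question takes only a few lines of code. Here is a minimal sketch of the sort of thing involved (a hypothetical helper with illustrative input, not the actual analysis from the earlier post): mean word length in letters, and mean sentence length in words.

```python
import re

def word_and_sentence_lengths(text):
    """Return (mean word length in letters, mean sentence length in words)
    for a text sample, using rough regex-based tokenization."""
    # Split into sentences on terminal punctuation (a crude heuristic).
    sentences = [s for s in re.split(r'[.!?]+', text) if s.strip()]
    # Treat maximal runs of letters/apostrophes as words.
    words = re.findall(r"[A-Za-z']+", text)
    mean_word_len = sum(len(w) for w in words) / len(words)
    mean_sent_len = len(words) / len(sentences)
    return mean_word_len, mean_sent_len

sample = "Words are getting shortened. Or are they? Fact-checking takes minutes."
wl, sl = word_and_sentence_lengths(sample)
print(round(wl, 2), round(sl, 2))
```

Applied to a real corpus with date metadata, the same two numbers per year are all that's needed to plot a trend and test the "words are getting shortened" claim.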

Comments (72)

Sirte, Texas

According to Ben Zimmer, I'm writing from the front lines. But it's pretty quiet here, sitting at home in Texas, looking at tweets that have come out of Libya in the last couple of weeks. And somehow I don't think I'll be the first twitterologist to suffer from combat fatigue. Maybe that's because my students Joey Frazee and Chris Brown, together with our collaborator Xiong Liu, have been the ones doing computational battle in our little research team. That and the fact that nobody is firing mortars around here.

Yet quiet as it is where I'm sitting, it's a startling fact that today it's easy to hear far-off clamor, to listen to the online noise made by thousands of ordinary people. Ordinary people in war zones. What are those people thinking?

Comments (10)

On the front lines of Twitter linguistics

I have a piece in today's New York Times Sunday Review section, "Twitterology: A New Science?" In the limited space I had, I tried to give a taste of what research is currently out there using Twitter to build various types of linguistic corpora. Obviously, there's a lot more that could be said about these projects and other fascinating ones currently underway. Herewith a few notes.

Comments (14)

Where he at now?

That's the question on a t-shirt designed by John Allison, the author of the Bad Machinëry comic series:

Remember that dude? Always poppin' up in the corner? Wonder what he doin' now? Where he at now?

For those who are too young (or too old, or too fortunate in some other way) to have encountered Microsoft's Office Assistant "Clippit", nicknamed "Clippy", the Wikipedia page may be helpful.

Comments (14)

Amy was found dead in his apartment

I'm spending three days in Tampa at the kick-off meeting for DARPA's new BOLT program. Today was Language Sciences Day, and among many other events, there was a "Semantics Panel", in which half a dozen luminaries discussed ways that the analysis of meaning might play a role again in machine translation. The "again" part comes up because, as Kevin Knight observed in starting the panel off, natural language processing and artificial intelligence went through a bitter divorce 20 years ago. ("And", Gene Charniak added, "I haven't spoken to myself since.")

The various panelists had somewhat different ideas about what to do, and the question period uncovered a substantially larger range of opinions represented in the audience. But it occurred to me that there's a simple and fairly superficial kind of semantic analysis that is not used in any of the MT systems that I'm familiar with, to their considerable detriment — despite the fact that algorithms with decent performance on this task have been around for many years.
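If the post's title is any guide, the kind of superficial check at issue resembles keeping pronoun gender consistent with a known referent. As a toy illustration only (the name-to-gender table and function are made up for this sketch, not the algorithms alluded to above):

```python
# A crude consistency check: flag pronouns whose gender conflicts with
# the gender of a known referent. NAME_GENDER is a tiny stand-in for a
# real gazetteer of gendered names.
NAME_GENDER = {"Amy": "female", "John": "male"}
PRONOUN_GENDER = {"he": "male", "him": "male", "his": "male",
                  "she": "female", "her": "female", "hers": "female"}

def gender_mismatches(tokens, referent):
    """Return pronouns in `tokens` whose gender disagrees with the
    gender of `referent` (naively assumed to be the antecedent)."""
    ref_gender = NAME_GENDER.get(referent)
    return [t for t in tokens
            if PRONOUN_GENDER.get(t.lower()) not in (None, ref_gender)]

print(gender_mismatches("Amy was found dead in his apartment".split(), "Amy"))
```

A real system would need genuine coreference resolution rather than a naive antecedent assumption, but even a shallow pass of this sort would catch the error in the post's title.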

Comments (15)

Replicating the snuckward trend

In yesterday's post "Deceptively valuable", I made use of counts from the Google Books ngram dataset, as seen through Mark Davies' convenient interface. That was a case where the ngram dataset's flaws (uncertain metadata, lack of ability to look at context, etc.) are more than balanced by its virtues. In thinking about some of the other issues involved, I remembered a case that makes it possible to check the ngram dataset's answers against those given by another historical collection: the trend over the past century for Americans to replace "sneaked" with "snuck".
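The comparison across collections boils down to computing, per time slice, the share of "snuck" among all snuck/sneaked tokens. A minimal sketch, with illustrative (not real) counts standing in for the ngram data:

```python
# Hypothetical per-year counts in the style of the Google Books ngram
# dataset; the numbers below are illustrative only.
counts = {
    1900: {"sneaked": 95, "snuck": 5},
    1950: {"sneaked": 70, "snuck": 30},
    2000: {"sneaked": 45, "snuck": 55},
}

def snuck_share(year):
    """Proportion of snuck/sneaked tokens realized as 'snuck'."""
    c = counts[year]
    return c["snuck"] / (c["snuck"] + c["sneaked"])

for year in sorted(counts):
    print(year, round(snuck_share(year), 2))
```

Running the same proportion on two independent corpora (the ngram dataset and another historical collection) and comparing the resulting curves is what lets the trend be replicated.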

Comments (15)

Deceptively valuable

A couple of weeks ago, Eric Baković posted about phrases of the form deceptively <ADJECTIVE>, and gave the results of an online survey of more than 1500 LL readers ("Watching the deceptive", 10/2/2011), who were each asked to interpret one of two phrases:

                         "deceptively easy"   "deceptively hard"
The exam was easy.             56.8%                11.8%
The exam was hard.             36.0%                84.0%
The exam was neither.           7.2%                 4.2%

Eric suggested that this variability in judgments, and also the asymmetry between easy and hard, might be connected to the phenomenon of misnegation. And there were many other interesting observations and speculations in Eric's post and the 64 comments on it. But a simple tally of collocational frequency for the word deceptively suggests a couple of relevant factors that neither Eric nor any of the commenters noticed.
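A "simple tally of collocational frequency" here can be as plain as counting the word that immediately follows each occurrence of "deceptively". The sketch below is a hypothetical illustration on a made-up snippet, not the tally from the post:

```python
from collections import Counter

def right_collocates(tokens, target="deceptively"):
    """Tally the word immediately following each occurrence of `target`,
    lowercased and stripped of trailing punctuation."""
    return Counter(nxt.lower().strip(".,;!?")
                   for w, nxt in zip(tokens, tokens[1:])
                   if w.lower() == target)

text = ("The path looked deceptively simple. The exam was deceptively easy, "
        "and the climb deceptively simple as well.")
tally = right_collocates(text.split())
print(tally)
```

On a real corpus, the shape of this distribution (which adjectives actually follow "deceptively", and how often) is the kind of evidence that can bear on the easy/hard asymmetry.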

Comments (28)

Sirimania

Yesterday's Doonesbury joins the parade of praise for Siri:

Comments (10)

Political voices

Like other regular readers of Andrew Sullivan's web log, I was not surprised that he was happy about Sarah Palin's decision not to run for U.S. president in 2012. However, one aspect of his commentary ("Rejoice!", 10/5/2011) did surprise me. The puzzle is in the second sentence:

Our Three Year National Nightmare Is Over!

Palin talks to Mark Levin here (her voice is the deeper one).

Mark Levin is a radio talk show host, and Sullivan's link goes to a page on Levin's web site that includes not only the text of Palin's statement but also an mp3 file of a 15-minute segment of his show. My interest here, of course, is not in the politics but in the phonetics. Is it really true that Sarah Palin's voice is deeper (i.e. lower in pitch) than Mark Levin's?
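The question is empirically checkable with a few lines of signal processing. As a rough sketch (a crude autocorrelation pitch estimator run on a synthetic tone, not the method used for the actual recordings), one can estimate fundamental frequency by finding the lag at which a speech frame best resembles a shifted copy of itself:

```python
import numpy as np

def estimate_f0(signal, sr, fmin=50.0, fmax=400.0):
    """Estimate fundamental frequency (Hz) by autocorrelation:
    pick the lag with maximal self-similarity inside the
    plausible pitch range [fmin, fmax]."""
    signal = signal - signal.mean()
    # Keep only non-negative lags of the full autocorrelation.
    ac = np.correlate(signal, signal, mode="full")[len(signal) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)
    lag = lo + int(np.argmax(ac[lo:hi]))
    return sr / lag

sr = 16000
t = np.arange(int(0.5 * sr)) / sr
tone = np.sin(2 * np.pi * 120.0 * t)  # a 120 Hz tone, roughly a low male pitch
print(round(estimate_f0(tone, sr)))
```

Running an estimator like this over voiced frames of the two speakers' audio and comparing the median f0 values is all that's needed to answer the "whose voice is deeper" question.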

Comments (18)

Raising his voice

FDR had his weekly "Fireside chats", and in 1982 Ronald Reagan began the modern tradition of weekly presidential addresses, which U.S. presidents since then have maintained. I don't think that very many people actually listen to these things — no one that I've asked has ever admitted to regular consumption. But I've been collecting them since 2004, and listening to most of them, and a few days ago I noticed something.

Comments (12)

MAGE pHTS

Comments (2)