Language Log

Annals of LID

July 19, 2015 @ 9:39 am· Filed by Mark Liberman under Computational linguistics, Language and politics

Nice fucking try, Twitter: pic.twitter.com/ityPbk4hLy

— Jim Henley (@UOJim) July 18, 2015

Marriage O'Quality

May 23, 2015 @ 7:49 am· Filed by Mark Liberman under Computational linguistics

Tweeted by Graeme Orr:

Marriage O'Quality. Comhghairdeas Éire! [Irish for "Congratulations, Ireland!"] #marriageeqaulity

— Graeme Orr (@Graeme_Orr) May 23, 2015

OK Google

May 23, 2015 @ 3:47 am· Filed by Mark Liberman under Computational linguistics

A couple of days ago, I gave a talk at the Centre Cournot on the topic "Why Human Language Technology (almost) works" ("Pourquoi les technologies de la langue et du discours marchent enfin (ou presque)"), and for the introduction, I tried giving Google Now a few questions and instructions on my Android phone.

In case you're not familiar with this feature, you start it up by saying "OK Google", followed by the question you want it to answer or the instruction you want it to follow.

And since the starting-point of my talk was that HLT now actually works well enough to be useful, I was glad to see that my little experiment worked pretty well.

Modeling repetitive behavior

May 15, 2015 @ 12:09 am· Filed by Mark Liberman under Biology of language, Computational linguistics

A recent conversation with Didier Demolin about animal vocalizations motivated me to return to an issue discussed in "Finch linguistics", 7/15/2011. (See also "Markov's heart of darkness", 7/18/2011, "Non-Markovian yawp", 9/18/2011, and "The long get longer", 12/4/2013.)

The point is this: In modeling the structure of simple repetitive behavior, considerations from (traditional) formal language theory can obscure rather than clarify the issues. These threats to insight include the levels of the Chomsky-Schützenberger hierarchy, the "recursion" controversy, and so on.

What follows is an attempt at a simple illustrated explanation.
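As a concrete reference point (a minimal sketch, not the model from the post), the simplest quantitative account of a repetitive sequence is a first-order Markov chain: estimate P(next unit | current unit) by counting adjacent pairs. The unit labels below are hypothetical stand-ins for, say, song elements. One testable consequence of such a model is that if a unit repeats itself with probability p, its runs have a geometric length distribution with mean 1/(1 - p), so observed run lengths give a direct check on the model.

```python
from collections import Counter, defaultdict

def transition_probs(seq):
    """First-order Markov estimate of P(next | current) from adjacent pairs."""
    counts = defaultdict(Counter)
    for cur, nxt in zip(seq, seq[1:]):
        counts[cur][nxt] += 1
    return {cur: {nxt: n / sum(c.values()) for nxt, n in c.items()}
            for cur, c in counts.items()}

def expected_run_length(p_repeat):
    """Mean run length of a unit that repeats itself with probability
    p_repeat: run lengths are geometric, with mean 1 / (1 - p_repeat)."""
    if not 0 <= p_repeat < 1:
        raise ValueError("p_repeat must be in [0, 1)")
    return 1.0 / (1.0 - p_repeat)

# Hypothetical sequence of song elements: runs of 'a' separated by 'b'.
song = list("aaabaaab")
probs = transition_probs(song)
print(probs["a"])                            # P(a->a) = 4/6, P(a->b) = 2/6
print(expected_run_length(probs["a"]["a"]))  # about 3 elements per run of 'a'
```

If real data show run lengths that are far from geometric, a first-order chain over these units is the wrong model, whatever its place in a formal-language hierarchy.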

Early Alzheimer's signs in Reagan's speech

April 12, 2015 @ 6:52 pm· Filed by Mark Liberman under Computational linguistics, Psychology of language

Lawrence Altman, "Parsing Ronald Reagan’s Words for Early Signs of Alzheimer’s", NYT 3/30/2015:

Even before Ronald Reagan became the oldest elected president, his mental state was a political issue. His adversaries often suggested his penchant for contradictory statements, forgetting names and seeming absent-mindedness could be linked to dementia.

In 1980, Mr. Reagan told me that he would resign the presidency if White House doctors found him mentally unfit. Years later, those doctors and key aides told me they had not detected any changes in his mental abilities while in office.

Now a clever new analysis has found that during his two terms in office, subtle changes in Mr. Reagan’s speaking patterns linked to the onset of dementia were apparent years before doctors diagnosed his Alzheimer’s disease in 1994.
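The excerpt does not say which speech measures the study tracked. Purely as an illustration of the kind of quantity that can be computed across a series of transcripts, here is a type-token ratio (distinct words divided by total words); the function name and the `n_tokens` truncation parameter are assumptions of this sketch. Because the ratio tends to fall as texts get longer, comparisons are only meaningful on samples of equal length, which is what `n_tokens` is for.

```python
import re

def type_token_ratio(text, n_tokens=None):
    """Distinct word types divided by word tokens, case-folded.
    If n_tokens is given, only the first n_tokens words are used, so that
    samples of different lengths can be compared on an equal footing."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if n_tokens is not None:
        tokens = tokens[:n_tokens]
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)

print(type_token_ratio("The cat saw the dog"))  # 4 types / 5 tokens
```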

LEXHUB

March 31, 2015 @ 11:26 am· Filed by Mark Liberman under Announcements, Computational linguistics

From Christie Versagli:

It's with enthusiasm that we at the World Well-Being Project (University of Pennsylvania) would like to share with you the launch of lexhub.org, a hub for data, tools, papers, and almost any resource in the growing field of language analysis for social science.

Fake account spotting on Facebook

March 1, 2015 @ 1:55 pm· Filed by Geoffrey K. Pullum under Computational linguistics, Endangered languages, Language on the internets, Names, Orthography

One language-related story in the British press over the weekend was that Gavin McGowan was threatened by Facebook with having his account shut down… because they said his name was fake.

About ten years ago Gavin learned some Scottish Gaelic and started using the Gaelic spelling of his name: Gabhan Mac A Ghobhainn. Facebook is apparently running software designed to spot bogus accounts on the basis of the letter-strings used to name them. Gabhan's name evidently failed the test.
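How Facebook's software actually works is not described. As a hypothetical sketch of how a model trained on letter strings can reject a real name, the code below learns character bigrams from a tiny made-up list of English-spelled names and reports what fraction of a candidate's bigrams it has never seen. Letter pairs such as 'bh', which are common in Gaelic spelling but absent from the training names, push the Gaelic form's score up, while the anglicized form (included in the training list) scores zero. The training list, function names, and the 0.3 threshold are all arbitrary assumptions.

```python
from collections import Counter

def char_bigrams(name):
    """Character bigrams of each word, with ^ and $ marking word edges."""
    grams = []
    for word in name.lower().split():
        padded = "^" + word + "$"
        grams.extend(padded[i:i + 2] for i in range(len(padded) - 1))
    return grams

def train(names):
    """Count every bigram seen in a list of names assumed to be genuine."""
    counts = Counter()
    for name in names:
        counts.update(char_bigrams(name))
    return counts

def unseen_fraction(name, counts):
    """Share of the name's bigrams that never occurred in training."""
    grams = char_bigrams(name)
    if not grams:
        raise ValueError("empty name")
    return sum(1 for g in grams if counts[g] == 0) / len(grams)

def looks_fake(name, counts, threshold=0.3):
    """Flag a name whose letter strings are too unfamiliar to the model."""
    return unseen_fraction(name, counts) > threshold

# Hypothetical training list of English-spelled names.
model = train(["Gavin McGowan", "Mark Liberman", "Geoffrey Pullum",
               "Kevin Knight", "David Talkin"])
print(looks_fake("Gavin McGowan", model))           # False
print(looks_fake("Gabhan Mac A Ghobhainn", model))  # True
```

A model like this knows nothing about names as such; it only encodes the orthographic habits of whatever names it was trained on.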

"They called for more structure"

February 22, 2015 @ 11:41 am· Filed by Mark Liberman under Computational linguistics

From Kevin Knight's home page:

I think our approach to syntax in machine translation is best described in D. Barthelme's short story They called for more structure….

REAPER

February 8, 2015 @ 8:47 am· Filed by Mark Liberman under Computational linguistics, Prosody

A couple of days ago, I mentioned ("Sarah Koenig", 2/5/2015) that David Talkin was releasing a new pitch tracking program called REAPER (available from GitHub). After a few minor improvements in documentation, it's ready for the general public.

The reaper program uses the EpochTracker class to simultaneously estimate the location of voiced-speech "epochs" or glottal closure instants (GCI), voicing state (voiced or unvoiced) and fundamental frequency (F0 or "pitch"). We define the local (instantaneous) F0 as the inverse of the time between successive GCI.
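The definition in the last sentence is easy to state in code. This sketch is not REAPER's implementation: it covers only the final step, taking a list of GCI times (in seconds) as given, whereas estimating those instants is the hard part a tracker like REAPER actually does. Two choices are assumptions of the sketch: each F0 value is assigned to the midpoint of its interval, and any interval longer than `max_period` (20 ms, i.e. below 50 Hz) is treated as a voicing break rather than as a very low pitch.

```python
def f0_from_gci(gci_times, max_period=0.02):
    """Local F0 in Hz as the inverse of the time between successive
    glottal closure instants. Returns (time, f0) pairs, with each F0
    value placed at the midpoint of its inter-GCI interval."""
    track = []
    for t0, t1 in zip(gci_times, gci_times[1:]):
        period = t1 - t0
        if period <= 0:
            raise ValueError("GCI times must be strictly increasing")
        if period > max_period:
            continue  # assumed voicing break: no F0 estimate here
        track.append(((t0 + t1) / 2.0, 1.0 / period))
    return track

# 10 ms periods give 100 Hz; a 5 ms period gives 200 Hz.
for t, f0 in f0_from_gci([0.0, 0.01, 0.02, 0.025]):
    print(f"{t:.4f} s  {f0:.1f} Hz")
```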

After trying it out, I can recommend it whole-heartedly — it's robust and accurate and fast. It's my new standard pitch tracker.

Decreasing definiteness

January 8, 2015 @ 6:23 am· Filed by Mark Liberman under Computational linguistics, Language and culture, Linguistic history

During the course of the 20th century, the frequency of the English definite article the decreased gradually and radically. I first noticed this effect about a year ago, in a post about the history of State of the Union addresses ("SOTU evolution", 1/26/2014), where I observed, in reference to the graph on the right, that

The average frequency of the in the most recent 10 SOTU addresses (2004-2013) was 47,458 per million words; in the first 10 addresses (1790-1799, all delivered as speeches to Congress) it was 93,201 per million words, almost double the frequency. And the decline during the 20th-century era of oral addresses seems to have been a gradual one.

I speculated that

Maybe the style of speeches has been getting gradually less formal, and therefore gradually less like written style. Or maybe even formal styles have been changing.

And I noted that a corresponding effect can be seen in two other sources, the BYU Corpus of Historical American English (COHA) and the Google Books N-Gram viewer (GNG), though it is considerably smaller in magnitude:

COHA and the Google Books data pretty much agree, which is reassuring; and they both suggest a slight decline in the frequency of the; but the change that they show is very modest compared to the change in SOTU frequencies. So I feel that the explanation for the SOTU change remains to be found.
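The arithmetic behind "almost double" in the SOTU figures quoted above is simple: rates per million words normalize counts from texts of different sizes, so they can be compared directly. A quick check of the two quoted rates (93,201 and 47,458 per million); the `per_million` helper and its sample counts are illustrative, not from the post:

```python
def per_million(count, total_words):
    """Convert a raw token count into a rate per million words."""
    return count / total_words * 1_000_000

early, recent = 93_201, 47_458  # quoted rates for 'the', per million words

print(round(early / recent, 2))     # 1.96, i.e. "almost double"
print(round(1_000_000 / early))     # 11: about one word in 11 was 'the'
print(round(1_000_000 / recent))    # 21: about one word in 21
print(per_million(4_746, 100_000))  # hypothetical: 4,746 tokens in 100k words
```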

At that point, I turned my attention to other aspects of SOTU evolution. But a student paper recently reminded me of this issue.

The global language network

December 16, 2014 @ 10:16 am· Filed by Mark Liberman under Computational linguistics

Michael Erard has a nice discussion in Science magazine of a paper recently published in PNAS: "Want to influence the world? Map reveals the best languages to speak", 12/15/2014.

The original paper is Shahar Ronen et al., "Links that speak: the global language network and its association with global fame", PNAS 2014. And there's a cute interactive visualization.

Another dumb Flesch-Kincaid exercise

October 26, 2014 @ 6:26 pm· Filed by Mark Liberman under Computational linguistics, Language and the media

E.J. Fox and Mike Spies, "Who was America's most well-spoken president?", vocativ.com 10/10/2014:

Using the Flesch-Kincaid readability test—the most well-known reading comprehension algorithm—Vocativ analyzed over 600 presidential speeches, going back to George Washington. We measured syllables along with word and sentence counts, and gave each speech a numerical grade. For instance, a grade of four means the content is accessible to a fourth-grader, while a grade of 12 corresponds to that of a high school graduate, a 15 to that of a college graduate and a 21 or higher to that of a PhD. Ultimately, we drew five conclusions, each of which was analyzed by Jeff Shesol, a historian and former speechwriter for Bill Clinton.
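For reference, the Flesch-Kincaid grade level is a fixed linear formula: 0.39 × (words per sentence) + 11.8 × (syllables per word) - 15.59. The sketch below implements it with a crude vowel-group syllable counter, which is an assumption: tools count syllables in different ways, and the article does not say how Vocativ did. Since only sentence length and word length enter the formula, punctuation alone moves the score. Joining the two sentences of "The cat sat on the mat. The dog sat on the log." with "and" raises the grade from -1.45 to 1.28.

```python
import re

def count_syllables(word):
    """Crude heuristic: count vowel groups, discounting a silent final 'e'."""
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and n > 1:
        n -= 1
    return max(n, 1)

def fk_grade(text):
    """Flesch-Kincaid grade level of a text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text needs at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)

print(round(fk_grade("The cat sat on the mat. The dog sat on the log."), 2))  # -1.45
print(round(fk_grade("The cat sat on the mat and the dog sat on the log."), 2))  # 1.28
```

For transcribed speeches, sentence boundaries are partly the transcriber's choice, so the score inherits that choice.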

"Voiceprints" again

October 14, 2014 @ 11:24 am· Filed by Mark Liberman under Computational linguistics

"Millions of voiceprints quietly being harvested as latest identification tool", The Guardian (AP), 10/13/2014:

Over the telephone, in jail and online, a new digital bounty is being harvested: the human voice.

Businesses and governments around the world increasingly are turning to voice biometrics, or voiceprints, to pay pensions, collect taxes, track criminals and replace passwords.

The article lists some successful applications:

Barclays plc recently experimented with voiceprinting as an identification for its wealthiest clients. It was so successful that Barclays is rolling it out to the rest of its 12 million retail banking customers.

“The general feeling is that voice biometrics will be the de facto standard in the next two or three years,” said Iain Hanlon, a Barclays executive.
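None of the quoted systems is described in any detail. Many speaker verification systems work roughly like this: a stored "voiceprint" (a fixed-length vector computed from enrollment recordings) is compared with a vector computed from a new recording, and the identity claim is accepted if the two are similar enough. The sketch below shows only that comparison step, using cosine similarity. The vectors are made up (real systems compute them from audio with trained models), and the 0.8 threshold is arbitrary; raising it trades more false rejections for fewer false acceptances.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0:
        raise ValueError("zero vector")
    return dot / norm

def verify(enrolled, probe, threshold=0.8):
    """Accept the claim if the probe is close enough to the enrolled
    voiceprint. The threshold value is an arbitrary assumption."""
    return cosine(enrolled, probe) >= threshold

# Made-up 3-dimensional "voiceprints", for illustration only.
enrolled = [0.9, 0.1, 0.4]
same_speaker = [0.8, 0.2, 0.5]
other_speaker = [0.1, 0.9, 0.2]
print(verify(enrolled, same_speaker))   # True
print(verify(enrolled, other_speaker))  # False
```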

Archive for Computational linguistics