Language Log

Archive for Computational linguistics

Random letter-partition advantages in baby names

May 17, 2014 @ 10:34 am · Filed by Mark Liberman under Computational linguistics, Psychology of language

Commenting on "QWERTY again", 5/14/2014, Rubrick suggested that

It seems like an extremely simple way to check the validity of this theory would be to repeat the analysis, but with the letters grouped into two random subsets, rather than right-left subsets. In fact, I'd think the original authors should have done this as a control. If this new grouping yields a graph with any meaningful-looking trends whatsoever (or if multiple repeats of the analysis with different random subsets yield such trends a significant percentage of the time), it would pretty soundly deflate the idea that the original trends are the result of "right-hand favoritism".

Steve Kass followed up on this suggestion, providing five examples, and commenting that

The graphs don't all look the same, but they all look interesting, and several of them practically beckon the storyteller. There's something interesting about this general kind of data and "advantage function" analysis worth discovering, I think.
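Rubrick's control is easy to sketch. The Python below is a minimal illustration, not the analysis Steve Kass or the paper's authors actually ran. It assumes that a name's score is simply the number of its letters in the "right" set minus the number in the "left" set, in the spirit of the QWERTY papers' right-side advantage. It draws random partitions with the same 15/11 sizes as the QWERTY split (QWERTASDFGZXCVB typed with the left hand, YUIOPHJKLNM with the right). The helper names are hypothetical, and the names in the demo are placeholders, not real baby-name data.

```python
import random
import string
from statistics import mean

# Reference split: 15 letters typed with the left hand, 11 with the right.
QWERTY_LEFT = set("qwertasdfgzxcvb")
QWERTY_RIGHT = set("yuiophjklnm")

def random_partition(n_left=15, seed=None):
    """Randomly split a-z into a 'left' set of n_left letters and a 'right' set of the rest.

    Matching the QWERTY 15/11 sizes keeps the random control comparable to the real split.
    """
    rng = random.Random(seed)
    letters = list(string.ascii_lowercase)
    rng.shuffle(letters)
    return set(letters[:n_left]), set(letters[n_left:])

def advantage(word, left, right):
    """Assumed score: letters in `right` minus letters in `left` (non-letters are ignored)."""
    w = word.lower()
    return sum(c in right for c in w) - sum(c in left for c in w)

def mean_advantage(names, left, right):
    """Average score over a list of names, e.g. the names given in one birth year."""
    return mean(advantage(n, left, right) for n in names)

if __name__ == "__main__":
    names = ["Emma", "Liam", "Olivia", "Noah"]  # placeholder sample, not real data
    print("QWERTY split:", mean_advantage(names, QWERTY_LEFT, QWERTY_RIGHT))
    for seed in range(5):
        left, right = random_partition(seed=seed)
        print(f"random split {seed}:", mean_advantage(names, left, right))
```

Computing these averages year by year for many random seeds, then counting how often a random split yields a trend as striking as the QWERTY one, is the repeated-analysis check that Rubrick describes.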

QWERTY again

May 14, 2014 @ 7:22 am · Filed by Mark Liberman under Computational linguistics, Psychology of language

Various readers have pointed out to me that the "QWERTY Effect" is back. (For coverage of the first QWERTY-Effect paper, see "The QWERTY Effect", 3/8/2012; "QWERTY: Failure to Replicate", 3/13/2012; "Casasanto and Jasmin on the QWERTY effect", 3/17/2012; and "Response to Jasmin and Casasanto's response to me", 3/17/2012.)

The new paper is Casasanto, D., Jasmin, K., Brookshire, G. & Gijssels, T. "The QWERTY Effect: How typing shapes word meanings and baby names". In P. Bello, M. Guarini, M. McShane, & B. Scassellati (Eds.), Proceedings of the 36th Annual Conference of the Cognitive Science Society. Austin, TX: Cognitive Science Society, 2014.

As before, the idea is that typing letters with the right hand makes us like them more; or in the words of their abstract,

Filtering words through our fingers as we type appears to be changing their meanings. On average, words typed with more letters from the right side of the QWERTY keyboard are more positive in meaning than words typed with more letters from the left: This is the QWERTY effect (Jasmin & Casasanto, 2012), which was shown previously across three languages. In five experiments, here we replicate the QWERTY effect in a large corpus of English words, extend it to two new languages (Portuguese and German), and show that the effect is mediated by space-valence associations encoded at the level of individual letters. Finally, we show that QWERTY appears to be influencing the names American parents give their children. Together, these experiments demonstrate the generality of the QWERTY effect, and inform our theories of how people’s bodily interactions with a cultural artifact can change the way they use language.

The most interesting new result is the baby-names experiment, in my opinion; and since I'm stuck in Heathrow Airport for a while, I thought I'd take a quick look at it.

Draft words

May 11, 2014 @ 12:58 am · Filed by Mark Liberman under Computational linguistics, Language and culture

Reuben Fischer-Baum, Aaron Gordon, and Billy Haisley, "Which Words Are Used To Describe White And Black NFL Prospects?", Deadspin 5/8/2014

Do NFL scouts talk about white players and black players differently? Are certain words reserved for white players? Are others used primarily to describe black players?

Let's try and find out. We've pulled the text from pre-draft scouting reports from NFL.com (written by the infamous Nolan Nawrocki), CBS, and ESPN, split them by player race, counted the number of times individual words appeared using the Voyant tool, and then calculated the rate at which each word appeared per 10,000 words. (In total we pulled 68,465 words on 99 white players—6,228 unique—and 223,868 words on 288 black players—10,580 unique). You can play with the data in the interactive below; simply plug a single word into the input field, hit search, and see how often the word appeared in black and white scouting reports.
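The normalization Deadspin describes is just a raw count scaled by corpus size, which matters here because the two corpora differ more than threefold in size. A minimal sketch follows. It assumes a simple lowercase regex tokenizer (Voyant's own tokenization may differ), and the function names are illustrative rather than anything Deadspin published.

```python
import re
from collections import Counter

WHITE_TOTAL = 68_465   # words in reports on white players (figure from the article)
BLACK_TOTAL = 223_868  # words in reports on black players (figure from the article)

def rate_per_10k(count, total_words):
    """Occurrences per 10,000 words of running text."""
    return count * 10_000 / total_words

def word_rates(text):
    """Map each word in `text` to its rate per 10,000 tokens.

    Tokenization (lowercased runs of letters and apostrophes) is an assumption,
    not Voyant's exact behavior.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    return {w: rate_per_10k(c, len(tokens)) for w, c in counts.items()}

if __name__ == "__main__":
    # Hypothetical: the same raw count of 20 occurrences in each corpus.
    print(round(rate_per_10k(20, WHITE_TOTAL), 2))
    print(round(rate_per_10k(20, BLACK_TOTAL), 2))
```

A word that appears 20 times in each set works out to about 2.92 per 10,000 words in the white-player reports but only about 0.89 in the black-player reports, so raw counts alone would be misleading.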

Accessibility and diarization

May 6, 2014 @ 2:38 pm · Filed by Mark Liberman under Computational linguistics

I spent this morning at an ICASSP-2014 session on "Speaker Diarization". As the picture indicates, the room was not exactly handicapped accessible…

Luckily this is not a problem for me, but my experience of three torn knee ligaments a few years ago sticks with me.

Anyhow, I made it up the stairway to Room Scherma, and learned some useful and interesting things about current techniques for speaker diarization, which is the problem of determining who spoke when in an arbitrary audio or video recording.
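To make "who spoke when" concrete, the output of a diarization system can be pictured as a list of time-stamped segments carrying anonymous speaker labels: the system groups stretches of speech by voice without knowing who the speakers actually are. The sketch below uses a hypothetical `Segment` type and helper names to show that representation and two basic questions it answers. It does not implement any diarization algorithm; it only models the output.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    start: float   # seconds from the beginning of the recording
    end: float     # seconds; assumed to be greater than start
    speaker: str   # anonymous cluster label such as "spk1", not a real identity

def who_spoke_at(segments, t):
    """Sorted labels of all speakers active at time t (overlapping speech gives several)."""
    return sorted({s.speaker for s in segments if s.start <= t < s.end})

def speaking_time(segments):
    """Total seconds attributed to each speaker label."""
    totals = {}
    for s in segments:
        totals[s.speaker] = totals.get(s.speaker, 0.0) + (s.end - s.start)
    return totals

if __name__ == "__main__":
    # Hypothetical output for a short two-speaker clip, with a brief overlap at 3.8-4.2 s.
    hyp = [Segment(0.0, 4.2, "spk1"), Segment(3.8, 9.0, "spk2"), Segment(9.0, 12.5, "spk1")]
    print(who_spoke_at(hyp, 4.0))   # both labels, because the segments overlap at t = 4.0
    print(speaking_time(hyp))
```

Allowing overlapping segments, rather than forcing one speaker per instant, reflects the fact that people in real recordings often talk at the same time.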

Ten years ago in LLOG

March 28, 2014 @ 7:01 am · Filed by Mark Liberman under Computational linguistics

From 3/28/2004, a post that asks a question for which I still don't have a good answer:

How many times does a word or phrase need to be repeated in order to seem characteristic of a speaker or author? I think that the answer is "not very many times, maybe only once or twice, if the use in context is salient enough".

Ruminations on related issues can be found in "Strange Bookfellows" and "Captain Crunch among the Literati". And since this question tells us as much about the reader or listener as it does about the writer or speaker, we should also consider the curious case of the president's pronouns.

A zero-tolerance approach to PP attachment

March 26, 2014 @ 12:11 pm · Filed by Mark Liberman under Computational linguistics, Psychology of language

Deborah Ball, "Pope Francis Appoints Eight to Sex-Abuse Commission", WSJ 3/22/2014:

Pope Francis on Saturday appointed a victim of sexual abuse and a senior cardinal known for his zero-tolerance approach to a new group charged with advising the Catholic Church on how to respond to the problem of sexual abuse of children.

The sequence "zero-tolerance approach to a new group" sent Tim Leonard down a syntactic garden path — he had to get past "charged with advising the Catholic Church" before he figured out that the cardinal was appointed to the new group rather than having a zero-tolerance approach to it. So Tim forwarded the example to me, and I had exactly the same experience.

Erdogan's phone conversations

February 27, 2014 @ 12:24 pm · Filed by Mark Liberman under Computational linguistics, Language and politics

Recep Tayyip Erdoğan has been the prime minister of Turkey for 11 years. On Monday, someone posted on YouTube what purports to be recordings of a series of phone conversations between Erdoğan and his son, discussing how to hide a billion dollars or so in cash: "Başçalan Erdoğan'ın Yalanlarının ve Yolsuzluklarının Kaydı"= "Recording of Erdogan's lying and corruption". Here's an acted version of an English translation, from "Full transcript of voice recording purportedly of Erdoğan and his son", Today's Zaman 2/26/2014:

Rates of exchange

February 19, 2014 @ 5:46 am · Filed by Mark Liberman under Computational linguistics

Matthew Kehrt writes:

In the continuing saga of "Austria == Ireland" (see also "Made in USA == Made in Austria|France|Italy…"), apparently rubles are Australian dollars:

Charles J. Fillmore, 1929-2014

February 14, 2014 @ 12:25 pm · Filed by Ben Zimmer under Computational linguistics, Obituaries, Semantics

Arnold Zwicky shares the sad news that the Berkeley linguist Charles J. "Chuck" Fillmore passed away yesterday. Arnold quotes Amy Dahlstrom's Facebook update:

Charles Fillmore died yesterday at age 84 after a long battle with cancer. A brilliant linguist, especially in the field of lexical semantics, who influenced so many of us Berkeley students and colleagues elsewhere. He was sweet and funny and loving, and deeply devoted to [his wife, Berkeley linguist] Lily Wong Fillmore. The loss of my Doktorvater feels like the loss of a parent.

School grammar, round two

December 30, 2013 @ 8:21 am · Filed by Mark Liberman under Computational linguistics

There were many interesting comments on my recent post "Putting grammar back in grammar schools: A modest proposal". I wasn't able to participate in the discussion, due to competition from travel, holiday activities, fall semester grading, conference deadlines, a wedding, …, so today I'll take up one or two of the points that were raised.

First, let me say that Dick Hudson has kindly agreed to write a guest post about grammar teaching in the UK, and educational linguistics in general, expanding on his comment. In what follows, I'll make a few observations of my own about the motivations for putting grammar — and linguistic analysis in general — into the school curriculum; about ways and means for moving towards this goal in the U.S.; and about what skills and concepts I had in mind.

Speech rhythm in Visible Speech

December 18, 2013 @ 9:28 am · Filed by Mark Liberman under Computational linguistics

In "Speech rhythms and brain rhythms", 12/2/2013, I showed the results of a simple experiment looking for evidence of speech rhythms in the frequency domain, which found a peak at about 2.4 Hz in the average spectrum of the waveform envelope of 6300 read sentences. I don't have anything new to say about what this means, but I wanted to note a 65-year-old example of a somewhat similar experiment.
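For readers who want to try this kind of measurement, here is a rough numpy sketch of an averaged envelope spectrum. It is not the procedure used in the earlier post. The 10 ms RMS frames, the Hann window, and the interpolation onto a shared 0.5 to 10 Hz grid (needed because sentences of different lengths produce FFT bins at different frequencies) are all assumptions chosen for illustration, and the function names are hypothetical. The demo uses synthetic noise amplitude-modulated at 2.5 Hz rather than real recordings.

```python
import numpy as np

def rms_envelope(x, fs, frame_s=0.01):
    """RMS amplitude in non-overlapping frames; returns (envelope, envelope sample rate)."""
    n = max(1, int(round(frame_s * fs)))
    n_frames = len(x) // n
    frames = np.asarray(x[: n_frames * n], dtype=float).reshape(n_frames, n)
    return np.sqrt((frames ** 2).mean(axis=1)), fs / n

def envelope_spectrum(x, fs, grid):
    """Magnitude spectrum of the envelope, interpolated onto the frequencies in `grid`."""
    env, env_fs = rms_envelope(x, fs)
    env = (env - env.mean()) * np.hanning(len(env))  # remove DC, taper the edges
    spec = np.abs(np.fft.rfft(env))
    freqs = np.fft.rfftfreq(len(env), d=1.0 / env_fs)
    return np.interp(grid, freqs, spec)

def average_envelope_spectrum(signals, fs, grid):
    """Mean envelope spectrum over signals of possibly different lengths."""
    return np.mean([envelope_spectrum(x, fs, grid) for x in signals], axis=0)

if __name__ == "__main__":
    fs = 8000
    grid = np.linspace(0.5, 10.0, 191)  # 0.05 Hz steps
    rng = np.random.default_rng(0)
    signals = []
    for dur in (3.0, 4.0, 5.0):  # "sentences" of different lengths
        t = np.arange(int(dur * fs)) / fs
        carrier = rng.standard_normal(len(t))
        signals.append((1 + 0.8 * np.sin(2 * np.pi * 2.5 * t)) * carrier)
    avg = average_envelope_spectrum(signals, fs, grid)
    print(f"peak at {grid[np.argmax(avg)]:.2f} Hz")
```

With real read speech one would expect a broad hump, like the one near 2.4 Hz in that post, rather than the sharp line this synthetic modulation produces.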

Separated by a common problem

December 12, 2013 @ 11:08 am · Filed by Mark Liberman under Computational linguistics

The first issue of a new journal has just appeared: Linguistic Evidence in Security, Law and Intelligence (LESLI), founded and edited by Dr. Carole Chaski. As a member of the editorial board, I'm pleased with the quality of the first issue, and I feel that Carole deserves a round of applause.

But there's something in the first issue that reminds me of a long-standing puzzle: why is there so little communication between two research communities who seem to be working on essentially the same problem? The trigger is Harry Hollien's policy paper, "Barriers to Progress in Speaker Identification with Comments on the Trayvon Martin Case". And the two communities — separated by a common problem — are the people who work on what Prof. Hollien calls (forensic) "speaker identification", versus the people involved with what I know as "speaker recognition research".

The long get longer

December 4, 2013 @ 8:26 am · Filed by Mark Liberman under Computational linguistics

Al Filreis's Modern and Contemporary American Poetry is one of the most successful MOOCs. In particular, participants' involvement is sustained over time to an unusual extent — here's the daily volume of forum posts and comments for the first two months of ModPo2, which is currently underway: