Archive for Computational linguistics

A zero-tolerance approach to PP attachment

Deborah Ball, "Pope Francis Appoints Eight to Sex-Abuse Commission", WSJ 3/22/2014:

Pope Francis on Saturday appointed a victim of sexual abuse and a senior cardinal known for his zero-tolerance approach to a new group charged with advising the Catholic Church on how to respond to the problem of sexual abuse of children.

The sequence "zero-tolerance approach to a new group" sent Tim Leonard down a syntactic garden path — he had to get past "charged with advising the Catholic Church" before he figured out that the cardinal was appointed to the new group rather than having a zero-tolerance approach to it. So Tim forwarded the example to me, and I had exactly the same experience.

Erdogan's phone conversations

Recep Tayyip Erdoğan has been the prime minister of Turkey for 11 years. On Monday, someone posted on YouTube what purports to be recordings of a series of phone conversations between Erdoğan and his son, discussing how to hide a billion dollars or so in cash: "Başçalan Erdoğan'ın Yalanlarının ve Yolsuzluklarının Kaydı" = "Recording of Erdogan's lying and corruption". Here's an acted version of an English translation, from "Full transcript of voice recording purportedly of Erdoğan and his son", Today's Zaman 2/26/2014:

Rates of exchange

Charles J. Fillmore, 1929-2014

Arnold Zwicky shares the sad news that the Berkeley linguist Charles J. "Chuck" Fillmore passed away yesterday. Arnold quotes Amy Dahlstrom's Facebook update:

Charles Fillmore died yesterday at age 84 after a long battle with cancer. A brilliant linguist, especially in the field of lexical semantics, who influenced so many of us Berkeley students and colleagues elsewhere. He was sweet and funny and loving, and deeply devoted to [his wife, Berkeley linguist] Lily Wong Fillmore. The loss of my Doktorvater feels like the loss of a parent.

School grammar, round two

There were many interesting comments on my recent post "Putting grammar back in grammar schools: A modest proposal". I wasn't able to participate in the discussion, due to competition from travel, holiday activities, fall semester grading, conference deadline, a wedding, …, so today I'll take up one or two of the points that were raised.

First, let me say that Dick Hudson has kindly agreed to write a guest post about grammar teaching in the UK, and educational linguistics in general, expanding on his comment. In what follows, I'll make a few observations of my own about the motivations for putting grammar — and linguistic analysis in general — into the school curriculum; about ways and means for moving towards this goal in the U.S.; and about what skills and concepts I had in mind.

Speech rhythm in Visible Speech

In "Speech rhythms and brain rhythms", 12/2/2013, I showed the results of a simple experiment looking for evidence of speech rhythms in the frequency domain, which found a peak at about 2.4 Hz in the average spectrum of the waveform envelope of 6300 read sentences. I don't have anything new to say about what this means, but I wanted to note a 65-year-old example of a somewhat similar experiment.
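
For concreteness, here is a minimal sketch of that kind of analysis: take the amplitude envelope of each waveform, compute its power spectrum, average the spectra across sentences, and look for the peak. This is an illustration under stated assumptions, not the code or data from the original experiment. The "sentences" are synthetic tones whose loudness is modulated at 2.4 Hz, and the helper names (`envelope`, `power_spectrum`, `synthetic_sentence`) as well as the frame length and sample rate are all hypothetical choices.

```python
import cmath
import math

SAMPLE_RATE = 1000   # samples per second (assumed, for the synthetic signal)
FRAME_LEN = 20       # 20 ms frames, so the envelope is sampled at 50 Hz


def synthetic_sentence(phase, seconds=5.0, mod_hz=2.4, carrier_hz=200.0):
    """A stand-in for speech: a 200 Hz tone whose amplitude varies at mod_hz."""
    n = int(SAMPLE_RATE * seconds)
    return [
        (1.0 + 0.8 * math.sin(2 * math.pi * mod_hz * i / SAMPLE_RATE + phase))
        * math.sin(2 * math.pi * carrier_hz * i / SAMPLE_RATE)
        for i in range(n)
    ]


def envelope(samples, frame_len):
    """Rectify the waveform and average it over non-overlapping frames."""
    n_frames = len(samples) // frame_len
    return [
        sum(abs(s) for s in samples[i * frame_len:(i + 1) * frame_len]) / frame_len
        for i in range(n_frames)
    ]


def power_spectrum(env, max_bin):
    """Naive DFT power of the mean-removed envelope for bins 1..max_bin."""
    n = len(env)
    mean = sum(env) / n
    centered = [e - mean for e in env]
    power = []
    for k in range(1, max_bin + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        power.append(abs(coeff) ** 2 / n)
    return power


# Three synthetic "sentences" that differ only in modulation phase.
sentences = [synthetic_sentence(phase) for phase in (0.0, 1.0, 2.0)]
spectra = [power_spectrum(envelope(s, FRAME_LEN), 50) for s in sentences]

# Average the power spectra bin by bin, then locate the largest bin.
avg = [sum(column) / len(spectra) for column in zip(*spectra)]
frame_rate = SAMPLE_RATE / FRAME_LEN
n_frames = len(sentences[0]) // FRAME_LEN
peak_bin = max(range(len(avg)), key=avg.__getitem__) + 1
peak_hz = peak_bin * frame_rate / n_frames

print(f"peak of the average envelope spectrum: {peak_hz:.1f} Hz")
```

Averaging power spectra rather than waveforms matters here: the sentences have different modulation phases, so averaging the envelopes themselves would partly cancel the rhythm, while the power spectrum discards phase. With real recordings one would normally use an FFT library and a smoother envelope estimate; the naive DFT is used only to keep the sketch dependency-free.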

Separated by a common problem

The first issue of a new journal has just appeared: Linguistic Evidence in Security, Law and Intelligence (LESLI), founded and edited by Dr. Carole Chaski. As a member of the editorial board, I'm pleased with the quality of the first issue, and I feel that Carole deserves a round of applause.

But there's something in the first issue that reminds me of a long-standing puzzle: why is there so little communication between two research communities who seem to be working on essentially the same problem? The trigger is Harry Hollien's policy paper, "Barriers to Progress in Speaker Identification with Comments on the Trayvon Martin Case". And the two communities — separated by a common problem — are the people who work on what Prof. Hollien calls (forensic) "speaker identification", versus the people involved with what I know as "speaker recognition research".

The long get longer

Al Filreis's Modern and Contemporary American Poetry is one of the most successful MOOCs. In particular, participants' involvement is sustained over time to an unusual extent — here's the daily volume of forum posts and comments for the first two months of ModPo2, which is currently underway:

Speech rhythms and brain rhythms

[Warning: More than usually geeky...]

During the past decade or two, there's been a growing body of work arguing for a special connection between endogenous brain rhythms and timing patterns in speech. Thus Anne-Lise Giraud & David Poeppel, "Cortical oscillations and speech processing: emerging computational principles and operations", Nature Neuroscience 2012:

Neuronal oscillations are ubiquitous in the brain and may contribute to cognition in several ways: for example, by segregating information and organizing spike timing. Recent data show that delta, theta and gamma oscillations are specifically engaged by the multi-timescale, quasi-rhythmic properties of speech and can track its dynamics. We argue that they are foundational in speech and language processing, 'packaging' incoming information into units of the appropriate temporal granularity. Such stimulus-brain alignment arguably results from auditory and motor tuning throughout the evolution of speech and language and constitutes a natural model system allowing auditory research to make a unique contribution to the issue of how neural oscillatory activity affects human cognition.

Subtle differences

Andrew Gelman, "Separated by a common blah blah blah", SMCISS 12/1/2013:

I love reading the kind of English that English people write. It’s the same language as American but just slightly different. I was thinking about this recently after coming across this footnote from “Yeah Yeah Yeah: The Story of Modern Pop,” by Bob Stanley:

Mantovani’s atmospheric arrangement on ‘Cara Mia’, I should add, is something else. Genuinely celestial. If anyone with a degree of subtlety was singing, it would be quite a record.

It’s hard for me to pin down exactly what makes this passage specifically English, but there’s something about it . . .

British headlines: 18% less informative shorter

Chris Hanretty, "British headlines: 18% less informative than their American cousins", 11/29/2013:

I’m currently working on a project looking at the representation of constituency opinion in Parliament. One of our objectives involves examining the distribution of parliamentary attention — whether MPs from constituencies very concerned by immigration talk more about immigration than MPs from constituencies that are more relaxed about the issue.

To do that, I’ve been relying on the excellent datasets made available from the UK Policy Agendas Project. In particular, I’ve been exploring the possibility of using their hand-coded data to engage in automated coding of parliamentary questions.

One of their data-sets features headlines from the Times. Coincidentally, one of the easier-to-use packages in automated coding of texts (RTextTools) features a data-set with headlines from the New York Times. Both data-sets use similar topic codes, although the UK team has dropped a couple of codes.

How well does automated topic coding work on these two sets of newspaper headlines?

More on the statistics of real-estate listings

Early last summer, an inquiry from Sanette Tanaka at the WSJ led me to do a Breakfast Experiment™ on the relationship between the language of real-estate listings and the price of the associated properties ("Long is good, good is bad, nice is worse, and ! is questionable", 6/12/2013; "Significant (?) relationships everywhere", 6/14/2013; "City of the big disjunctions", 6/20/2013).

Speaker-change offsets

In Meg Wilson's post on marmoset vs. human conversational turn-taking, I learned about Tanya Stivers et al., "Universals and cultural variation in turn-taking in conversation", PNAS 2009, which compared response offsets to polar ("yes-no") questions in 10 languages. Here's their plot of the data for English:

Based on examination of a Dutch corpus, they argue that "the use of question–answer sequences is a reasonable proxy for turn-taking more generally"; and in their cross-language data, they found that "the response timings for each language, although slightly skewed to the right, have a unimodal distribution with a mode offset for each language between 0 and +200 ms, and an overall mode of 0 ms. The medians are also quite uniform, ranging from 0 ms (English, Japanese, Tzeltal, and Yélî-Dnye) to +300 ms (Danish, ‡Ākhoe Hai‖om, Lao) (overall cross-linguistic median +100 ms)."
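
To make the terms concrete: a response offset is the time from the end of the question to the start of the response, so a negative value means the response overlapped the question. The sketch below computes offsets, their median, and a binned mode from a handful of timestamps. The timestamps are made up for illustration, and the 100 ms bin width and the function names are assumptions, not the procedure or data of Stivers et al.

```python
from collections import Counter
from statistics import median

# (question_end_ms, response_start_ms) pairs; invented values, illustration only.
turns = [
    (1200, 1250),
    (3400, 3380),    # response starts before the question ends (overlap)
    (5100, 5300),
    (7000, 7100),
    (9050, 9080),
    (11000, 11450),
    (13000, 13080),
]


def response_offsets(pairs):
    """Offset = response start minus question end, in milliseconds."""
    return [response_start - question_end for question_end, response_start in pairs]


def binned_mode(offsets, bin_ms=100):
    """Lower edge of the most frequent bin of width bin_ms."""
    bins = Counter((offset // bin_ms) * bin_ms for offset in offsets)
    return bins.most_common(1)[0][0]


offsets = response_offsets(turns)
median_offset = median(offsets)
mode_bin = binned_mode(offsets)

print("offsets (ms):", offsets)
print("median (ms):", median_offset)
print("modal bin starts at (ms):", mode_bin)
```

A long right tail, like the 450 ms response here, pulls the median above the mode, which is the pattern the quoted passage describes as "slightly skewed to the right".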

Non-projective flavor

From a current Starbucks ad, a nice example of a non-projective English sentence:

On Interdisciplinary Collaboration and "Latent Personas"

This is a guest post by David Bamman, in response to the post by Dan Garrette ("Computational linguistics and literary scholarship", 9/12/2013).

The critique by Hannah Alpert-Abrams and Dan Garrette of our recent ACL paper ("Learning Latent Personas of Film Characters") and the ensuing discussion is raising interesting questions on the nature of interdisciplinary research, specifically between computer science and literary studies. Garrette frames our paper as "attempting to … answer questions in literary theory" and Alpert-Abrams argues that for a given work of this kind to be truly interdisciplinary, it "must be cutting edge in the field of literary scholarship too." To do truly meaningful work at the intersection of computer science and literary studies, they argue, parties from both sides need to be involved.

While I disagree with how Garrette and Alpert-Abrams have characterized our paper (as attempting to address literary theory), I fundamentally agree with their underlying point. I have a different understanding of how we get to that point, however; to illustrate this, let me offer here a different framing of our paper.
