Conversational rhythms


A few weeks ago, I posted on some work-in-progress on speaking rate ("How fast do people talk in court?", 3/21/2009).  Since then, I've added coverage in the same style of a few thousand telephone conversations from the Switchboard Corpus.

The main motivation of this work (done jointly with Jiahong Yuan and Linda Drake) is a simple and practical one: to establish a better-grounded set of expectations about the distribution of speaking rates in various sorts of material. Beyond that, it's obvious that the ebb and flow of conversational interaction is visible to some extent in a graphical presentation of who said how much when, entirely independent of the content.  Here's a graph of the local speaking rate on the A and B sides of a two-person conversation, calculated in a moving 30-second window that's stepped along five seconds at a time:


Here's the same conversation, with the rates calculated in a five-second window:

Using automated alignment, this sort of thing is easy to calculate for any transcribed conversational recording where the participants are recorded on separate channels. And I believe (though I haven't demonstrated) that speech-recognition techniques could be used to get essentially the same data from untranscribed recordings.

Just for grins, here are the summed rates for the two sides taken together:

As this page explains, the Switchboard corpus, originally collected in 1990-91 for a speaker-recognition study, has now been annotated in almost 20 different linguistically-relevant ways, and thus "offers one of the richest resources available for the study of discourse in spontaneous speech".



4 Comments

  1. James C. said,

    April 13, 2009 @ 4:14 pm

    Any plans for a crosslinguistic study? I don’t know whether appropriate corpora are available, but I’d certainly like to see comparisons to Mandarin Chinese, Arabic, etc. Or even just within the IE family, since it seems (intuitively, which may be wrong) that speech rate is much more variable than phonology and syntax.

    [(myl) A few years ago, I discussed a comparison of overall speaking rates across U.S. regions ("Regional speech rates", 10/13/2007); see also "The tale of the tape: Those fast-talking southerners", 10/17/2007. For a discussion of sex differences, see "Sex and speaking rate", 8/7/2006; "Sex doesn't matter", 11/11/2005. Jiahong Yuan, Chris Cieri and I presented a paper at ICSLP 2006 entitled "Towards an Integrated Understanding of Speaking Rate in Conversation", which presented some comparisons between English and Mandarin Chinese.

    A rather data-poor comparison between English and Hungarian can be found in "Hungarian speech rate and the tribunal of revolutionary empirical justice", 8/16/2006, and "Those fast-talking Hungarians marketing researchers", 8/16/2006.

    There's some literature on cross-language speaking-rate comparisons, which I'll review here at some point in the future, but there isn't really a lot of it. ]

    It’s too bad there aren’t (to my knowledge) good corpora for native North American languages. People constantly remark on the much slower speech rate in these languages, and it’d be interesting to see a comparison of spontaneous naturalistic speech rates in conversation. These languages would raise an important issue, however, in that your tokens are simple words for English, but for highly polysynthetic languages this would be inappropriate as a crosslinguistic measure (at least without some sort of statistical scaling).

    [(myl) Even within one language, you can get very different numbers depending on your definition of "word", your ways of counting silences and overlapping speech, what you do with filled pauses and false starts, etc. Counting syllables is somewhat less problematic, within as well as across languages — but of course the per-syllable entropy in different languages can be quite different — either as a matter of phonological structure, lexical distribution, or textual redundancy — so the same speaking rate in syllables per minute might correspond to quite different rates of information transfer.

    All that said, I've often heard the same sort of reports about conversational rhythms in American Indian languages/cultures, and some of the reports have come from experienced people whose judgment I trust. However, I don't know of any empirical studies. For an even more culturally loaded point of comparison, see "The rhetoric of silence", 10/3/2004. ]

  2. Bill Walderman said,

    April 14, 2009 @ 7:56 am

    Following up on James C.'s comment, has anyone attempted to study speech rates of languages that are more highly inflected than English (Russian, for example) and to contrast those languages with English, which is more dependent on word order for conveying syntactic information? I wonder whether the need to enunciate inflectional morphemes clearly would result in slower speech rates in similar contexts for highly inflected languages than for English.

  3. Aaron Davies said,

    April 14, 2009 @ 10:03 am

    interesting, i was just thinking earlier about information density in language. i wonder how hard it would be to measure the information directly (e.g. shannon entropy) and look at whether that rate varies.

    [(myl) There's some discussion of related issues in a couple of older LL posts: "Speech rate and per-syllable information across languages", 4/12/2008; "Comparing communication efficiency across languages", 4/4/2008; "Mailbag: Comparative communication efficiency", 4/5/2008; and "One world, how many bytes?", 8/5/2005]

  4. Mark Liberman said,

    April 14, 2009 @ 11:44 am

    Bill Walderman: Following up on James C.'s comment, has anyone attempted to study speech rates of languages that are more highly inflected than English (Russian, for example) and to contrast those languages with English, which is more dependent on word order for conveying syntactic information?

    I'm not aware of any comparison of that sort. It wouldn't be trivial to do, since (a) you'd need at least half a dozen languages of each type in order to have any confidence that degree of inflectional morphology was the relevant independent variable; and (b) in each language, speaking rates vary widely by individual, speech situation, psychological state, and so on. If we had big enough published speech corpora for enough languages, it would be a straightforward (though large) job; but at the moment, we don't.

    Bill Walderman: I wonder whether the need to enunciate inflectional morphemes clearly would result in slower speech rates in similar contexts for highly inflected languages than for English.

    Is there any reason to think that inflectional morphemes are enunciated especially clearly? If I had to guess, it would be in the opposite direction, though I don't have any confidence in the answer. In any case, English has more "function words" than (say) Latin or Russian does. So if the density of functional morphemes is a critical variable, you might find that English is more similar to Russian than either is to Chinese.
