Speech rate and per-syllable information across languages


Last week, back in the Paleo-Language-Log Era, I ended a post on "Comparing communication efficiency across languages" with this teaser:

A topic for another time: how do typical speech rates differ between languages? Do these interact with per-syllable measures of information content so as to equalize the average rate of information transmission?

Since I've done nothing to follow up on this note, Max Bane sent in a .pdf of some relevant PowerPoint slides, along with this message:

I just thought I'd mention some very relevant work that was presented last year at the LSA by François Pellegrino.

The important slides are numbers 23 and 24. Pellegrino and his colleagues looked at speech data from seven languages, and found significant negative correlations between the average Shannon entropy of a language's syllable inventory (a reasonable per-syllable measure of information content) and (1) the average rate of speech in syllables per second, and (2) the average number of syllables per word.

Thus it seems there might indeed be some interaction or trade-off going on, as you speculate. Of course, correlations like these do not necessarily mean that the average rate of information transmission is being conserved, but they are consistent with that hypothesis.
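To make the quantities in Max's message concrete, here is a minimal Python sketch. It is not the procedure Pellegrino and his colleagues used, just the textbook definitions: the Shannon entropy of a syllable inventory computed from frequency counts, and an information rate obtained by multiplying that entropy by a syllable rate. The function names, the toy counts, and the rate of 6 syllables per second are all invented for illustration.

```python
import math

def syllable_entropy(counts):
    """Shannon entropy, in bits per syllable, of a syllable frequency table."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total)
                for c in counts.values() if c > 0)

def information_rate(entropy_bits, syllables_per_second):
    """Bits per second, assuming every syllable carries the average entropy."""
    return entropy_bits * syllables_per_second

# Hypothetical toy inventory: four syllable types with made-up counts.
toy_counts = {"ka": 4, "ta": 2, "mi": 1, "no": 1}

h = syllable_entropy(toy_counts)   # bits per syllable
rate = information_rate(h, 6.0)    # 6.0 syll/sec is an invented figure

print(f"entropy = {h:.2f} bits/syllable, rate = {rate:.2f} bits/second")
```

If the information rate were conserved across languages, a language with higher per-syllable entropy would need a proportionally lower syllable rate to keep this product constant. A negative correlation alone does not show that the product is constant, which is the caveat in the message above.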

François Pellegrino is the director of the Dynamique du Langage laboratory (CNRS – Université Lumière Lyon 2), and the cited presentation is Pellegrino, F., Coupé, C., Marsico, E., 2007, "An Information Theory-Based Approach to the Balance of Complexity between Phonetics, Phonology and Morphosyntax", 81st Annual Meeting of the Linguistic Society of America, Anaheim, CA, USA, 4-7 January 2007.

In that presentation, the pictured relationship between average number of syllables per word (ASW) and syllabic entropy is fairly convincing:

The languages are French (FR), English (EN), Mandarin (MA), German (GE), Spanish (SP), Italian (IT), and Japanese (JA).

The relationship between syllabic rate and syllabic entropy is less striking:

There's a long tradition of work on this and related topics, e.g. Hans Karlgren, "Speech Rate and Information Theory", pp. 671-677, Proceedings of the Fourth International Congress of Phonetic Sciences, Helsinki, 1961. The research by Pellegrino et al. looks to be especially careful and detailed; unfortunately I couldn't find a paper version, though the slides are pretty readable.

[With respect to my original post about communicative efficiency, Bob Carpenter at the LingPipe Blog has some interesting comments: "The Entropy of English vs. Chinese", 4/11/2008.]

[Update — Rob Malouf writes:

Interesting graphs. I noticed, though, that most of the languages in the sample are typologically similar in that they have relatively little morphology. Japanese is the exception, and without it the trends wouldn't be nearly as dramatic. Too bad there's no big Hawaiian speech corpus…]
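One way to check how much a single language drives a correlation in a sample this small is to recompute it with each point left out in turn. The sketch below does that for Pearson's r. The five labeled points are made-up (entropy, syllables per second) pairs, not values read off the slides, and the helper names are my own.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def leave_one_out(points):
    """Map each label to the correlation computed without that point."""
    result = {}
    for label in points:
        rest = [v for k, v in points.items() if k != label]
        xs, ys = zip(*rest)
        result[label] = pearson(xs, ys)
    return result

# Made-up data: four clustered points plus one outlier, "E".
toy = {
    "A": (1.0, 5.0),
    "B": (1.1, 5.2),
    "C": (0.9, 5.1),
    "D": (1.0, 4.9),
    "E": (3.0, 2.0),
}

full_r = pearson(*zip(*toy.values()))
loo = leave_one_out(toy)

print(f"all points: r = {full_r:.2f}")
for label, r in loo.items():
    print(f"without {label}: r = {r:.2f}")
```

In this invented example the strong negative correlation disappears once the outlier is dropped, which is the pattern to look for when one language sits far from the rest.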

