Speed vs. efficiency in speech production and reception


An interesting new paper on speech and information rates as determined by neurocognitive capacity appeared a week ago:

Christophe Coupé, Yoon Oh, Dan Dediu, and François Pellegrino, "Different languages, similar encoding efficiency: Comparable information rates across the human communicative niche", Science Advances, 5.9 (2019):  eaaw2594. doi: 10.1126/sciadv.aaw2594.

Here's the abstract:

Language is universal, but it has few indisputably universal characteristics, with cross-linguistic variation being the norm. For example, languages differ greatly in the number of syllables they allow, resulting in large variation in the Shannon information per syllable. Nevertheless, all natural languages allow their speakers to efficiently encode and transmit information. We show here, using quantitative methods on a large cross-linguistic corpus of 17 languages, that the coupling between language-level (information per syllable) and speaker-level (speech rate) properties results in languages encoding similar information rates (~39 bits/s) despite wide differences in each property individually: Languages are more similar in information rates than in Shannon information or speech rate. These findings highlight the intimate feedback loops between languages’ structural properties and their speakers’ neurocognition and biology under communicative pressures. Thus, language is the product of a multiscale communicative niche construction process at the intersection of biology, environment, and culture.

Final paragraph:

To conclude from a broad evolutionary perspective, we thus see human language as inhabiting a biocultural niche spanning two scales. At a local scale, each system consisting of a given language and its speakers represents one instantiation of a cultural niche construction process in a specific context involving the ecological, biological, social, and cultural environments. At a global scale, all of these language speakers’ local systems are subjected to universal communicative pressures characterizing the human-specific communication niche and consequently fulfilling universal functions of communication essential for the human species.

Here are the languages analyzed:

Austroasiatic [Vietnamese (VIE)], Basque [Basque (EUS)], Indo-European [Catalan (CAT), German (DEU), English (ENG), French (FRA), Italian (ITA), Spanish (SPA), and Serbian (SRP)], Japanese [Japanese (JPN)], Korean [Korean (KOR)], Sino-Tibetan [Mandarin Chinese (CMN) and Yue Chinese/Cantonese (YUE)], Tai-Kadai [Thai (THA)], Turkic [Turkish (TUR)], and Uralic [Finnish (FIN) and Hungarian (HUN)]

The paper itself is highly technical and involves a considerable amount of mathematical computation, so it might be easier for some readers to approach it through this article by Rachel Gutman:

"A Rare Universal Pattern in Human Languages:  Some languages are spoken more quickly than others, but the rate of information they get across is the same", The Atlantic (9/4/19)

Gutman's explanation of the relationship between sound and information transmission is helpful, at least for me:

Informativity in linguistics is usually calculated per syllable, and it’s measured in bits, just like computer files. The concept can be rather slippery when you’re talking about talking, but essentially, a bit of linguistic information is the amount of information that reduces uncertainty by half. In other words, if I utter a syllable, and that utterance narrows down the set of things I could be talking about from everything in the world to only half the things in the world, that syllable carries one bit of information.
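Gutman's "halving" definition can be put in concrete numerical terms. A minimal sketch (the candidate counts below are invented for illustration, not drawn from the paper) computes information as the base-2 logarithm of the reduction in the space of possibilities:

```python
import math

def bits_conveyed(candidates_before: int, candidates_after: int) -> float:
    """Shannon information (in bits) gained when an utterance narrows the
    set of possible meanings from candidates_before to candidates_after."""
    return math.log2(candidates_before / candidates_after)

# A syllable that cuts the space of possible topics in half carries 1 bit;
# each further halving adds another bit.
print(bits_conveyed(1000, 500))  # 1.0  (uncertainty halved once)
print(bits_conveyed(1000, 250))  # 2.0  (halved twice)
```

In the paper itself, information per syllable is estimated from syllable distributions in the corpus rather than from counts of "things in the world," but the logarithmic idea is the same.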

Of course, there is a great deal of individual variation in the speed with which different people speak the same language, but the overall tempo of a given language is similar across the population of speakers of that particular tongue.  Nonetheless, as an example of someone who spoke with phenomenal rapidity, I may cite a village magistrate in Nepal whom I met in 1965-67.  It was almost comical how fast he spoke.  Every syllable was spoken distinctly, but they came so quickly that they went by in a blur, and it was very hard for me to absorb much of what he was saying, even though halfway through my two-year stay in Nepal I had become completely fluent in Nepali and could carry on fully intelligible conversations with most speakers at normal speeds.  Instead of trying to understand what the rapid-fire village magistrate was saying, I would just stand there with an amazed, quizzical look and marvel at how his tongue and lips could produce such a machine-gun stream of syllables.

Reacting to the above-cited article, June Teufel Dreyer, herself hailing from N’yawk, remarks:

New Yorkers are frequently teased for speaking too quickly, just as Southerners are for speaking so slowly.  Could it be that other people’s brains are absorbing only a certain amount of what New Yorkers are saying, since the article implies there are certain limitations on the brain’s processing abilities?

A college friend and fellow New Yorker once opined that in New York you have to talk fast because you know that people there will stop listening after 30 seconds.  I thought, “That makes sense.”

Summary wrap-up of the Science Advances paper:

This is a report of measurements of how much information speakers of various languages typically transmit in a given period. A total of 17 major languages were studied. While there was wide variation in the speed of talking as measured in syllables per second, the "faster" languages (e.g., Japanese) had lower information content per syllable than the "slower" ones (e.g., Thai), so that all had very similar average rates of information transmission: about 39 bits per second.  The hypothesis is that this reflects inherent limits in the underlying language processing of the human brain that are common to all languages.
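The trade-off the paper reports can be sketched as a simple product: information rate = speech rate (syllables per second) × information density (bits per syllable). The figures below are invented for illustration, chosen to echo the qualitative pattern rather than to reproduce the paper's measured values:

```python
# Hypothetical figures illustrating the reported trade-off: "fast" languages
# carry fewer bits per syllable, "slow" ones carry more, so the product
# (bits per second) comes out roughly constant across languages.
languages = {
    #            (syllables/s, bits/syllable)
    "Japanese": (7.8, 5.0),   # fast speech, low density
    "English":  (6.2, 6.3),
    "Thai":     (4.7, 8.3),   # slow speech, high density
}

for name, (speech_rate, density) in languages.items():
    info_rate = speech_rate * density
    print(f"{name:8s} {info_rate:5.1f} bits/s")
```

All three products land close to 39 bits per second, which is the shape of the result the authors report across their 17-language sample.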

Personally, whenever I'm talking to someone else, no matter in what language, I occasionally wish that they would either slow down or speed up.  In other words, I have a distinct comfort zone for absorbing information from the person with whom I'm conversing.  Sometimes I feel like saying to them, "Spit it out, my friend", and sometimes I want to tell them to slow down a bit.  On the other hand, I'm sure different auditors feel the same way about information I'm conveying to them.  Most of the time, however, a truly satisfying conversation results when interesting information is being exchanged at a relaxed rate.

[h.t. Chiu-kuei Wang]


  1. Ambarish Sridharanarayanan said,

    September 11, 2019 @ 9:52 am

    > Austroasiatic [Vietnamese (VIE)], Basque [Basque (EUS)], Indo-European [Catalan (CAT), German (DEU), English (ENG), French (FRA), Italian (ITA), Spanish (SPA), and Serbian (SRP)], Japanese [Japanese (JPN)], Korean [Korean (KOR)], Sino-Tibetan [Mandarin Chinese (CMN) and Yue Chinese/Cantonese (YUE)], Tai-Kadai [Thai (THA)], Turkic [Turkish (TUR)], and Uralic [Finnish (FIN) and Hungarian (HUN)]

    How's it reasonable for a "cross-linguistic corpus of 17 languages" to completely exclude the 3rd, 4th, 5th, 6th and 7th most popular language families? And how come their Indo-European language samples are all from Europe, and all but one from Western Europe?

  2. Samuel Buggeln said,

    September 11, 2019 @ 10:13 am

    This Shannon "information-per-syllable" measure is weirdly relevant in my field of translation for theater. Some translators (misguidedly, in my view) aim to replicate the meter of a verse play syllable for syllable. I'd be willing to bet, based on my theater translation experience, that Spanish is a "faster" language (as defined above, in syllables per second) than English. Syllable-faithful Spanish-to-English translations sound baggy and slow in getting the point across.

  3. Trogluddite said,

    September 11, 2019 @ 12:33 pm

    > "Could it be that other people's brains are only absorbing a certain amount […] ?"

    In common with many other autistic people, I find myself very often wishing that speakers would "slow down a bit", and I'm often prompted to "spit it out" (or have my sentences completed) by impatient listeners. Many autistic people describe finding themselves flying by the seat of their pants in conversation, fingers crossed that their turn to speak will not be upon them before their comprehension has caught up – all the while dreading that they will embarrass themselves yet again by saying something inappropriate because of e.g. having to guess at an unresolved referent (the alternatives – long pauses and asking "dumb questions" – may have their own negative consequences.)

    Of course, there may be many other factors to consider besides speech comprehension for people with such conditions, particularly where social comprehension and pragmatics are concerned. However, the perception that typical speech rates overwhelm one's available processing power is a very commonly reported factor contributing to social anxiety among people in the support groups that I use.

    Unfortunately, people don't just get irritated by others who speak at a rate that they're not comfortable with. They make value judgements about the speaker, too! – as many autistic people, and no doubt folks with a wide variety of other conditions, are all too often reminded.

  4. jin defang said,

    September 11, 2019 @ 2:00 pm

    Trogluddite is right—I must admit to being just such a person. Not for persons with a genuine disability, but definitely for people who wander ponderously in the direction of a completed thought. Am thinking "c'mon already, finish!"

    A student recently described one of his professors as "lost in a polysyllabic fog." Perfect.

  5. Jerry Packard said,

    September 11, 2019 @ 5:01 pm

    In an earlier paper (Pellegrino, F., Coupé, C., and Marsico, E. (2011). "A Cross-Language Perspective on Speech Information Rate." Language 87.3, pp. 539-558), Pellegrino ranks Mandarin (out of Mandarin, English, German, French, Italian, Spanish, and Japanese) #1 on information density and average number of constituents per syllable, and #7 on number of syllables and number of syllables spoken per second.

    So if Chinese has a relatively small number of syllables, and if the average number of constituents per syllable is among the highest, and its spoken language information density is higher than the other languages, then it stands to reason that the language would have to be spoken slowly in order for its information content to be adequately processed. That is why Mandarin was found to be the most slowly spoken language among the seven languages analyzed.

  6. Victor Mair said,

    September 11, 2019 @ 7:41 pm

    Brilliant comment by Jerry Packard. Much appreciated.

  7. yoandri dominguez said,

    September 12, 2019 @ 1:29 am

    this all seem too convoluted and absurd, sorry. the elephant in the room–word order and focus, the meaning the meaning; pithiness.

  8. yoandri dominguez said,

    September 12, 2019 @ 1:38 am

    syllables is always unstable. victor mair know how beijing folks talk fast. or folks think syllables got wrote in stone? utter absurd.

  9. Julian said,

    September 12, 2019 @ 3:10 am

    @Jerry Packard 'the average number of constituents per syllable'
    What is a constituent in a syllable please? I know 'constituent' only in context of phrase structure grammar. Thanks

  10. R. Fenwick said,

    September 12, 2019 @ 5:58 am

    Ambarish has a point. The SAE-heavy nature of the sample makes it really difficult to reach the idea of universality based upon this sample set. I'd be interested to see how, for instance, Navajo stacks up with its famously opaque and semantically heavy classificatory verbs. Also North-West Caucasian languages, which I've anecdotally seen described as capable of an unusually "fast" information transference rate for a number of reasons. I've never gotten the impression from Ubykh texts that their phonetic content is expressed particularly slowly, and yet the semantic content does subjectively seem to be considerably richer per unit of time.

  11. Jerry Friedman said,

    September 12, 2019 @ 12:18 pm

    Rachel Gutman wrote: "The concept can be rather slippery when you're talking about talking, but essentially, a bit of linguistic information is the amount of information that reduces uncertainty by half. In other words, if I utter a syllable, and that utterance narrows down the set of things I could be talking about from everything in the world to only half the things in the world, that syllable carries one bit of information."

    Does that let us estimate the number of things in the world, allowing for some slipperiness?

  12. Jerry Packard said,

    September 12, 2019 @ 2:12 pm

    @Julian – In a syllable, a constituent is a phoneme.

  13. Jerry Packard said,

    September 12, 2019 @ 2:24 pm

    @Samuel – According to the previously cited article, in syllables per second:
    Spanish = 7.82, English = 6.19, Mandarin = 5.18

  14. Charlotte Stinson said,

    September 12, 2019 @ 2:38 pm

    @Jerry Packard: does the dimension of tone count as a constituent for purposes of measuring information density in Mandarin and other languages that use tone?

  15. Jerry Packard said,

    September 12, 2019 @ 6:22 pm

    In the most recent study, yes. In the earlier study, no (at least for Mandarin).

  16. cliff arroyo said,

    September 13, 2019 @ 5:53 am

    " In a syllable, a constituent is a phoneme"

    Given cross-linguistic variance in average syllable weight (and given that 'syllable' is not necessarily always the optimal timing unit in every language), might the rate of phonemes be a better unit of measurement?

  17. Terry Hunt said,

    September 13, 2019 @ 7:50 am

    As I (dimly and possibly mistakenly) understand it, the methodology assumes that a given syllable in a given language carries a particular quantity of information, measured in bits.

    Does this assumed bit-value rely solely on the literal meaning of the syllable (in the context of surrounding syllables, presumably) or does it also take into account things like intonation and emphasis?

    As basic examples, the intonation of the closing syllables of a sentence in most traditional varieties of English often indicates whether or not the sentence is a question, and other intonation patterns lend structure, hence meaning, at the sentence and paragraph level. Emphasis on certain words can convey additional or different meanings ("*This* you call a bagel?" "This *you* call a bagel?" "This you call a *bagel*?").

    In the case of the included tonal languages, were the "same" syllables spoken with n different tones counted as n different syllables? (I would hope so.)

    Were additional parallel information channels such as facial expressions and gestures included? (I'd guess not.)

    If all of the above (and other factors I haven't thought of but others might) were taken into account, I wonder if and how it would affect the results.

  18. Terry Hunt said,

    September 13, 2019 @ 7:58 am

    I see now that Charlotte Stinson has already queried, and Jerry Packard has already responded to, the point about tones. Apologies for the redundancy.

    Which suggests another point. I believe that different languages have different amounts of redundancy, which can change over time (such as Modern colloquial French dropping the "ne" of ". . . ne . . . pas . . ." constructions). I'd naively predict that languages with higher redundancy would be spoken more quickly, because the redundancy would compensate for mis-hearings.

  19. Julian said,

    September 13, 2019 @ 7:36 pm

    This is fairly intuitive, isn’t it? There is no language pair in which an interpreter’s turn takes twice as long as the interpretee’s turn. There is no community where everyone talks as fast as your Nepalese magistrate, and those in the community can understand but outsiders can’t.
    Wouldn’t we expect that natural selection has shaped our mouth parts, language processor, and languages in tandem to find the sweet spot between speakers’ desire to minimise effort and listeners’ need to understand? Wouldn’t we expect that the forces involved would be similar for all humans?
    For our ancestors on the savannah, natural selection would tend to select against people whose language took a very long time to say ‘Watch out for that rhinocer—‘. At the other extreme, it would select against people whose language was so dense that listeners fell behind in comprehension.
    That implies that there’s a selection related constraint on the efficiency of our language processor. Why is it just so, not quicker or slower? Why don’t we all talk like your Nepalese magistrate? Why can’t we read the whole Library of Congress in an hour, like Fred Hoyle’s black cloud?

  20. David Marjanović said,

    September 14, 2019 @ 2:50 pm

    Follow the link to the paper; it’s open access.

    > As I (dimly and possibly mistakenly) understand it, the methodology assumes that a given syllable in a given language carries a particular quantity of information, measured in bits.

    Well, it's Shannon information. That's "information" in a mathematical sense that has nothing to do with meaning, just with the size of phoneme inventory (tonemes included) from which a syllable can be constructed, and the average number of phonemes per syllable in the same language.

    The result is that languages with more complex syllables or more phonemes to choose from are spoken more slowly, i.e. with fewer syllables per time, than those with simpler syllables and smaller phoneme inventories.
