Language Log

Fast talking

April 9, 2024 @ 6:05 am · Filed by Victor Mair under Language and mathematics, Prosody, Speech-acts

The topic of this post is one that deeply fascinates me personally, but also has a bearing on many of the main concerns of the denizens of Language Log: information, efficiency, density, complexity, meaning, pronunciation, prosody, speed, gender….

It was prompted by this new article:

What’s the Fastest Language in the World?
Theansweriscomplicated. [sic]
by Dan Nosowitz, Atlas Obscura (April 2, 2024)

The article is based upon the work of François Pellegrino, a senior researcher in linguistics and cognitive science at the Centre National de la Recherche Scientifique (CNRS), Paris, and the Université Lumière Lyon II, France.

Francois Pellegrino is mostly a quantitative linguist, meaning his work often includes measuring differences among languages and hunting for explanations behind those differences. He’s worked on language speed a few times, including on one study that compared 17 different languages in a variety of metrics.

Pellegrino prefers looking at syllables rather than individual sounds (phonemes, to linguists) or words. “So the sounds per syllable, you have two ways to look at it,” he says. “You have one way, which is to look at how fast they are produced and what kind of information they convey. But you can also basically ask people to listen to unknown languages and ask them whether it sounds fast or not. So you have the perceptual aspects, and the articulatory and production aspects.”

There are a whole bunch of metrics used to measure all this stuff. There’s the total number of syllables per unit time, which you might think would be fairly simple to measure but is not; Pellegrino’s team decided to rely on the “canonical” pronunciation, so the word “probably” would be noted as three syllables even if the speaker pronounces it “probly.”

Then there’s “information density,” which theoretically refers to the quantity of information conveyed per second. This is even tougher; it turns out to be an absolute nightmare to actually define. There’s a technical meaning devised by a guy named Claude Shannon that involves, basically, how quickly a listener can reduce their uncertainty about the message they’re getting. This involves calculations of the number of possible syllables in a language, the relative popularity of each of those syllables, and the probability that a certain syllable will follow another. All the Shannon stuff is kind of abstract and involves a lot of math that, frankly, made my head hurt.

Perhaps some of the specialists on information theory who are reading this can explain the "Shannon stuff" we need to know in slow, simple terms.
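For readers who want to see the "Shannon stuff" in miniature, here is a minimal Python sketch of the ingredients the quoted paragraph lists: how many distinct syllables there are, how frequent each one is, and how predictable a syllable is from the one before it. The syllable stream `toy` and the helpers `unigram_entropy` and `conditional_entropy` are hypothetical illustrations, not Pellegrino's data or code. Real estimates need large corpora and careful smoothing.

```python
import math
from collections import Counter

def unigram_entropy(syllables):
    """Average bits per syllable, using only how often each syllable occurs."""
    counts = Counter(syllables)
    total = len(syllables)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def conditional_entropy(syllables):
    """Average bits per syllable once the previous syllable is known: H(next | previous)."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    prev_counts = Counter(syllables[:-1])
    n_pairs = sum(pair_counts.values())
    h = 0.0
    for (prev, _nxt), c in pair_counts.items():
        p_pair = c / n_pairs              # P(prev, next)
        p_cond = c / prev_counts[prev]    # P(next | prev)
        h -= p_pair * math.log2(p_cond)
    return h

# Hypothetical syllable stream, invented for illustration (not real language data).
toy = "ka ta ka na ka ta ka na ka ta ma ta".split()

print(f"ceiling (all syllables equally likely): {math.log2(len(set(toy))):.2f} bits")
print(f"using frequencies only:                 {unigram_entropy(toy):.2f} bits")
print(f"using the previous syllable too:        {conditional_entropy(toy):.2f} bits")
```

On this toy stream, frequencies alone give about 1.78 bits per syllable, below the 2-bit ceiling for four equally likely syllables. Knowing the previous syllable lowers the figure further. This is the sense in which predictable syllables carry less information.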

Linguists like Pellegrino have found that there’s an inverse correlation between, basically, how many syllables you can cram into one second and how much information you can cram into one second. Japanese, for example, has an extremely high number of syllables spoken per second. But Japanese also has an extremely low degree of complexity in its syllables, and much less information encoded per syllable. So the syllables come out at a faster rate, but you need more of them to convey the same amount of information as a slow language, like, say, Vietnamese.

But you can also argue that a language like Vietnamese, or even English, is wildly more efficient than Japanese. Japanese syllables contain, mostly, a consonant followed by a vowel, like ko, and Japanese also only has five vowels. English, though we have five letters to represent vowels, has around 20 different vowel sounds. Just by using “A” in different places we can get the vowel sounds in “cat,” “can,” “cane,” “calm,” and a bunch more. Single syllables in English can be extremely complex: the word “strength” involves big annoying clusters of consonants. Vietnamese goes a step further, adding tones, so the tune or pitch of a syllable can also carry value. (Japanese has a system of emphasizing some syllables but is not generally regarded as a tonal language.) Generally speaking, the more complexity we can cram into a syllable, the more information it carries. [VHM: emphasis added] Japanese is faster than English—around 12 syllables per second, maybe even a couple more for an especially fast speaker—but if English can convey the same information in five syllables, is Japanese really “faster”?

The concept of how much “information” is disclosed in a certain syllable is pretty wooly, too. Languages are messy, inconsistent, and redundant. A direct translation of the English sentence “I am” to Spanish would be “Yo soy.” But the “yo” isn’t necessary and, in fact, would usually be omitted.

In Hebrew, there’s no verb for “to be,” so to express that you’re hungry, you would say “אני רעב” meaning “I hungry.” That Hebrew one is a good example, because the word for “hungry” actually has a gender involved; a woman would say “אני רעבה”, which adds an extra syllable, but also extra meaning. For a man, the English and Hebrew have the same number of syllables, but to actually convey all the information in the Hebrew, the English would have to be more like “I, a man, am hungry,” which is much longer.

The amount of information can sometimes get even more dense. In the Paamese language, spoken on an island in Vanuatu, possessives can include information on the relationship between the speaker and the object. “My coconut” is not simply “my coconut.” The word for “my” could mean “my coconut, which I intend to eat,” or “my coconut, which I grew,” or “my coconut, which I intend to use in my household in some way other than eating or drinking.” This is a dramatically more efficient use of space than the English version! Is it, therefore, in some sense, “faster”?

Even in English, we can contract “I am” to “I’m,” though many contractions don’t actually save syllables (“shouldn’t” and “should not” are both two syllables). It is, in all languages, possible to delete quite a lot of syllables and still be able to convey information. Languages tend to be encoded with a lot of redundancy, but that does serve a purpose. Redundancy allows for understanding even if the listener isn’t used to the speaker’s accent, or can’t hear the speaker perfectly, or isn’t paying attention. If you edit a sentence down to the absolute bare minimum, it would take a pretty fair amount of concentration, and the right circumstances, to understand and maybe even make some educated guesses as to what the speaker is trying to convey.

…
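The quoted claim that more complex syllables can carry more information can be made concrete with a ceiling calculation. If a language had N possible syllables, all equally likely, each syllable would carry log2(N) bits. The sketch below uses loudly hypothetical inventory sizes: the counts are invented, not measurements of Japanese, Vietnamese, or any real language. The helper names are illustrative.

```python
import math

def syllable_inventory(onsets, nuclei, codas, tones=1):
    """Possible syllables if every combination were allowed (real languages allow far fewer)."""
    return onsets * nuclei * codas * tones

def max_bits_per_syllable(n_syllables):
    """Ceiling on bits per syllable: log2 of the inventory, reached only if all are equally likely."""
    return math.log2(n_syllables)

# Hypothetical inventories, invented for illustration; not counts for any real language.
simple = syllable_inventory(onsets=14, nuclei=5, codas=2)                  # codas=2: no coda, or one nasal
complex_tonal = syllable_inventory(onsets=20, nuclei=12, codas=8, tones=6)

for label, n in [("simple", simple), ("complex + tones", complex_tonal)]:
    print(f"{label}: {n} syllables, at most {max_bits_per_syllable(n):.2f} bits each")
```

Real languages forbid many combinations and use the permitted ones very unevenly, so actual densities fall well below these ceilings. Shannon's frequency-weighted measure is designed to account for that unevenness.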

Back to "Shannon's stuff":

Pellegrino, along with a few other researchers, released a paper in 2019 that received a great deal of attention among the admittedly small cohort of people who understand Claude Shannon’s math. The paper found that, in terms of sheer number of syllables spoken per second, the fastest languages of the 17 studied were Japanese, Spanish, and Basque. The slowest were Cantonese, Vietnamese, and Thai.

But! Just to offer a couple of explanations: All three of the fastest languages have only five vowels. The three slowest have upwards of 20, and all are tonal, meaning that there is a gigantic number of possible syllables in those languages.

What Pellegrino found is that, essentially, all languages convey information at roughly the same speed when all the factors are taken into account: around 39 bits per second. The higher the syllable-per-second rate, the lower the information density, which creates a trade-off that makes all languages around the same in terms of information rate.

…
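The trade-off in the quoted passage is a simple product: information rate (bits per second) = speech rate (syllables per second) × information density (bits per syllable). Here is a minimal sketch that assumes the article's rounded 39 bits per second as a common target and uses two hypothetical speech rates. The function names are illustrative only.

```python
def information_rate(speech_rate, info_density):
    """Bits per second = syllables per second * bits per syllable."""
    return speech_rate * info_density

def implied_density(target_rate, speech_rate):
    """Bits per syllable a language would need to reach target_rate at this speech rate."""
    return target_rate / speech_rate

TARGET = 39.0  # approximate cross-language information rate from the article, bits/second

# Hypothetical speech rates for a "fast" and a "slow" language.
for label, sr in [("fast", 8.0), ("slow", 5.0)]:
    d = implied_density(TARGET, sr)
    print(f"{label}: {sr} syl/s needs {d:.2f} bits/syllable -> {information_rate(sr, d):.1f} bits/s")
```

The faster language must pack fewer bits into each syllable to arrive at the same rate. This is the inverse relationship the article describes.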

Now on to aspects of language that border on poetry and music:

Another element that might provide some extra data is in what linguists call “prosody,” the intonation and rhythm of speech. Do we include pauses in our analyses? (Pellegrino did not; pauses don’t apply to the specific kind of speed he was looking at.) What about rhythm? Some languages, like, interestingly, Japanese and Spanish, fall closer on the spectrum to having each syllable take up the same length of time. But Japanese also has some pretty elaborate ways to fill uncertain space.

By way of conclusion:

There are so many elements of language that it is impossible for a single metric like “speed” to cover all of these aspects. It’s kind of like asking “Which country is best?” The answer will change depending on all kinds of variables not specified in the question. That’s not to say there’s not some value in attempting to answer it, though.

Anyway, the fastest language is Japanese.

But that doesn't mean it is the most efficient (in conveying a specified amount of information in a given amount of time).

Now, let's imagine Japanese spoken with a southern drawl:

Watashi no namae wa Mifune Toshio, Tōkyō shusshin desu.

私の名前は三船敏夫、東京出身です。

Waa-taa-shii noo naa-maae waa Mii-fuu-nee Too-shii-oo, Tōō-kyōō shuu-sshiin dee-suu.

"My name is Toshio Mifune and I'm from Tokyo."

From the time I began to read Japanese texts (all sorts: newspaper and journal articles, essays, stories, etc.), I was astonished by how many particles, verbal forms, and other linguistic devices the language had for expressing reservation, reconsideration, uncertainty, indecisiveness, hedging one's bets, waffling, probability, surmise, and so on and so forth. Sometimes I would read a sentence or paragraph that was filled with such hypothetical, suppositional, speculative expressions and, having reached the end, say to myself, "What did the author say? What did he mean?" Then I'd read it again and still couldn't decide for sure what he was trying to declare or affirm. For me, indeterminateness came to constitute the genius of Japanese language. Simple declarative sentences are not their forte.

The fastest sustained speech I ever heard was that of a village headman in west central Bhojpur District of Nepal. Whether he was speaking Nepali or Rai, the words spewed from his mouth like bullets from a machine gun. I always marvelled at how people could possibly comprehend him. I understood about 80-90% of Nepali speech at normal speed, but listening to the village headman, I could only catch about half or less of what he was saying. Incidentally, he was the only person for many miles around who had a horse.

Selected readings

"Speed vs. efficiency in speech production and reception" (9/11/19)
"How fast do people talk in court?" (3/21/09)

[h.t. Alan Kennedy]



52 Comments

Chris Button said,

April 9, 2024 @ 6:37 am

I've always felt Portuguese and Spanish made a good comparison in terms of perceived, albeit not necessarily always real, differences in tempo.

The full vowels in Spanish syllables create a more staccato effect across syllables than the more reduced ones in Portuguese. Then in Brazilian Portuguese at least, you get palatalization of coronals before high front vowels, which creates a less abrupt sounding onset. As a result, even just a single word like "diferente" appears on the surface to have a slower, more relaxed delivery than in Spanish.
Chris Button said,

April 9, 2024 @ 6:42 am

The "i" and "e" in "diferente" both being high front (unlike in Spanish and despite the spelling) to palatalize the "d" and the "t" with the final vowel almost ending up voiceless–similar to what happens with final "u' in Japanese for example.
Colin Watson said,

April 9, 2024 @ 6:48 am

I thought the remark about Hebrew here contained a bit of a howler. Hebrew does have a verb for "to be": לִהְיוֹת "lihyot" in the infinitive. It's just omitted or replaced with a pronoun in present-tense copular forms.
Chris Button said,

April 9, 2024 @ 7:29 am

*Sorry, just the syllable-final "e" being high.

Also, the terms syllable-timed vs. stress-timed are often thrown around to account for perceived tempo differences, but the difference lies more on a continuum than in a binary, so they're not the best terms and can be quite misleading.
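One way to treat rhythm as a continuum rather than a binary is the normalized Pairwise Variability Index (nPVI). It measures how much each interval's duration differs from the next, scaled by the pair's mean duration. Phoneticians usually compute it on vocalic intervals. This sketch applies it to hypothetical syllable durations for simplicity, and the numbers are invented.

```python
def npvi(durations):
    """Normalized Pairwise Variability Index: 100 times the mean, over successive
    pairs, of |d_k - d_(k+1)| divided by the pair's mean duration."""
    if len(durations) < 2:
        raise ValueError("need at least two durations")
    terms = [abs(a - b) / ((a + b) / 2) for a, b in zip(durations, durations[1:])]
    return 100 * sum(terms) / len(terms)

# Hypothetical durations in ms, invented for illustration.
even = [180, 190, 175, 185, 180]     # similar lengths: low nPVI ("syllable-timed" end)
uneven = [250, 90, 260, 80, 240]     # long/short alternation: high nPVI ("stress-timed" end)

print(f"even:   {npvi(even):.1f}")
print(f"uneven: {npvi(uneven):.1f}")
```

Because the index is a continuous number, languages and speakers can fall anywhere between the two idealized "types" rather than into one bin or the other.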
Cervantes said,

April 9, 2024 @ 7:55 am

Maybe this is a quibble, but just on brief reflection it seems to me Spanish has at least eight vowels. E can be pronounced in two ways — e.g. era and angel — as can u — tu and cuatro — and o — cuatro and por. If I thought about it harder I might find more.
J.M.G.N. said,

April 9, 2024 @ 10:52 am

@Cervantes

Care to elaborate a bit further? Any references?
J.M.G.N said,

April 9, 2024 @ 10:57 am

@Chris

Regarding your staccato approach, check phonetic counterevidence:

https://books.google.es/books/publisher/content?id=pkeTAgAAQBAJ&hl=es&pg=PA106&img=1&zoom=3&bul=1&sig=ACfU3U0wAlymnZYLzEjb2Tt6eNvpydJUzA&w=1280
David L said,

April 9, 2024 @ 11:04 am

I would speculate — based on no theorizing at all, just a general sense of How Things Work — that our aural systems and brains have a maximum rate of information-processing, beyond which we would lose track of what's being said. In that case, different languages may indeed convey information in strikingly different ways, but the rate could not be significantly faster in one language versus another.

Of course, there are also problems with low rates of information transfer.
Cervantes said,

April 9, 2024 @ 11:16 am

I don't have any references offhand, but any Spanish speaker knows how to pronounce those words. Specifically, for example, "o" in credo and creo is like a long o in English; o in por is like aw, or really like o in pore in English. E in credo and creo is like a long a in English, e in angel is like a short e.
anhweol said,

April 9, 2024 @ 11:25 am

For the number of Spanish vowels: there are of course variations, but in most versions of Spanish one can posit 5 basic vowels with allophones determined by position. In a few regions, where a final -s has weakened to complete non-existence but left a more open vowel behind where it used to be, one can perhaps really posit more than 5 phonemic vowels.
Chris Button said,

April 9, 2024 @ 11:40 am

@ J. M. G. N.

The link doesn't work, but I assume it ties into my earlier point about syllable-timed vs stress-timed being a rather artificial binary distinction.
Cervantes said,

April 9, 2024 @ 12:06 pm

Yes, there are five characters that represent vowels in Spanish, but as you say they can be pronounced differently depending on position (ñ arguably contains a vowel sound although it needs to be followed by another vowel, and ll has a vowel-like pronunciation in many topolects). There are only 5 characters that represent vowels in English (well, not really, R can serve as a vowel, and when it's preceded by e, u or i they're normally silent), but many more vowel pronunciations. That's my point.
Jerry Packard said,

April 9, 2024 @ 1:26 pm

Several years ago Dr. Pellegrino was kind enough to send me the calculated speech rates (syl/second) in the 2019 article for the languages in question.

Lang SR (speech rate, syllables per second)
CAT 7.07
CMN 5.86
DEU 6.09
ENG 6.34
EUS 7.54
FIN 7.17
FRA 6.88
HUN 5.87
ITA 7.16
JPN 8.03
KOR 7.12
SPA 7.73
SRP 7.15
THA 4.70
TUR 7.05
VIE 5.30
YUE 5.57

Japanese is indeed the fastest, at 8.03 syllables per second, and Mandarin (CMN) is fourth slowest, after Cantonese (Yue), Vietnamese and Thai.
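As a quick check of the ranking, here is a sketch that sorts the speech rates listed above. The dictionary simply transcribes the table in this comment. The three-letter labels appear to be ISO 639-3 language codes (YUE is Cantonese, EUS is Basque, SRP is Serbian).

```python
# Speech rates in syllables/second, transcribed from the table in this comment.
SPEECH_RATE = {
    "CAT": 7.07, "CMN": 5.86, "DEU": 6.09, "ENG": 6.34, "EUS": 7.54,
    "FIN": 7.17, "FRA": 6.88, "HUN": 5.87, "ITA": 7.16, "JPN": 8.03,
    "KOR": 7.12, "SPA": 7.73, "SRP": 7.15, "THA": 4.70, "TUR": 7.05,
    "VIE": 5.30, "YUE": 5.57,
}

# Sort language codes from fastest to slowest.
ranked = sorted(SPEECH_RATE, key=SPEECH_RATE.get, reverse=True)

print("Fastest three:", ranked[:3])
print("Slowest four (faster first):", ranked[-4:])
print(f"Range: {SPEECH_RATE[ranked[-1]]} to {SPEECH_RATE[ranked[0]]} syllables/second")
```

The sort agrees with both the post (Japanese, Spanish and Basque fastest) and the comment (Mandarin fourth slowest, ahead of Cantonese, Vietnamese and Thai).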

Information density was calculated by taking several texts, having them translated from English into the various languages (with several texts and speakers per language), and basically measuring the average length of the translated text: the longer the text, the lower the information density.

For an easier read, I recommend Pellegrino, F., Coupé, C., and Marsico, E. 2011. “A Cross-Language Perspective on Speech Information Rate.” Language 87, no. 3: 539–558.
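To make the length-based measure concrete, one simple way to turn translation lengths into a density figure is to compare syllable counts for the same texts against a reference language. The sketch below assumes that approach. It uses a hypothetical helper, `mean_relative_density`, with invented counts, and does not reproduce the paper's exact procedure.

```python
def mean_relative_density(syllable_counts):
    """syllable_counts: (reference_syllables, target_syllables) pairs, one per text
    translated into both languages. Returns the average ratio: above 1 means the
    target language uses fewer syllables than the reference (denser); below 1 means
    it needs more syllables to say the same thing."""
    ratios = [ref / tgt for ref, tgt in syllable_counts]
    return sum(ratios) / len(ratios)

# Hypothetical syllable counts for three texts (invented for illustration).
counts = [(100, 150), (80, 120), (120, 170)]

density = mean_relative_density(counts)
print(f"relative density: {density:.3f}")
```

Counting syllables rather than written characters sidesteps the orthographic distortions J.W. Brewer raises, though it still depends on a consistent cross-linguistic definition of "syllable".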
Philip Taylor said,

April 9, 2024 @ 1:38 pm

Two comments / questions on the article itself rather than on the topics it discusses:

1) « Just by using “A” in different places we can get the vowel sounds in “cat,” “can,” “cane,” “calm,” and a bunch more ». In which dialect(s) / topolect(s) / w-h-y do the vowel sounds of "cat" and "can" differ ?

2) « “shouldn’t” and “should not” are both two syllables » — for me, “shouldn’t” has three syllables (/ˈʃʊd·ən·t/) while "should not" has only two (/ ʃʊd·nɒt/).
J.W. Brewer said,

April 9, 2024 @ 1:51 pm

I'm confused by what Jerry Packard means by "measuring the average length of the translated text." Total character count of the text in writing? I FWIW have open on my other computer screen right now the ready-to-be-typeset manuscript of a forthcoming English translation (with facing-page original text) of a medieval theological treatise in Greek. Unsurprisingly, the Greek text has fewer but longer words and the English text has more but shorter words. The English does come out a bit longer overall, e.g. one paragraph that is 19 lines long in Greek is 22 lines long in English, which is consistent with the notion that the Greek is a bit more information-dense (and this is a very dense and "academic" register of Byzantine Greek). But of course either language could have a different orthography than it does which could make a given text longer or shorter in character-count terms – consider how transliterated Greek turns certain single characters of the Greek alphabet into digraphs when romanized. But the possible distorting effect of those varying orthographic conventions should have nothing to do with the information-density of the language-as-such, should they?
Philip Taylor said,

April 9, 2024 @ 1:56 pm

I think that J.M.G.N’s seemingly broken link to Google Books (Spanish edition) may have been intended to display the following text accompanied by an illustration —

Figure 5.15 Native Spanish speaker's pitch contour (top) is more dynamic than Spanish learner's pitch contour (bottom).
First, Figure 5.15 indicates that pitch varies much more for the native speaker than for the learner. Also, loudness appears to vary more for the native speaker, based on the lighter gray lines that show amplitude. Thus, on two suprasegmental dimensions the native speaker's utterance appears more dynamic than the learner's. If the native speaker's utterance were actually produced in "staccato" style, as suggested in many textbooks, we would expect to see very short syllables separated by sizeable gaps. After all, in music the term staccato is relatively precise, meaning very brief notes separated by longer-than-usual spaces. Yet, looking at the utterance through the spectrographic displays in Figures 5.16 and 5.17, we do not immediately see a pattern of uniformly short syllables with large gaps between them. Since the notion of Spanish syllables as staccato comes from the syllable-timed hypothesis, a detailed analysis of the duration of the syllables may shed light on why this notion is not in evidence.
To compare the rhythm of the native speaker with that of the learner, we measured the duration of each syllable and of the total utterance in ms and calculated the percentage of total utterance duration contributed by each syllable. The results, shown in Table 5.1 (along with the average percentage and the standard deviation), indicate more variation in syllable duration for the learner than for the native speaker, with standard deviations of …
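The measurement described in the excerpt (each syllable's share of the total utterance, then the mean and standard deviation of those shares) can be sketched as follows. The durations are invented for illustration. The excerpt does not say whether a sample or population standard deviation was used, so the sketch assumes the sample version.

```python
import statistics

def syllable_shares(durations_ms):
    """Percentage of the total utterance duration contributed by each syllable."""
    total = sum(durations_ms)
    return [100 * d / total for d in durations_ms]

def share_summary(durations_ms):
    """Mean and sample standard deviation of the per-syllable percentages
    (sample SD is an assumption; the source does not specify)."""
    shares = syllable_shares(durations_ms)
    return statistics.mean(shares), statistics.stdev(shares)

# Hypothetical syllable durations in ms for one five-syllable utterance.
native = [150, 160, 170, 155, 165]
learner = [120, 240, 110, 260, 150]

for label, durs in [("native", native), ("learner", learner)]:
    mean, sd = share_summary(durs)
    print(f"{label}: mean {mean:.1f}%, SD {sd:.2f} percentage points")
```

The mean share is always 100/n for an n-syllable utterance, so it carries no rhythmic information by itself. The standard deviation is what distinguishes an even delivery from an uneven one.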
Jerry Packard said,

April 9, 2024 @ 2:47 pm

@J.W. Brewer I'm confused by what Jerry Packard means by "measuring the average length of the translated text."

I think they use length in syllables (# of syllables) as their measurement unit.
Jarek Weckwerth said,

April 9, 2024 @ 4:00 pm

@ Cervantes: The different versions of each vowel you detect are allophones. They cannot distinguish words. When the others are talking about five vowels, they mean phonemes. These do distinguish words. This is a fundamental idea in phonology. If you're interested, you can start from Wikipedia; the article on allophone is of a generally reasonable quality.
J.W. Brewer said,

April 9, 2024 @ 4:09 pm

@Jerry Packard. That may make more sense, assuming a stable and uncontroversial-to-apply-when-counting cross-linguistic definition of "syllable." Of course a one-character-per-syllable writing system would make it all the same thing, but such systems seem more prevalent with low-information-load-per-syllable languages, like Japanese. (Okay, okay, I've been told that in order to understand how kana "really" work you need to understand that a syllable is not exactly the same thing as a mora, but I've learned what morae are in this context a whole bunch of times over the course of my life and then forgotten each time.)
Julian said,

April 9, 2024 @ 5:03 pm

"I was astonished by how many particles, verbal forms, and other linguistic devices the language had for expressing reservation, reconsideration, uncertainty, indecisiveness, hedging one's bets, waffling, probability, surmise, and so on and so forth"
– this puts me in mind of the fact that my favourite word in English is "just". Very common, but I think quite hard to give a dictionary definition of.
Peter Taylor said,

April 9, 2024 @ 5:26 pm

I can attempt the Shannon information theory explanation, but I make no guarantee of success.

Let us suppose a language with 16 consonants and 4 vowels in which every syllable consists of an onset and a nucleus. There are therefore 16 × 4 = 64 possible syllables. If you had to encode them in binary then, because 64 is the sixth power of two, the obvious approach would take six binary digits (6 bits).

But does the obvious approach really encapsulate the amount of information? It depends. If all of the syllables are equally likely, then yes. But suppose BA occurs four times more often than each of KA, KE, KI. If we arrange the binary representations so that BA, KA, KE, KI have the first four bits in common (e.g. 011000, 011001, 011010, 011011), we can easily reorganise things to give BA a five-bit representation (01100) and the other three a seven-bit representation each (0110100, 0110101, 0110110). The average length of a long text will decrease because 4 × 6 + 6 + 6 + 6 is greater than 4 × 5 + 7 + 7 + 7. The general principle is that symbols which occur more frequently convey less information.
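Peter Taylor's toy example can be checked mechanically. This sketch confirms that both codes are prefix codes, so a stream can be decoded left to right without separators. It also totals the bits for four occurrences of BA plus one each of KA, KE and KI. The helper names are hypothetical.

```python
def is_prefix_free(codewords):
    """True if no codeword is a prefix of another (duplicates count as a clash)."""
    for i, a in enumerate(codewords):
        for j, b in enumerate(codewords):
            if i != j and b.startswith(a):
                return False
    return True

def total_bits(code, weights):
    """Bits needed for a text in which each symbol occurs weights[symbol] times."""
    return sum(weights[s] * len(code[s]) for s in weights)

original = {"BA": "011000", "KA": "011001", "KE": "011010", "KI": "011011"}
revised = {"BA": "01100", "KA": "0110100", "KE": "0110101", "KI": "0110110"}
weights = {"BA": 4, "KA": 1, "KE": 1, "KI": 1}  # BA four times as frequent

print("prefix-free:", is_prefix_free(list(original.values())), is_prefix_free(list(revised.values())))
print("bits:", total_bits(original, weights), "->", total_bits(revised, weights))
```

The revised code leaves 0110111 unused, so there is still slack that a more careful assignment could exploit.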

For toy examples it's straightforward to work with binary representations which have whole numbers of bits and can be distinguished unambiguously by reading them left to right (prefix codes). But it's possible to come up with schemes which let you give fractional numbers of bits to a symbol. If each symbol is given its optimal fractional weight bearing in mind the statistics of the language then we can average those weights to obtain the Shannon information density of a symbol.
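The following sketch puts numbers on the "fractional bits" idea. It assumes, beyond what the comment specifies, that all 63 syllables other than BA are equally frequent and that BA is four times as frequent as each of them. The ideal cost of a symbol with probability p is -log2(p) bits, which need not be a whole number. The frequency-weighted average of those costs is the Shannon entropy. Schemes such as arithmetic coding can approach that average in practice.

```python
import math

# Assumed distribution: 63 ordinary syllables (S0..S62) plus BA at four times their frequency.
weights = {f"S{i}": 1 for i in range(63)}
weights["BA"] = 4
total = sum(weights.values())
probs = {s: w / total for s, w in weights.items()}

def ideal_bits(p):
    """Optimal (generally fractional) code length for a symbol of probability p."""
    return -math.log2(p)

# Shannon entropy: the probability-weighted average of the ideal code lengths.
entropy = sum(p * ideal_bits(p) for p in probs.values())

print(f"BA: {ideal_bits(probs['BA']):.3f} bits, other syllables: {ideal_bits(probs['S0']):.3f} bits")
print(f"average (entropy): {entropy:.3f} bits per syllable, versus 6 for the fixed-length code")
```

Because BA is four times as likely as any other syllable, its ideal cost is exactly two bits lower (log2 4 = 2), and the average drops slightly below six bits.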
KWillets said,

April 9, 2024 @ 6:28 pm

"Shannon's stuff" simply says that one second of speech has about 2^39 possible outcomes regardless of language, and that in one second on average a Japanese speaker chooses between 30 possibilities for the next syllable 8.03 times, while an English speaker chooses between 70 possibilities 6.34 times.
Judge said,

April 9, 2024 @ 8:33 pm

@Philip Taylor

Most North American accents will raise the /æ/ vowel before nasal final consonants to something like /ɛə/ (or /eɪ/ before /ŋ/).

Due to that, bat, ban, and bank all have different vowels.
Judge said,

April 9, 2024 @ 8:36 pm

oh and here's the Wikipedia page about it

https://en.wikipedia.org/wiki/%2F%C3%A6%2F_raising?wprov=sfla1
Philip Taylor said,

April 10, 2024 @ 1:09 am

Thank you for both replies, Judge. The Wikipedia page appears to assume that there is a TRAP/BATH merger, which is not the case in my (Southern British English) topolect :

the TRAP/BATH vowel (found in such words as ash, bath, man, lamp, pal, rag, sack, trap, etc.)

("bath" is the only exception in that list, all of the others being a pure /æ/ sound while "bath" has the same vowel sound as non-rhotic "car" — /ɑː/).
cliff arroyo said,

April 10, 2024 @ 2:17 am

Interesting that so far no mention of Malayalam (Dravidian language closely related to Tamil).

It's always impressed me with what seems like a very high syllable-per-minute rate… my subjective impression is that it has more syllables per minute than the other major Dravidian languages (which overall seem faster than Northern Indian languages).

https://www.youtube.com/watch?v=rs3BE-f0LS4
Harry R said,

April 10, 2024 @ 2:34 am

The difference in information density between Japanese and English syllables is why 5-7-5 haiku don’t really work in English — you can be way too verbose in 12 syllables of English.
Chris Button said,

April 10, 2024 @ 5:32 am

@ Harry R

The main issue with haiku in English is that the "mora" count in Japanese is reinterpreted/misinterpreted in English as a "syllable" count. For example, a three-mora word like "kanji" (ka.n.ji) would count as a two-syllable word in English (kan.ji).
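The mora/syllable mismatch can be illustrated with a rough heuristic counter for Hepburn-romanized Japanese. This is a sketch under stated assumptions. It expects macrons for long vowels and counts moraic n and the first half of a doubled consonant (including "tch") as morae. It treats every vowel letter as its own syllable nucleus and ignores many edge cases, such as apostrophe conventions and diphthong analysis. `count_morae` and `count_syllables` are hypothetical helpers, not a standard tool.

```python
VOWELS = "aiueo"
LONG_VOWELS = "āīūēō"  # Hepburn macrons: each long vowel counts as two morae

def count_morae(word):
    """Rough mora count for a Hepburn-romanized word (heuristic sketch)."""
    w = word.lower()
    morae = 0
    for i, ch in enumerate(w):
        nxt = w[i + 1] if i + 1 < len(w) else ""
        if ch in VOWELS:
            morae += 1                                  # short vowel
        elif ch in LONG_VOWELS:
            morae += 2                                  # long vowel
        elif ch == "n" and (nxt == "" or nxt not in VOWELS + LONG_VOWELS + "y"):
            morae += 1                                  # moraic n, including word-final n
        elif ch != "n" and (ch == nxt or (ch == "t" and nxt == "c")):
            morae += 1                                  # first half of a geminate, e.g. "ss", "tch"
    return morae

def count_syllables(word):
    """Rough syllable count: one per vowel letter, long vowels counted once."""
    return sum(1 for ch in word.lower() if ch in VOWELS + LONG_VOWELS)

for word in ["kanji", "Tōkyō", "shusshin", "matcha"]:
    print(f"{word}: {count_morae(word)} morae, {count_syllables(word)} syllables")
```

A 5-7-5 haiku counts morae, so a line judged by English-style syllable counting can hold noticeably more material than its Japanese model.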
Cervantes said,

April 10, 2024 @ 7:37 am

Regarding allophones, I understand the idea but in order to speak the language intelligibly, or at least not sound bizarre, you need to produce the right sound in the right place. That seems to me to be what's relevant in this context.

The "e" in esta and está is indeed pronounced differently and does distinguish the words. If you were to pronounce either word with even stress the listener would distinguish them by the e sound, and if you were to use the wrong sound for the stress on the syllable it would create ambiguity.
Peter Taylor said,

April 10, 2024 @ 7:48 am

@Cervantes, I find that really interesting. As an L2 speaker I'm pretty sure I focus more on the a to distinguish them. It reminds me of the anecdote someone mentioned on LL once about the foreigner in Beijing who found themself serving as an interpreter between Chinese speakers of Mandarin who couldn't understand each other's accents, with the implication that the L2 speaker used different features to distinguish the words.
Chas Belov said,

April 10, 2024 @ 11:38 am

My reaction to seeing Cantonese as one of the three slowest was ¿haih mē? 係咩 (¿really?) Actually, my reaction to 咩 having the phonetic part 羊 yang was also 係咩.

I wonder how much of the recorded slowness of Cantonese is attributable to the habit of stretching out the last syllable of a sentence in casual speech. When overhearing Cantonese speakers in public, they don't seem to speak noticeably (to me) slowly, but that stretching out is quite common among older speakers. I don't hear enough younger people speaking Cantonese to make a generalization concerning younger Cantonese speakers.
Jim Breen said,

April 10, 2024 @ 5:21 pm

Victor made some comments about Japanese, which concluded with "For me, indeterminateness came to constitute the genius of Japanese language. Simple declarative sentences are not their forte."

In his excellent "Making Sense of Japanese", Jay Rubin discusses this very point. His view, which I support, is that it's not the language itself that leads to "indeterminateness" but the cultural baggage brought to it by Japanese people. It's quite possible to be precise and blunt in Japanese; just most people avoid doing so.
Thomas said,

April 11, 2024 @ 12:07 am

Watching the series Shogun with only a superficial understanding of the Japanese language, I am astonished that every other sentence ends with ございまする (gozaimasuru). While the English subtitles are short, the characters can ramble on for quite some time. No wonder they have to rush when talking: They do have to cram in a lot of dead weight.
Philip Taylor said,

April 11, 2024 @ 2:52 am

« [Jay Rubin]’s view is that it's not the language itself that leads to "indeterminateness" but the cultural baggage brought to it by Japanese people. It's quite possible to be precise and blunt in Japanese; just most people avoid doing so. » – As one who has had the misfortune to (a) have deeply offended an elderly Japanese lady by inadvertently trespassing on her property, and (b) witness the reaction of an Imperial Guard in the Imperial Gardens in Kyoto when my Chinese teacher had the misfortune to step six inches over an invisible line into the "Emperor’s Territory", I can say with 100% certainty that the Japanese language is capable of being very "precise and blunt" indeed. But I would not agree that the perceived "indeterminacy" of the Japanese language is a result of cultural baggage; "baggage" has markedly negative connotations, whereas I (like Victor) regard it more as "genius" – the innate ability of the average Japanese to seek to avoid disharmony at all costs …
Terry K. said,

April 11, 2024 @ 9:46 am

Although there may be an allophonic difference I'm not noticing, I'm North American, U.S., with a pretty standard accent, and I do not at all think of the vowels of "cat" and "can" as different when I see them in a list of words.

Although, it does now occur to me that "can" in "you can do it if you want to" has a reduced vowel that differs from "can" in "a can of soup" or "I CAN do it". But that can of soup has the same vowel as cat.
TR said,

April 11, 2024 @ 1:04 pm

I haven't read Pellegrino's work, but it's far from obvious to me that the idea of measuring "the speed of a language" even makes sense given the variability of speech speeds across speakers and situations. It seems likely that among English speakers alone this variation would exceed the inter-language variation reported in the paper. Does he try to get a representative sample of speakers and settings for each language measured?
Jarek Weckwerth said,

April 11, 2024 @ 3:20 pm

@ Cervantes: I think esta and está are distinguished mainly by the stress. If the [e]'s are indeed different allophonically (I'm not sufficiently familiar with Spanish to argue a strong case here; but it seems very likely), then replacing one with the other will produce the impression of an untypical ("strange") realization but I would doubt it would impart a perception of the other word. That's the usual approach with minimal pairs. Having the two words monotonized (with "even stress" as you say) may be a different matter, but that is not what happens in normal speech.

(Think IMport vs. imPORT in English. I would suspect you could also tell them apart if monotonized. Do they contain the same vowels? It's the perennial question of what counts as "the same" in phonology.)

Also, the two allophones are predictable from the context. Phonemes aren't generally predictable.

@ Philip Taylor et al. on the raised/diphthongized TRAP in American English: I find it very noticeable. My standard example is the word fantastic from the Oxford Learner's Dictionary. You can clearly hear how the British speaker has two very similar qualities while the American speaker has a noticeably raised quality before the nasal in the first syllable. But you can try any other pair with TRAP+nasal and TRAP+sth-else, e.g. man vs. mad or dam vs. dad.
RfP said,

April 11, 2024 @ 3:36 pm

@Terry K.

I also have a pretty standard U.S. accent, and had the same reaction as you did to this notion of a distinction between the vowels in “cat” and “can.”

But then I realized that I’ve known many people who grew up—as I did—in Northern California who pronounce “can” like “ken.”

I’m only half joking when I wonder if there’s a can/ken merger.
Jerry Packard said,

April 11, 2024 @ 4:08 pm

@TR
If you look at the speech rate numbers, and then how they are graphed in Fig. 1 of the paper, one can see that there are real differences in speech rate across languages. Some of the differences may not be significant (e.g., 5.86 for Mandarin and 5.87 for Hungarian), but for the most part the differences appear significant and real. The study used several different types of language materials and several different speakers for each language. It’s hard to imagine how those results could have been obtained if they were not really present.
Jarek Weckwerth said,

April 11, 2024 @ 4:11 pm

@ RfP: The "nasal TRAP" system is claimed to occur throughout North America, and the only obvious cases where it doesn't apply are those accents where TRAP is raised everywhere, such as the so-called Northern Cities. (But even there the nasal TRAP system is making inroads; there have been recent reports of Detroit reverting from "raised TRAP across the board" to "pan-American nasal TRAP".)

If you try, say, the Merriam-Webster recordings for can and cat, what is your impression? Which dialect area are you from?
RfP said,

April 11, 2024 @ 4:36 pm

@Jarek

I’m in way over my head on this stuff (ironically enough, since I was really into phonetics and phonology for a while in college), but I grew up in the San Francisco Bay Area with the 1960s “TV news broadcaster” accent.

When I listen to the Merriam-Webster recordings, I don’t hear a significant difference, but I do notice a slight bunching of my tongue as I’m saying the “a” in “can,” and that bunching doesn’t occur AFAICT when I say “cat.”

Funnily enough, I happened to notice earlier today that I seem to articulate the first “y” in “polytomy,” a word which is new to me and which came up in another LL thread, a bit lower than the first “y” in “polygraphy.”
Jarek Weckwerth said,

April 11, 2024 @ 5:02 pm

@ RfP: OK, my bad. Those Merriam-Webster clips are noticeably different for me. But that's because I'm much more familiar with British accents where the difference would essentially be zero. (Well, perhaps outside of old-style Cockney.) Try Oxford Learner's and compare the blue speaker (British) with the red speaker (American): can, cat.
TR said,

April 12, 2024 @ 12:30 pm

@Jerry Packard

"The rate at which ten college students performed an artificial reading task" doesn't strike me as a good definition for "the speed of a language", though to be fair to Pellegrino et al. the latter term seems to be the journalist's framing rather than theirs. It seems likely that the IR values they found for any of the widely spoken languages would be significantly different if they used subjects of a different age, cultural background or geographical origin than the ones they happened to use (the Spanish of Madrid is not the Spanish of Iquitos, and the English of New York City is not the English of Appalachia).

We'd expect that speakers of languages with a simpler syllable structure will on the whole produce more syllables per unit of time, but it's not clear to me why syllable rate, or speech rate at all, should be a useful measure of information density — wouldn't it make more sense to abstract away from variation by measuring bits per phoneme?
Jerry Packard said,

April 12, 2024 @ 10:15 pm

TR,

There are many factors that affect speech rate. In your experiment you can either let those factors vary randomly, or you can control them by restricting your subject pool, or by measuring those factors in your subject pool and entering those values as part of the statistical analysis. It looks like the authors used the ‘restrict subject pool’ method by using just college students. If you think certain factors (e.g., text genre or formality, or subject age or background or geographical origin) are of particular interest, you control for them by including them in your experiment design.

The authors did not use SR to measure information density; information density is measured by the length of translated texts.
Julian said,

April 13, 2024 @ 2:17 am

Hypothesis: the rate of information transfer in speech has evolved (in the Darwinian sense**) to be the best compromise between the speaker's need to minimise the work of speaking, the listener's need to minimise the work of interpreting, and the need for efficient communication.
In that case, wouldn't you expect it to be a human universal that is similar in all languages?
** Since being able to warn your children of danger would improve your reproductive success.
Jerry Packard said,

April 13, 2024 @ 7:12 am

Julian,
Yes indeed.
TR said,

April 13, 2024 @ 11:25 am

I should have said "information rate", not "information density". Restricting your subject pool to a sample that isn't representative of the overall population obviously means you have no good basis for making conclusions about that population.
Jerry Packard said,

April 13, 2024 @ 1:21 pm

TR, the authors assume that their sample is representative of the larger population when it comes to speech rate and information density. When you do experiments like this to test hypotheses about language production and processing, you try to have a homogeneous subject population to reduce experimental ‘noise’ (error variance).

For us to assume that their sample isn't representative of the overall population, we’d have to assume that there is some factor that makes speech rate vary in the population tested. If you suspect that other factors affect speech rate – like the Spanish of Madrid vs. the Spanish of Iquitos – then you set up your subject pool with that in mind.
TR said,

April 13, 2024 @ 2:23 pm

But why would you assume that the sample is representative of the population of speakers, or that the experiment setup is representative of the variety of possible speech settings? It's an obvious fact that speech rate varies widely among speakers and situations. I don't know how you would set up a subject pool or an experimental setting that would accurately reflect that variation, but you can't just ignore it if you want to draw conclusions about actual language use.
Jarek Weckwerth said,

April 13, 2024 @ 4:04 pm

@TR: One of the hardcore approaches in sociolinguistics is/was trying to do real poll-style sampling on the basis of e.g. voter registers, with large samples. But for that, you need Labov-calibre dedication, timescales, and grant potential. For experimental lab-based studies, what Jerry is describing is the norm, not only in linguistics but in social science and psychology in general. Hence the usual ridicule of this type of research: "Modern psychology = psychology of American college undergraduates".
Jerry Packard said,

April 14, 2024 @ 11:37 am

TR, there’s a lot of truth in what Jarek says – in these experimental studies there is a relatively small subject n for each condition, but a relatively large n for number of observations. One of Bill Labov’s most famous studies surveyed the presence or absence of ‘r’ (fourth floor > foth flo) in 3 different populations – shoppers in Saks Fifth Avenue, Macy's, and S. Klein. So Labov used the ‘restrict subject pool’ way of controlling variation.

You start with the null hypothesis that there is no difference in speech rate across languages and then measure to see whether any effect you can think of is strong enough to allow you to reject that null hypothesis.

Given the Pellegrino results, if you think those results do not reflect the reality, then you’d do your own experiment to see whether speech rate varies by language. Then you would include in your experiment those factors mentioned above — in addition to language — that you feel affect speech rate, using as many subjects as you can afford to test. You can include any factors you like from the variety of possible speech settings that you can think of, and you would either measure them, or include them as a binary choice – like formal/informal.

If we performed that experiment, I would bet (based on the Pellegrino results) that languages vary significantly in speech rate, independent of gender, class, genre or any other factor you might think of.
Chris said,

May 3, 2024 @ 8:31 pm

Interesting southern drawl comparison. You can hear this for real in Japanese enka music or Noh drama.
