Prosody and language identification

I believe — without much evidence — that I can recognize many more languages than I can understand. This ability depends to some extent on recognizing an occasional common word, and on a sort of textural appreciation of syllable structures and certain characteristic sounds. But some of it, I've always believed, is prosodic: perception of time patterns of pitch and amplitude.

Do you think you can recognize a language from its pitch and amplitude pattern? Here's an example, created according to the recipe that I described in an earlier post. I'll give the answer (and the original recording) tomorrow.

My belief in my own ability is based entirely on anecdotal evidence, of a kind that is subject to confirmation bias and other kinds of error. A few months ago, on a plane across the Atlantic, I concluded that a father and son in the seats behind me were speaking Danish, based on dimly heard textural and prosodic evidence. Then they raised their voices a bit, and I learned that they were speaking English, of the kind found in the city of York. (Of course, I took that as evidence of the prosodic survival of the Danelaw in Jórvík, rather than as evidence of unreliable perception on my part…)

It's plausible that people can recognize languages that they don't know, and there have even been some experimental tests. Thus Y.K. Muthusamy et al., "Perceptual Benchmarks for Automatic Language Identification", ICASSP 1994, found that native English speakers, given a short training session with 9 languages, were able to identify samples with durations of 1, 2, 4 and 6 seconds at rates of 20.7%, 37.4%, 45.8% and 49.7%. This is not terrific performance, but it is still better than chance (which would be 11.1%). Some languages were harder than others: subjects recognized Korean correctly only 13.5% of the time in the first quarter of the experiment, and despite feedback after each trial, this only increased to 16.7% in the last quarter of the experiment.
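For readers who want to check the arithmetic, here is a minimal sketch of the chance baseline, assuming a forced choice among 9 equally likely languages. The function name `chance_rate` and the dictionary `reported` are my own labels for illustration; the percentages are the ones quoted above.

```python
def chance_rate(n_choices):
    """Expected accuracy from uniform random guessing among n_choices options."""
    return 1.0 / n_choices


# Identification rates quoted above, keyed by sample duration in seconds.
reported = {1: 0.207, 2: 0.374, 4: 0.458, 6: 0.497}

chance = chance_rate(9)  # assumption: 9 equally likely response options

# How many times better than guessing the listeners did at each duration.
ratio_to_chance = {dur: acc / chance for dur, acc in reported.items()}

for dur, acc in reported.items():
    print(f"{dur} s: {acc:.1%} correct, {ratio_to_chance[dur]:.1f} times chance")
```

On these assumptions, even the 1-second samples come out at nearly twice the guessing rate.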

Muthusamy's subjects were listening to original, full-band audio of good quality. There are also ways to present altered or re-synthesized speech in order to tease apart the cues for language identification and discrimination, as discussed in Franck Ramus and Jacques Mehler, "Language identification with suprasegmental cues: A study based on speech resynthesis", JASA 105(1): 512-521, 1999. Tests of this sort show that prosodic features do play a role, though there are some surprises. Thus Masahiko Komatsu et al., "Perceptual discrimination of prosodic types", SP-2004, found that subjects were able to distinguish English from Chinese in a pairwise discrimination test only about 70% of the time, when given pitch and amplitude information only. (On each trial, subjects listened to a language pair and were asked to decide whether the order was E-C or C-E.)

Of course, these were just random students recruited for a perceptual experiment, and there are no doubt large individual differences in abilities of this sort.

[If you're following along at home, note that the resynthesis technique that I used is different from that in the two studies just cited: thus Komatsu et al. started by stylizing the pitch contours (which I didn't do), and then resynthesized using various combinations of white noise and pulse trains, rather than an instrument with five overtones with 1/F amplitudes. More on all this later…]
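To make the "pitch and amplitude only" idea concrete, here is a rough sketch of that kind of resynthesis, not the actual recipe used for the sample. It assumes the pitch and amplitude tracks have already been extracted as one value per frame (with 0 Hz marking unvoiced frames), and it reads "five overtones with 1/F amplitudes" as a fundamental plus five overtones, with harmonic k weighted by 1/k. The function name, frame rate, and sample rate are all hypothetical choices.

```python
import math


def resynthesize(f0_hz, amplitude, frame_rate=100, sample_rate=16000, n_overtones=5):
    """Turn per-frame pitch (Hz, 0 = unvoiced) and amplitude tracks into samples."""
    samples_per_frame = sample_rate // frame_rate
    # Fundamental plus n_overtones overtones (an assumed reading of the description).
    harmonics = range(1, n_overtones + 2)
    # Normalize so the summed harmonics never exceed the frame amplitude.
    norm = sum(1.0 / k for k in harmonics)
    phase = 0.0
    samples = []
    for f0, amp in zip(f0_hz, amplitude):
        for _ in range(samples_per_frame):
            if f0 <= 0:
                samples.append(0.0)  # silence where no pitch was tracked
                continue
            # Accumulate phase so pitch changes do not cause clicks.
            phase += 2.0 * math.pi * f0 / sample_rate
            tone = sum(math.sin(k * phase) / k for k in harmonics)
            samples.append(amp * tone / norm)
    return samples


# Three 10 ms frames: unvoiced, then 120 Hz, then 150 Hz at a higher level.
audio = resynthesize([0.0, 120.0, 150.0], [0.5, 0.5, 1.0])
```

The result is a plain list of floats between -1 and 1; writing it to a sound file, and getting the pitch and amplitude tracks in the first place, are left out of the sketch.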



19 Comments

  1. David Marjanović said,

    June 15, 2008 @ 9:45 am

    Pitch is a major factor when I try to distinguish Sinitic languages from Korean by listening to tourists, but that's a bad example, because Korean lacks tones… (And of course Korean has a [r], but it's less common than the very conspicuous aspirated affricates that Korean shares with "Chinese" and with little else in Eurasia.) Of all these, I only speak very little Mandarin.

    Interestingly, I've been told my intonation of French is English, by someone who wasn't otherwise able to place my accent. I haven't found out what precisely is going on; I must be overcompensating for the fact that my kind of German (like most others) has a rather small pitch range. Some Englishes have such a large pitch range that I sometimes wonder if I'd believe English is a tone language if I didn't know better.

  2. jm said,

    June 15, 2008 @ 11:51 am

    I have been struck several times by how much Korean sounds like Japanese* to me when I hear it at low volume or in the presence of background noise such that intelligibility would be borderline were it Japanese.

    Although the two languages have very different vocabularies, with hardly any obvious cognates and significant differences in phonetics, their close relationship is obvious from their highly similar grammar and syntax.

    * In which I'm fairly fluent — regarding Korean, I've read enough of Korean language textbooks to be aware of the vocabulary differences and grammatical/syntactical similarities.

  3. bulbul said,

    June 15, 2008 @ 12:35 pm

    That's funny, I never thought Japanese and Korean sounded similar, not even back when the only contact I had with those languages was Neon Genesis Evangelion and a couple of Korean action movies, respectively. The Korean tense consonants always gave it away and so did the falling-rising intonation of verbal suffixes, especially of the -yo conjugation.

  4. jm said,

    June 15, 2008 @ 1:05 pm

    bulbul,

    You seem to be describing what one hears when the volume level is well above the intelligibility limit. I was describing what I hear at volume or noise levels such that if the language were Japanese I might just barely be able to catch an occasional phrase.

  5. Sarah E. said,

    June 15, 2008 @ 2:45 pm

    I'm slightly more attuned to Japanese than Korean, so I think I could use another pair of ears here, but this interview video (in Korean: http://www.youtube.com/watch?v=oNiKx0g1G8s&feature=related) could be easily mistaken for Japanese if one listens only for pitch and rhythm:
    – falling intonation Korean "-seyo" and the Japanese "-desuyo"
    – nearly identical syllable speed
    – sustained pitches in words that are emphasized or require a bit more clarity (esp. foreign loan words)
    – higher pitches at the end of a phrase, and a very low pitch at the end of a sentence

    I think it might be interesting to also similarly contrast the rhythm and pitch patterns among English dialects. I've noticed that when I watch UK film and television, it can sound like gibberish if I'm tired and not listening with a "British" ear.

  6. Moira Less said,

    June 15, 2008 @ 2:53 pm

    Living in a place in Norway where there are lots of other foreigners I can easily distinguish English–spoken at some distance away from me, so I can't hear what's being said–from Norwegian and other languages.

    Yorkshire dialect has some obvious similarities to the way people speak in N. Germany (nay meaning no, etc.), and when I lived in Hamburg I found that assuming a Yorkshire accent was a big help to improving my spoken German.

    Anyone who sounds to Anglo ears vaguely Danish from a distance could be a drunk Norwegian. Seriously. (Sorry Danes.)

  7. mollymooly said,

    June 15, 2008 @ 4:11 pm

    Knowing no Scandinavian languages, in Denmark I thought every group of Danes chatting in the distance was anglophone till they came close by.

  8. Craig Russell said,

    June 15, 2008 @ 4:13 pm

    Is no one interested in speculating about the original language of the sound sample?

    I have zero experience with Asian (or African–though it would be pushing it a bit, I think, to expect an English-speaking listener to readily identify an African language) languages, so if it's one of those, I'm out of luck. But if I were to limit it to languages I have slightly more familiarity with, I would say I might hear something of the rhythmic ups and downs and punches I associate with Italian?

    Perhaps the exercise would be easier if it were like the last one: three samples from three different languages, where we're told what the three possibilities are and have to match them up.

  9. Bob Ladd said,

    June 15, 2008 @ 4:35 pm

    @Craig Russell: I'd be very surprised if it was Italian. If I had to guess from a list of relatively familiar languages, I'd be more inclined to say French. The tune of the last three syllables could certainly be French, and the recurring high-pitch syllables are also sort of consistent with French, though they don't really sound long enough. You're definitely right that it would be easier if we had three samples and three correct answers to match up.

  10. Moira Less said,

    June 15, 2008 @ 4:45 pm

    Ok, well I got all three wrong last time so I didn't want to say anything, but I agree it sounds Italian. It could be Scandinavian. It's easier to tell what it isn't (an Australian).

  11. Chas said,

    June 15, 2008 @ 5:10 pm

    I speak a little Cantonese and Spanish, and live in San Francisco, where both are commonly heard on the street. Many other languages may be overheard here, but those are the two most common.

    I find I pick up pitch and prosody very rapidly for both Cantonese and Tagalog, not so much for other languages. I'm sure it takes me more than 6 seconds for languages other than Cantonese, Spanish, Tagalog, or Mandarin. It can take me up to a minute to distinguish between Japanese and Korean (and sometimes I'm not even sure I'm right) or between Mandarin and Shanghainese or between Mandarin and Min.

    What can complicate things is that I'm typically overhearing conversations, and the two people involved in the conversation may not be from the same place, or even speaking the same language. Then I have to keep track of what I heard from each person.

    I once overheard a conversation–at a Thai restaurant of all places–in which one person was speaking Chinese and the other Japanese. I'd guess they were a couple and that both understood both languages, but felt most comfortable speaking their own.

    That's an entirely separate issue from code switching. I once overheard a conversation in which I was hearing what I could swear was Spanish-accented Cantonese. Suddenly, they started jumping between Spanish, Cantonese, and English. Listening made me feel dizzy, literally.

  12. Sarah E. said,

    June 15, 2008 @ 6:14 pm

    @Bob – I'm also inclined to say French.

  13. Kenny Easwaran said,

    June 15, 2008 @ 9:07 pm

    Mollymooly – I had the same experience when I was in Amsterdam. I kept hearing people talk to each other and thought they must be British tourists, but then they got closer and I realized they were speaking Dutch.

  14. Martyn Cornell said,

    June 16, 2008 @ 6:21 am

    Portuguese (Portuguese Portuguese, that is, not a Brazilian-accented variety) always sounds Slavic to me … (I've studied French, German and Latin, but not Portuguese nor any Slavic language)

  15. Moira Less said,

    June 16, 2008 @ 7:33 am

    Martyn Cornell said, Portuguese always sounds Slavic to me…

    Me too. I took Russian at school and Portuguese sounds very Russian to me.

  16. Chad Nilep said,

    June 16, 2008 @ 11:17 am

    I'm reminded of Murders in the Rue Morgue by Edgar Allan Poe. In that story, witnesses agree that they heard unintelligible screaming, but disagree about the language used: the Englishman thinks it was Italian, the Spaniard thinks it was Russian, etc.

    Spoiler alert:

    The screamer turns out to have been an orangutan.

  17. Ralph Hickok said,

    June 16, 2008 @ 1:09 pm

    Martyn Cornell said, Portuguese always sounds Slavic to me. . .

    I grew up in Wisconsin, where I frequently heard a Polish-language radio station. For more than 40 years, I've lived in New Bedford, Massachusetts, which has a large Portuguese-American population. The first time I heard the local Portuguese-language radio station, I was sure they were speaking Polish.

  18. Kellen said,

    June 16, 2008 @ 11:32 pm

    re: Martyn Cornell, that's pretty funny to hear since i just recently overheard my serbian friend on the phone to her mother and was sure for a second that they were speaking brasilian portuguese with a hint of northern albanian.

    i live in china and often hear korean spoken from a pretty massive population here in nanjing. i can't say i'd ever mix it up with japanese, even before i was too familiar with it. i'd probably think it was some dialect of chinese (again, before i became familiar with either).

    there's also a large german population here, which i keep thinking is speaking english until i get closer. but that might just be because i see white people and expect english. i really should know better by now.

  19. doviende said,

    June 18, 2008 @ 9:09 am

    i speak a decent amount of mandarin, so i recognize that instantly, and i have beginner / survival knowledge of japanese so i recognize that pretty quick….but i have to say that i agree with jm that many times when i hear korean speakers, my first reaction is "hey! that sounds like japanese, but somehow i can't understand any of it.", and then suddenly i hear some really korean-sounding syllable and i'm jolted back to reality.

    it also reminds me of the couple of times that i've heard swiss-german speakers, and thought "wow…it really sounds like they're speaking german, but somehow it's incomprehensible" (i have good survival german skills, but none in swiss-german). Of course, it's much more reasonable (in my mind) that swiss-german should sound so much like german ;)
