Can you tell the difference between English and Chinese?


… from the pitch contours alone? It should be easy, right? Chinese is a tone language, English isn't, etc.

So try it…

I've chosen phrases at random from a published collection of Mandarin Broadcast News and a similar collection of English Broadcast News. Each audio file has been peak-normalized and low-pass filtered with a 300 Hz cutoff via the command

sox --norm "$f" low_"$f" sinc -300

This is not the only way to remove "segmental" information (vowels and consonants) while preserving prosodic information (pitch and timing), and it's probably not the best way to do this, but it's an easy and commonly used method.
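To show what the filtering step does, here is a minimal pure-Python sketch of a low-pass filter with a 300 Hz cutoff followed by peak normalization. It assumes a windowed-sinc FIR design with a Hamming window and 255 taps, chosen only for illustration; sox's `sinc` effect uses its own Kaiser-windowed design, so this approximates the idea rather than reproducing sox's output. All function names here are hypothetical.

```python
import math

# Illustrative windowed-sinc low-pass filter. NOT sox's implementation (sox's sinc
# effect designs a Kaiser-windowed filter); it only demonstrates the same idea:
# keep energy below the cutoff and suppress energy above it.

def lowpass_taps(cutoff_hz, sample_rate, num_taps=255):
    """Hamming-windowed sinc low-pass FIR coefficients, scaled to unity gain at DC."""
    if num_taps % 2 == 0:
        raise ValueError("num_taps must be odd so the filter has a center tap")
    fc = cutoff_hz / sample_rate  # cutoff in cycles per sample
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        # Ideal low-pass impulse response sin(2*pi*fc*x) / (pi*x); its limit at x == 0 is 2*fc.
        ideal = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        hamming = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * hamming)
    dc_gain = sum(taps)
    return [t / dc_gain for t in taps]

def apply_fir(signal, taps):
    """Convolve with the symmetric taps, centered so the output is not delayed."""
    half = len(taps) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(taps):
            j = i + half - k
            if 0 <= j < len(signal):  # samples outside the signal count as silence
                acc += h * signal[j]
        out.append(acc)
    return out

def normalize_peak(signal, peak=1.0):
    """Scale so the largest absolute sample equals `peak`, similar in spirit to --norm."""
    largest = max(abs(s) for s in signal)
    return list(signal) if largest == 0 else [s * peak / largest for s in signal]

def tone(freq_hz, sample_rate, n_samples):
    """A pure sine tone, used here as a stand-in for real speech audio."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n_samples)]

# Usage: a 200 Hz tone lies below the 300 Hz cutoff; a 2000 Hz tone lies far above it.
SR = 8000
taps = lowpass_taps(300, SR)
low = apply_fir(tone(200, SR, 1600), taps)
high = apply_fir(tone(2000, SR, 1600), taps)
low_normalized = normalize_peak(low)
```

Longer filters give a sharper transition around the cutoff at the cost of more computation; for real audio files the sox command is far faster than this pure-Python loop.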

There are eight audio clips. Listen to them and note for each one whether you think it's English or Chinese. I'd suggest using headphones if you don't have a good sound system connected to your computer, because the low frequencies involved are not well reproduced by small laptop speakers. This is a "forced choice" experiment, so give a clear answer ("English" or "Chinese") in each case. Don't overthink or overanalyze or go over the material multiple times — just listen once and say what comes to mind. If you're not sure, just guess.

OK, we've got 50-odd responses — that's enough for this simple and inadequately designed experiment… 

Along with each answer, give a number between 1 and 3 indicating how confident you are in your judgment, where 1 means "no clue, I'm just guessing", 2 means "I think it's X but I'm not sure", and 3 means "no question at all, it's X".

(1)
(2)
(3)
(4)
(5)
(6)
(7)
(8)

Here are the original files:

(1)
(2)
(3)
(4)
(5)
(6)
(7)
(8)

 

Overall results:

Clip                  1     2     3     4     5     6     7     8
Truth                 E     E     C     C     E     C     C     E
No. who guessed C     7    30    64    55    13    60    54     9
No. who guessed E    60    37     3    12    54     7    13    58
Percent correct    89.6  55.2  95.5  82.1  80.6  89.6  80.6  86.6

(E = English, C = Chinese)

Overall percent correct = 82.5% (442 correct out of 67 × 8 = 536 judgments). Detailed responses are here.
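The percentages follow directly from the counts: for each clip, the number of correct answers divided by that clip's 67 responses. A short Python check that recomputes the per-clip figures and the pooled figure from the counts in the table (the variable and function names are illustrative):

```python
TRUTH = "EECCECCE"                          # correct answer for clips 1-8 (E = English, C = Chinese)
GUESSED_C = [7, 30, 64, 55, 13, 60, 54, 9]  # "No. who guessed C" row
GUESSED_E = [60, 37, 3, 12, 54, 7, 13, 58]  # "No. who guessed E" row

def percent_correct(truth, guessed_c, guessed_e):
    """Per-clip percent correct, plus percent correct pooled over all judgments."""
    per_clip = []
    correct_total = responses_total = 0
    for answer, c, e in zip(truth, guessed_c, guessed_e):
        correct = c if answer == "C" else e
        per_clip.append(100 * correct / (c + e))
        correct_total += correct
        responses_total += c + e
    return per_clip, 100 * correct_total / responses_total

per_clip, overall = percent_correct(TRUTH, GUESSED_C, GUESSED_E)
print([round(p, 1) for p in per_clip])  # [89.6, 55.2, 95.5, 82.1, 80.6, 89.6, 80.6, 86.6]
print(round(overall, 1))                # 82.5
```

Because every clip received 67 responses, pooling all judgments gives the same result as averaging the eight per-clip percentages.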

A few preliminary comments:

  • Although many people commented that they found the task hard, the overall performance was pretty good. Still, it was far from perfect, and at least one of the clips (#2, with 55.2% correct) was close to chance.
  • Clips 4 and 7 were the same — this was a screw-up in the script I used to prepare the data — sorry!
  • Individual phrases were variable enough to merit a larger experiment to try to figure out what factors are responsible for the variation in judgments.
  • Individual subjects were variable enough to merit ditto.
  • In a properly-done experiment, we'd want to randomize the order of presentation, for obvious reasons.
  • The world needs a good web-based tool for managing experiments of this type.
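The randomization point in the list above can be handled by giving each subject an independently shuffled presentation order and logging it. A minimal Python sketch, assuming subjects have string IDs; `presentation_order`, `presentation_log`, and the `salt` value are hypothetical names, not part of any existing tool. Seeding from an explicit hash of the subject ID makes each order reproducible, so it can be re-derived later instead of being lost.

```python
import hashlib
import random

def presentation_order(subject_id, n_stimuli, salt="demo-experiment"):
    """Reproducible per-subject shuffle of stimulus numbers 1..n_stimuli."""
    # Hash (salt, subject_id) to a fixed 64-bit seed; same subject -> same order.
    digest = hashlib.sha256(f"{salt}:{subject_id}".encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    order = list(range(1, n_stimuli + 1))
    rng.shuffle(order)
    return order

def presentation_log(subject_ids, n_stimuli):
    """(subject, position, stimulus) rows to store alongside the responses."""
    return [
        (subject, position, stimulus)
        for subject in subject_ids
        for position, stimulus in enumerate(presentation_order(subject, n_stimuli), start=1)
    ]

print(presentation_order("subject-01", 8))
```

A fully independent shuffle is the simplest design; a more careful experiment might instead counterbalance orders (for example with a Latin square) so that each stimulus appears in each position equally often.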

More later… Thanks to those of you who got your answers in during the first hour or so…

Meanwhile, I'll open up comments again.



29 Comments

  1. Peter Meilstrup said,

    December 20, 2013 @ 1:18 pm

    Some researchers are using Amazon's Mechanical Turk as a platform for this kind of thing.

    http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0057410

    [(myl) Mechanical Turk is a great way to recruit and pay participants for certain kinds of experiments. But the platform doesn't (try to) do what I have in mind, which for simple acoustic perception experiments would include (for instance) randomizing stimuli and keeping track of per-subject stimulus order as well as stimulus/response tabulation. You could do all of that with a Mechanical Turk front end, but you'd have to program the back end (and much of the browser-based interactive code) yourself.

    There are several capable programs for doing this kind of thing in a non-distributed, non-browser-based way. What I'd like to see is something as easy to use as existing web-based polling systems, at least for simple experimental designs.

    You'd upload a set of stimuli (or better, URLs for the stimuli), and specify the Q&A for enrolling participants, the treatment of stimulus order, and the questions and answer-types for each stimulus presentation, without doing any javascript or back-end database programming. You could then harvest at will the results in tabular form. This could be provided as a service, or as a plug-in for (say) Drupal so that people could easily run it on their own web server.

    Maybe something like this exists — but if not, it shouldn't be very hard for someone "skilled in the art" to create at least a simple version.]
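The "stimulus/response tabulation" part of this wish list is the simplest piece to sketch. A minimal Python illustration, assuming each trial is logged with the subject's presentation position alongside the stimulus number and a 1-3 confidence rating; `Trial`, `tabulate`, and `to_csv` are hypothetical names, not an existing tool's API.

```python
import csv
import io
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Trial:
    subject: str
    position: int    # 1-based place in this subject's presentation order
    stimulus: int    # stimulus number, independent of presentation order
    response: str    # e.g. "E" or "C"
    confidence: int  # 1-3 rating

def tabulate(trials, responses=("C", "E")):
    """Count responses per stimulus, like the 'No. who guessed' rows of a results table."""
    counts = defaultdict(lambda: dict.fromkeys(responses, 0))
    for t in trials:
        counts[t.stimulus][t.response] += 1  # raises KeyError for an unexpected response
    return dict(sorted(counts.items()))

def to_csv(trials):
    """Flat per-trial export that keeps each subject's presentation order."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["subject", "position", "stimulus", "response", "confidence"])
    for t in sorted(trials, key=lambda tr: (tr.subject, tr.position)):
        writer.writerow([t.subject, t.position, t.stimulus, t.response, t.confidence])
    return buf.getvalue()

trials = [
    Trial("s1", 1, 3, "C", 3),
    Trial("s1", 2, 1, "E", 2),
    Trial("s2", 1, 1, "C", 1),
    Trial("s2", 2, 3, "C", 2),
]
print(tabulate(trials))
print(to_csv(trials))
```

Keeping `position` separate from `stimulus` is what lets order effects be analyzed after the fact.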

  2. Jongseong Park said,

    December 20, 2013 @ 1:26 pm

    1. C-1
    2. E-1
    3. C-2
    4. C-2
    5. E-2
    6. C-3
    7. C-3
    8. E-2

    I got the first clip wrong but the rest correct. I found it really hard at first, though I was getting the hang of it by the end.

    Perhaps because of the introductory paragraph, I really was paying attention to the pitch contours at first. It wasn't very helpful, I think because there is a fair bit of variety in the pitch contours in different accents of English. What helped me was to forget about the pitch contours and to listen to other aspects, like the rhythm.

    I speak and understand English but not Chinese, though I'm very familiar with the sounds of spoken Chinese. The fact that the first two clips were in English confused me, and only after I heard the Chinese clips did I get the hang of what the two languages sounded like muffled. Impressionistically, Chinese has a staccato sort of timing coupled with the pitch contours which sounds foreign to English. If I heard the first clip again, I probably would get it correct.

  3. Ralph said,

    December 20, 2013 @ 1:49 pm

    I would not yet read too much into the low success rate for the second clip until you have controlled by randomizing the order. There may be a bias toward choosing the other option for the second clip when two choices are available.

  4. Nathan said,

    December 20, 2013 @ 2:12 pm

    The numbers on #2 can't be right.

    [(myl) Scribal error — fixed now.]

  5. D Sky Onosson said,

    December 20, 2013 @ 2:13 pm

    1 English-3
    2 English-2
    3 Mandarin-2
    4 Mandarin-3
    5 English-2
    6 Mandarin-1
    7 Mandarin-3
    8 English-1

  6. Maria said,

    December 20, 2013 @ 2:14 pm

    I got the second wrong (it was also the only question I hesitated on and would give a 2) but the rest correct, and would lean heavily towards rating them as 3s. For most of the Chinese clips the rising tone seemed to be the major giveaway for me, and the cadence was also different enough that I could tell from that as well.

  7. Nik Berry said,

    December 20, 2013 @ 2:16 pm

    I impress me – got them all.

  8. Y said,

    December 20, 2013 @ 2:26 pm

    A few clues help. The Chinese has final glottal stops (e.g. 4, 7). Mandarin has rapid changes in pitch (e.g. 6), while English does not change pitch within a syllable, and usually not by much between neighboring ones. Some of the English intonation patterns are familiar broadcast clichés (e.g. 5).

  9. Bobbie said,

    December 20, 2013 @ 3:19 pm

    7 out of 8 correct. I agree with Jongseong Park — I was listening more to the rhythm than to the pitch contours.

  10. Keith Gaughan said,

    December 20, 2013 @ 3:24 pm

    I got them all correct, except for 2, which I incorrectly guessed to be Mandarin. I didn't find it particularly hard to pick any apart, but I'm guessing the reason I missed 2 was that it sounds much more staccato than I'd normally expect English to sound.

  11. Michael Becker said,

    December 20, 2013 @ 4:18 pm

    This is what I use for web-based experiments:
    https://github.com/tlozoot/experigen
    Not easy to use, I am afraid.

  12. Avinor said,

    December 20, 2013 @ 4:45 pm

    I didn't realize that 4 and 7 were the same and answered "Chinese, 2" for number 4 and "Chinese, 3" for number 7. I suspect that my confidence grew as I heard more samples. Order definitely matters.

    [(myl) Or randomness rules. 15 out of 67 respondents (in the batch that I entered) gave a different response category to #4 and #7: 8 people called #4 "Chinese" and #7 "English", and 7 called #4 "English" and #7 "Chinese".]
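The tally in this reply comes from cross-tabulating each subject's pair of answers to the two identical clips. A minimal sketch; the function names and the four-subject data set are hypothetical toy values, not the actual responses. It also shows why the nearly identical totals for #4 and #7 (55 vs. 54 "Chinese") hide the disagreement: the difference between the two totals equals the number of (C, E) switches minus the number of (E, C) switches, which in the real data is 8 - 7 = 1.

```python
from collections import Counter

def duplicate_pair_table(responses, first, second):
    """Cross-tabulate each subject's answers to two clips (1-based) that were identical."""
    return Counter((answers[first - 1], answers[second - 1]) for answers in responses.values())

def inconsistent_count(pair_table):
    """Number of subjects who gave different categories to the same audio."""
    return sum(n for (a, b), n in pair_table.items() if a != b)

# Hypothetical toy data: one string of E/C answers per subject, clips 1-8 in order.
responses = {
    "s1": "EECCECCE",
    "s2": "EECEECCE",
    "s3": "EEECECCE",
    "s4": "EECCECEE",
}
table = duplicate_pair_table(responses, 4, 7)
print(table)
print(inconsistent_count(table))
```

In this toy set the "Chinese" totals for #4 and #7 are equal (3 each), yet half of the subjects switched categories between the two presentations.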

  13. Rubrick said,

    December 20, 2013 @ 5:21 pm

    A friend of mine (formerly a grad student under Lara Boroditsky) created a startup several years back with the goal of building exactly the sort of platform you describe, targeting academic researchers. Originally it was to be a standalone product; then it morphed into something that would piggyback on Mechanical Turk. Unfortunately a suitable business model never materialized, and the project was eventually abandoned.

    [(myl) Academic researchers in psychophysics and phonetics and so on are not really a big enough, or rich enough, market to support much of an enterprise. You might be able to bridge somehow into language teaching applications. But a better plan would be to make it an open-source project…]

  14. Tom V said,

    December 20, 2013 @ 6:51 pm

    Missed clip #2
    I think #3 was definitely the easiest.
    My father was born in Haijou [?, Wade-Giles Haichou], Jiangsu province. He and my mother used to use Mandarin as a code to keep things secret from us kids, so I have some experience in listening to Mandarin that is half heard and totally incomprehensible.
    The final glottal stops are a useful clue.

  15. Edward Lindon said,

    December 20, 2013 @ 8:21 pm

    Could the putative perceived similarities have any connection with the rhythms and inflections of the "broadcast voice"? Would the results be the same if the sample were composed of daily or conversational speech?

  16. Jay Sekora said,

    December 20, 2013 @ 8:30 pm

    Got ’em all. Interestingly, I barely speak any Chinese (two years’ study a long time ago) but I was significantly more confident of the Chinese samples than of the English ones (although I did actually get the English ones right). I think I was actually making more use of stress and tempo patterns than pitch patterns; I wonder how a similar experiment with everything but volume vs. time thrown away would go.

  17. Mark Stephenson said,

    December 20, 2013 @ 9:31 pm

    Surprised myself by getting all right except #2 (which others also had trouble with). I wasn't at all sure about most of my answers.

  18. Mike said,

    December 20, 2013 @ 9:42 pm

    Got 7/8. My error was #2. For me it wasn't the pitch itself, but the speed of it and what seemed to be the regularity of it: Chinese pitch and contours were less varied and more pronounced. The English samples just sounded like relatively expressive speech.

  19. Cygil said,

    December 20, 2013 @ 10:44 pm

    Edward Lindon wrote: "Could the putative perceived similarities have any connection with the rhythms and inflections of the 'broadcast voice'? Would the results be the same if the sample were composed of daily or conversational speech?"

    Exactly. Newsreaderese is a bizarre dialect of English that, if you used it in regular conversation, would immediately mark you as a madman.

  20. PaulB said,

    December 21, 2013 @ 4:40 am

    Native speaker of UK English with poor Chinese but 20 years of living in Taiwan — and alas, I might just as well have been guessing at random. But at least I was consistent on 4 and 7!

    1. English 3
    2. English 2
    3. Chinese 2
    4. English 2
    5. Chinese 1
    6. Chinese 2
    7. English 2
    8. Chinese 2

  21. Russell said,

    December 21, 2013 @ 8:17 am

    I thought they were all the Swedish Chef.

  22. Victor Mair said,

    December 21, 2013 @ 8:32 am

    Because I've had severe tinnitus (due to explosions close to my ears) since mid-January, 1968, I've grown accustomed to relying on lip-reading, pitch, stress, tone, accent, and so forth when I'm listening to people talk. It's uncanny how the modified clips prepared by Mark sound a lot like what I hear every day, especially when I'm in a noisy restaurant. I'm always grateful for quiet ambience and speakers who enunciate clearly.

  23. EricF said,

    December 21, 2013 @ 12:54 pm

    I had to go back to my e-mail to Mark to find what I had originally guessed, and on review I, like several others, missed number two. Listening to it again, I agree that it's probably the rhythmic pattern, not the tonal dynamics, that made me guess Chinese.

  24. Edward said,

    December 21, 2013 @ 6:50 pm

    That was quite challenging. I was somewhat confident at the beginning, then became unsure, and towards the end had doubts about all my answers. That showed in my answers to 4 and 7, which I hadn't noticed were identical!

    1: E2
    2: E3
    3: C2
    4: C3
    5: C2
    6: C2
    7: E1
    8: E2

  25. Daniel said,

    December 21, 2013 @ 8:45 pm

    I found this difficult at first, but very interesting.
    Some specific pitches that were used helped me the most, rather than the pitch contours. I can't describe it accurately, as my sense of pitch isn't that good, but some of the pitches used in some parts of the Mandarin sections didn't "feel" like they were commonly used in English.
    I didn't have a clue with the regular conversation clips, so the pitches used may be a characteristic of "Newsreaderese" as previously mentioned.

  26. Anthony said,

    December 23, 2013 @ 2:26 pm

    I missed #2, which I was more confident in than many of the others. I don't speak Chinese nor have ever studied it, but I do sometimes hear Chinese conversations in restaurants (not just from the staff; I live in the Bay Area).

    I'd be hard-pressed to identify why I chose as I did, though.

  27. Erika said,

    December 26, 2013 @ 10:35 am

    I got 6 out of 8 correct, which disappointed me, seeing as my husband is a native Mandarin speaker (I am not) and I frequently hear muffled phone conversations from the other room when he talks to his parents. I missed #1 and #6, and as other commenters have mentioned above, I believe having the clips be in "newsreaderese" confused the issue. After hearing the first two clips (both English!) I defaulted to listening to rhythm instead of tone.

  28. Wentao said,

    December 31, 2013 @ 12:16 am

    Got 8 out of 8 but didn't realize #4 and #7 are the same (gave them 3C and 2C respectively). Apart from rhythm, which is quite a giveaway, the rising and falling tones are more pronounced in Mandarin.

    As is the case in English, "Newsreaderese" is quite different from normal speech in Chinese too. Especially Xinwen Lianbo, which I reckon is the source of some of the clips?

  29. Brendan O'Connor said,

    January 7, 2014 @ 2:02 am

    I managed to label 4 and 7 the same, but had no clue they were the same and gave different confidence scores. Test-retest reliability, indeed…
