Voice recognition vs. Shandong accent

The following video is very popular in China now:

This is hilarious!

Even if you don't understand Chinese, you'll be able to follow what's going on.  The driver is trying to enter a phone number by voice, but the automated (machine) operator mishears him.  It gets really funny when he tells the operator that a correction (jiūzhèng 纠正) must be made, but the operator interprets that as jiǔliù 96.

At first the female passenger thinks it's funny and giggles, but the driver gets more and more exasperated and angry, and ends up cursing at the operator.

The driver has a tendency to mumble, as when he repeats bōhào 拨号 ("dial") near the beginning, and his tones are all over the place.  The problem of tones in daily, spoken Sinitic languages will be the subject of a forthcoming post.

Lesson to be learned:  if you're talking to a machine, especially if you have a Shandong accent, you'd better speak slowly and clearly.

Incidentally, the driver has become famous because of this video and has acquired the nickname of jiūzhèng gē 纠正哥 ("correction brother").

[h.t. Grace Wu]


  1. Z. S. said,

    March 1, 2015 @ 10:56 pm

    So who was the unfortunate person he was trying to dial, who now presumably has had to change their phone number?

    I think 281 330-8004 is available.

  2. maidhc said,

    March 2, 2015 @ 4:33 am

    The Asian version of the famous "two Scotsmen in a lift" sketch?

    Funny thing, when I look at the image at the top-level blog, I see my photo of a Chinese restaurant in San Francisco that was posted a little while back. When I come down into the entry it's gone though, and I see the taxi-driver.

    That gives me the opportunity to mention that since my photo was posted on this blog, it and some others that are adjacent to it in my Flickr album have received an amazing number of views, currently around 27,000. Up until yesterday I was getting 2000 to 3000 views per day. Yesterday it was only 150, so it is dying down.

    I presume they must have been posted somewhere else that attracted special attention, because they are not very different from the other 400-odd photos of Chinese restaurants I have posted, or the other 22,000 or so photos that can be found on the Chinese Restaurant Worldwide Documentation Project.

    Could be something to do with Chinese New Year, I suppose.

  3. maidhc said,

    March 2, 2015 @ 4:44 am

    As a followup, I noticed that when I went back to the top level, the image has arrows on it that allow people to click through my entire Flickr album (not just the subset of it that is getting all the views). Not that I'm complaining. Anyone who wants to look at my photos is welcome to.

    When you click through to the entry you get the taxi-driver as intended.

  4. Rubrick said,

    March 2, 2015 @ 5:41 am

    I think it would be a tremendous advance if all voice-command software could recognize when it was being sworn at and sheepishly respond that it was trying its best, but that AI simply wasn't nearly good enough yet. (Lord knows it would get plenty of practice.)

  5. Carol said,

    March 2, 2015 @ 7:37 am

    The only phrase my fairly new Camry can understand is "What is the weather in Seattle?" I live in California.

  6. flow said,

    March 2, 2015 @ 12:43 pm

    @maidhc i've seen this picture-swapping thing on more than one occasion on more than one blog. I thought it was because of a botched Flashplayer installation. BTW will the site feed ever come back? Then again, don't bother if repairing it means you have to talk to some software voice…

  7. Mark Mandel said,

    March 3, 2015 @ 2:28 am

    When I worked for Dragon Systems (speech recognition software) in the 1990s, one of the foreign languages we wanted to cover was Mandarin. Unfortunately, one of the first things our basic engine did was strip out pitch information. It turned out, though, that we could do pretty well with duration: third tone (ma˨˩˦ ma214) was the longest, fourth tone (ma˥˩ ma51) the shortest, and first (ma˥ ma55) and second (ma˧˥ ma35) in between.

    (BTW, what's the "X" at the beginning of some of the captions? "****"?)

  8. Matt Anderson said,

    March 3, 2015 @ 8:39 am

    Mark Mandel—

    That "X" isn't censorship or anything; it just represents a syllable in a colorful phrase which no one is sure how to type. It's used for "xie" in the word "xie死", a Shandong dialect word meaning something like 'beat' or 'kill', or maybe, in this context, 'go to hell'. There seems to be a lot of disagreement over how to write this syllable: I've seen 楔 (MSM pronunciation xiē), 携 (xié), and 锤 (chuí) suggested, and I think there are others. Romanized "xie" is probably the best bet.

    And this video is hilarious!

  9. Mark Mandel said,

    March 3, 2015 @ 1:29 pm

    Thanks, Matt
