Language Log

White dude challenges Chinese speakers in Shanghai

May 5, 2017 @ 12:00 pm · Filed by Victor Mair under Language teaching and learning, Pronunciation, Writing systems

Jayme, his gangling arms covered with colorful tattoos, sallies forth onto Nanjing Road, the busiest shopping street in Shanghai, and tests the local denizens and tourists on their language skills (reading, writing, and pronunciation):

Jayme catches them on one "error" after another (very few respond to his questions correctly). Some people who have seen this video assume that, since it was shot in Shanghai, the low rate of correct answers is due to interference from Wu, the local topolect. But this is not the case, since the respondents — no matter what part of China they come from — speak to Jayme in Putonghua / Modern Standard Mandarin (MSM), albeit with the "errors" that he is quick to point out.

So what's really the story here? Jayme's Chinese is better than theirs?

I asked several of my students and colleagues from the Mainland and Taiwan to explain why the people Jayme interviews make so many "mistakes". Here are some of their responses:

1.
I don't think dialect is the main reason for the errors, as the man in the video suggests. One main reason is that the characters shown at the beginning, like xiāo / xiào 肖 and xiě / xuè 血, have more than one reading, so it is very easy to make this type of error. The other reason is the traditional characters, which are rarely used in the Mainland today, so people are not very familiar with them.

2.
The wrong pronunciations of these native Chinese speakers normally make sense, even if they are wrong. And I don't think dialects are the determining factor in the errors.

Take 肖像 ("portrait") as an example. Most people, whatever their origins, uttered the phrase as "xiāo xiàng", instead of the right version "xiào xiàng", because, for one thing, 肖 indeed has an alternative pronunciation of xiāo [VHM: with a different meaning ("decline; decadent"; substitution for the homophonous surname 萧 / 蕭) than xiào ("resemble")]. For another, two fourth tones in one phrase ("xiào xiàng") require more physical effort to pronounce than "xiāo xiàng".

[VHM: I completely agree with this respondent about the relative difficulty of pronouncing two fourth tones in succession — ditto for two second tones in succession or two third tones in succession. The physiological exertion required to pronounce the repeated syllables in the same tone will naturally result in tone sandhi. In this case that will be 4 4 –> 1 4. With two third tones, the best known example of tone sandhi (though there are many other varieties) is 3 3 –> 2 3.]
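The textbook third-tone rule mentioned above (3 3 –> 2 3) can be sketched in a few lines of Python. This is a deliberately naive model under stated assumptions: tones are written as integers 1 to 4, and every third tone that precedes another third tone becomes a second tone, ignoring the prosodic grouping that decides how longer runs of third tones split in real speech. Only the third-tone rule is modeled, so a 4 4 sequence such as xiào xiàng is left unchanged.

```python
# Naive third-tone sandhi: a third tone (3) directly followed by another
# third tone surfaces as a second tone (2). Tones are integers 1-4.
# Simplification: prosodic grouping of longer runs is ignored.

def apply_third_tone_sandhi(tones: list[int]) -> list[int]:
    """Return surface tones after applying 3 3 -> 2 3 left to right."""
    surface = list(tones)
    for i in range(len(tones) - 1):
        # Check the *underlying* tone of the next syllable, so that
        # 3 3 3 becomes 2 2 3 (one common realization).
        if tones[i] == 3 and tones[i + 1] == 3:
            surface[i] = 2
    return surface


# nǐ hǎo (3 3) surfaces as ní hǎo (2 3); xiào xiàng (4 4) is untouched here.
print(apply_third_tone_sandhi([3, 3]))
print(apply_third_tone_sandhi([4, 4]))
```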

3.
To be honest, I only got one pronunciation right for the first section. I made the same mistakes as most of the Chinese in this video, and I thought that's how they were supposed to be pronounced. I think I hear people talking the wrong way all the time! Even in movies, TV shows, etc. (But I am not sure about official channels such as CCTV news.) As for the second part, about Chinese characters, I think many Chinese don't know how to write some characters because people rarely write by hand nowadays. People mostly use their phones, computers, iPads, and other devices to type and to chat. They don't get to write very often.

4.
I don't have any good explanations for the errors. These are extremely common errors, and I have committed most of them myself. Some characters have more than one tone /pronunciation and most people aren't aware of some of the variations. Some words (e.g., xùjiǔ 酗酒 ["alcoholism; hard drinking"]) remain in the category of literary vocab and people mispronounce them on a regular basis. Character stroke order is a matter of convention. Those obscure characters sort of fall out of the established norm.

5.
For the errors on pronunciation, those are the common ones that most people tend to pronounce incorrectly. However, the tricky part is that when most people use the wrong pronunciations, they become the acceptable ones, and people cringe when they hear the right pronunciations. I think there is a distinct difference in the way people learn and view their first and second languages. People tend to be more sensitive and more determined to find the correct and standard usages or pronunciations when they learn a second / foreign language. With their first language, however, they don't care as much about what the "correct" usage or pronunciation is. Instead, as long as a form is widely used and understood by most others, people will simply use it even if it's the wrong one.

The characters "biang" (a kind of Shaanxi noodles) and guī 龜 ("turtle") are just two extreme cases. It's kind of like asking a random person on the streets of Philadelphia to spell out a long GRE word. I would say that outside of Shaanxi province, most people don't see the character "biang" very often, let alone write it. The character guī 龜 is a tricky one for testing stroke order, because it doesn't follow the general rules and there are simply too many curves in that character. The character nàng 齉 ("stoppage of the nose causing one to speak with a nasal twang") is also very rare. I had never seen it before I watched this video.

[VHM: I learned the character nàng 齉 very early, probably in the second or third year of my Chinese studies. The reason I focused on it and remembered it clearly is because I had a deep interest in all things having to do with the nose, including nasalization. I also recognized the left side of the character, the semantophore, as bí 鼻 ("nose") and the right side of the character, the phonophore, as the quaint, but devilishly hard to write nāng / náng 囊 ("the fat, flaccid flesh of a pig's teat" when used with the character chuài 膪 ["pork"], etc. in the first pronunciation, "bag", etc. in the second pronunciation).

As for the "turtle" character, it is indeed notorious for how hard it is to get the number of strokes and their order right. I have seen people who are able to complete it write guī 龜 at least a dozen different ways. I really don't think we can say that there is any correct, standard way to write it. If someone is able to get it down on paper somehow, no matter how that might be, it's quite an accomplishment — regardless of the actual number and order of the strokes used in producing a particular iteration of the character. When I saw Jayme write and sanction the writing of the center horizontal stroke of the top of the two "feet" as going through the vertical "spine" and then curving down, I was aghast, because I was taught to form the two feet by themselves and have them stop where they meet the vertical spine. One thing that is essential about writing guī 龜, however, is that, no matter how you do it, you must end up with 16 total strokes! Why? Because guī 龜 itself is a radical (#213 out of 214) in the Kangxi system, which has been standard for the traditional forms of the characters for more than three centuries, and the Kangxi system stipulates that guī 龜 has sixteen strokes! Grin and bear it!]

I was a bit surprised that few people know how to write fěi 匪 ("bandit; not") in shòuyìfěiqiǎn 受益匪淺 ("benefitted not a little [i.e., a lot]"). That is a rather common chéngyǔ 成語 ("set phrase"). But chéngyǔ 成語 ("set phrases") are a more educated usage of the Chinese language anyway, and many people tend to mix up characters with the same or similar pronunciations, so I think that's why some people wrote fēi / fěi 菲 ("humble, poor, unworthy; fragrant, luxuriant; the Philippines") or fēi 非 ("not") instead of fěi 匪 ("bandit; not"), which is correct.

These are just my two cents. I personally started to pay extra attention to details when I began to teach Chinese as a second / foreign language. I can understand why many of my Chinese / Taiwanese friends think it's annoying when I sometimes point out their "errors."

The answers of these highly educated respondents confirm many of the themes about language and writing in China that we have repeatedly discussed on LLog — the overriding of tone by intonation (more anon), the loss of handwriting skills to electronic devices, the substitution of homophonous characters, etc. Only scholars, teachers, and linguists are much troubled by these "errors". In the video, Jayme was playing the pedant.

BTW, I think it is so cool the way the video begins: "Hi, folks! What's up? This is Kevin here. I got my buddy Jayme with me…." It's all in good humor, and the Chinese people Jayme tests take it in the right spirit. There are lots of smiles and happiness all around. I didn't detect an iota of resentment among the people who interacted with Jayme in the video. One of the reasons for this is his ability to simultaneously adopt the personae of a pompous pedant and good ol' bro (see "Variant pronunciations of the word for 'brothers' in Mandarin" [9/25/13]). In so doing, he undercuts and deflates the pomposity of his pedantry. Whatever may have motivated Jayme to master that kind of exacting, esoteric knowledge about the Chinese language and script that he displays so abundantly, yet casually, in the video, it is no mean achievement. Indeed, it represents the ultimate form of cultural appropriation (the discussion of which is much in vogue these days) and accolade for traditional Chinese learning. Though he may make it look easy in the video, we may be certain that Jayme has been working very hard on his language and writing skills for those nine years that he has been in China.

[Thanks to Maiheng Dietrich, Melvin Lee, Jinyi Cai, Yixue Yang, and Fangyi Cheng]

42 Comments

norman said,

May 5, 2017 @ 12:22 pm

This brings to mind spelling bees back in elementary school
JCL said,

May 5, 2017 @ 1:20 pm

The little fellow at 2:50 seeing nàng for the first time is priceless.
Victor Mair said,

May 5, 2017 @ 1:36 pm

@JCL

I agree. I love that moment. Thanks for calling attention to it.
hanmeng said,

May 5, 2017 @ 3:12 pm

Don't tell me that the way a whole bunch of native speakers pronounce a word ain't right.
JK said,

May 5, 2017 @ 4:07 pm

I've always been confused by 血, dictionaries always say xue4, but as the video shows many people pronounce it xue3. They usually list xie3 as a variation, but I don't know if I have ever seen xue3.
Bathrobe said,

May 5, 2017 @ 4:11 pm

Jayme's accent is intriguing. At times he sounds stereotypically British, at others he sounds quite American. His accent is an inconsistent mix of the two (e.g., where he says "ahtistic version"). He sounds very much like he is attempting to switch his pronunciation with less than perfect results.

One problem with Jayme's pronunciation test is how to define "correct". As I've pointed out elsewhere, the government has long been messing with the standard pronunciation of characters by fiat, thus making the whole concept somewhat artificial. The approach is distinctly prescriptivist.

Some traditional pronunciations have been declared technically "incorrect", although I think they survive on Taiwan. For example the old pronunciation of 法国 'France', fàguó, is now officially incorrect in China and is probably virtually extinct.

There are traditional Beijing pronunciations that have been declared incorrect (such as 教室 jiàoshǐ standardised to jiàoshì) but are still widely used. Personally, I cringe whenever I hear jiàoshì because it sounds oh so correct and standard but oh so artificial.

The confusion over 血 that Jayme points out seems to me to be even worse than he suggests because I believe I've also heard (although I could be wrong) xuě, which mixes the two.

I've pointed out elsewhere the widely used "spelling pronunciation" of 假期 jiàqī as jiǎqī in southern parts of China. This raises the question: how long does a spelling pronunciation remain "incorrect" even after it has become widespread, at least in some areas?

For that reason, I'm just a little sceptical of the kind of test being carried out here. If you did a similar test in the streets of an American city, how many people would pronounce a word like 'Antarctic' "correctly"? How many people would pronounce 'forehead' correctly, given that the traditional pronunciation is now disappearing and the spelling pronunciation is probably even recognised and thought of as "correct"?

As for the writing test, I was astounded at how many people were able to write biáng even half right. I don't think I would get much past the ⼧ and the ⻍!

As for Jayme's recommendation to use traditional characters, I think the only result of pursuing this suggestion would be something like his accent in English — an inconsistent pastiche of both.
David Morris said,

May 5, 2017 @ 5:52 pm

Tune in for the next exciting episode, when an English-speaking Chinese person roams a major English-speaking city telling English speakers that they are speaking and writing English wrongly.

Seriously, why does stroke order matter, if the result is a static image on a page? And why does a word pronounced 'biang' (with a tone mark) have 'horse', 'heart' and 'moon' in its character?
David Morris said,

May 5, 2017 @ 5:56 pm

And are computerised characters written in stroke order?
AntC said,

May 5, 2017 @ 6:17 pm

Yes, I'm mesmerised by Jayme's accent in English. He seems to have American vowels but British consonants — esp the final stops. Is that even a thing? Maybe it's Gangsta rap?

Why does he expect Shanghai youngsters to recognise traditional characters? It would be like expecting English speakers to recognise Chaucerian spellings.

And as @David Morris asks: why does stroke order matter. Isn't writing those characters hard enough already? Can anybody tell afterwards what your writing order was?
Krogerfoot said,

May 5, 2017 @ 6:59 pm

Commenters more educated than me would be able to explain this better, but having a standard stroke order is important for deciphering handwriting. As with cursive Roman or Cyrillic writing, the pen drags along the paper in a particular recognizable path as long as the writer and the reader agree on how a particular character is written.
If you've ever seen English handwriting by someone who, following Chinese stroke order, writes the dot of an i or the cross of a t first, the resulting letter can be very difficult to understand at first, even though the aberration from the standard is actually pretty tiny.
Alyssa said,

May 5, 2017 @ 7:33 pm

Stroke order does matter in general, (though I do think it's a bit much to be nitpicky about it in this context).

Unless you're writing very slowly and deliberately, the order in which you write the strokes will affect their shape. If you use proper stroke order, these changes are predictable and everyone is accustomed to them. Whereas a character written sloppily with the wrong stroke order can be unreadable.

This is actually true for the roman alphabet as well, though to a much lesser extent. I remember studying abroad in France (from the US) and occasionally having trouble reading what the professor had written on the whiteboard because of this. For example the letter "x" isn't written with two lines crossing but instead left side then right side, so that "lx" ends up looking like "bc" to my eyes.
Bathrobe said,

May 5, 2017 @ 7:55 pm

Stroke order is not rigidly fixed. There are a number of characters where the standard stroke order (as taught to children) differs between China and Japan.
Bathrobe said,

May 5, 2017 @ 8:09 pm

Incidentally, Yellowbridge confirms Professor Mair's stroke order for 龜.
liuyao said,

May 5, 2017 @ 9:16 pm

https://en.wiktionary.org/wiki/%E9%BE%9C records three stroke orders for 龜, but only the first one has 16 strokes.

To the question if stroke order matters, the handwriting input system on tablet sometimes is sensitive to the order. An old relative of mine (who doesn't know pinyin) couldn't get 成 to show up because he wrote it in a different order from the "correct" one.

@Bathrobe, in the part about biang Jayme made it clear that people were shown the character for a few seconds before writing on their own.
Nathaniel Mishkin said,

May 5, 2017 @ 9:19 pm

Can someone give me a sense of how the differences in pronunciation being discussed here compare with the differences between, say, the way a native New Yorker and a native Alabaman pronounce English words? In the case of these English regional differences I think most people don't refer to one or the other pronunciation as being "wrong". "Funny", maybe; but not "wrong".
krogerfoot said,

May 5, 2017 @ 9:46 pm

@Nathaniel Mishkin, you might compare it to similar-but-not-the-same English words like personnel/personal or polish/Polish. The people producing the "incorrect" pronunciations are mistakenly reading the characters as other similar characters, much like me when I complain about the decline in social moors.
Thorin said,

May 5, 2017 @ 10:06 pm

Looks like Jayme Lawman is from Northampton, UK. I was intrigued enough by his English accent to do a quick Google search. Anyone know if the Northampton accent generally sounds like that? Otherwise he may spend a lot of time with Americans in Shanghai or Chinese learning American English. He's a basketball player, so he could also be around American players on Chinese teams.
Victor Mair said,

May 5, 2017 @ 10:30 pm

Although it was emphasized repeatedly in the o.p., the mispronunciations recorded in the video have nothing to do with topolectal or regional differences. For example, the misreading of 肖像 ("portrait") as xiāoxiàng rather than xiàoxiàng has to do with choosing the wrong pronunciation of xiāo for 肖 instead of xiào. As spelled out in detail in the o.p., xiāo and xiào have different meanings, and xiào gives the right meaning for 肖像 ("portrait"). But there are extenuating circumstances (e.g., tone sandhi) that help to explain why the majority of the respondents make this "mistake" (xiào –> xiāo). And so forth and so on.
Victor Mair said,

May 5, 2017 @ 10:59 pm

I have seen many people using shape-based character entry systems experience extreme frustration because they could not get the desired character to come up on their screen. The specific reasons for such failures are the following: not knowing the total number of strokes, not knowing the correct stroke order (correct sequence of strokes), not knowing the correct constituent components, not being able to determine the radical and / or the residual strokes, etc.

I am grateful to liuyao for providing a concrete example of such a failure, viz., chéng 成 ("completed, finished; fixed; become; fully grown; succeed"). This is a simple (6 strokes — count 'em), common, high frequency character, yet many people using shape-based entry systems simply cannot succeed in accessing it. Many's the time in such situations that I have seen such individuals say in desperation to a bystander, "Zěnme bàn 怎么办?" ("What to do?"), and the response would be "Gāncuì yòng pīnyīn 干脆用拼音!" ("Might as well use Pinyin [spelling]; just go ahead and use Pinyin!"). Whereupon they type "cheng" and 成 pops up instantaneously. (If they don't know Pinyin [increasingly rare with each passing year], the bystander will help them.)
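To make concrete why stroke order matters for shape-based entry, here is a toy stroke-sequence lookup in Python. It uses the widely taught five-class stroke convention (1 héng 横, 2 shù 竖, 3 piě 撇, 4 diǎn/nà 点/捺, 5 zhé 折); the six-character table, the prefix matching, and the ranking are hypothetical simplifications for illustration, not the behavior of any particular input method.

```python
# Toy stroke-sequence input: each character is keyed by its standard stroke
# order in the five-class convention:
#   1 = heng (horizontal), 2 = shu (vertical), 3 = pie (left-falling),
#   4 = dian/na (dot/right-falling), 5 = zhe (turning).
# The table below is a hypothetical six-entry sample, not real product data.
STROKE_CODES = {
    "十": "12",
    "木": "1234",
    "大": "134",
    "人": "34",
    "口": "251",
    "日": "2511",
}


def candidates(typed: str) -> list[str]:
    """Return characters whose stroke code starts with the typed sequence."""
    matches = [ch for ch, code in STROKE_CODES.items() if code.startswith(typed)]
    # Hypothetical ranking choice: fewer strokes first, then by code.
    return sorted(matches, key=lambda ch: (len(STROKE_CODES[ch]), STROKE_CODES[ch]))


print(candidates("12"))  # correct order for 十: it is offered first
print(candidates("21"))  # same strokes, wrong order: nothing comes up
```

Typing the right strokes in the wrong order yields an empty candidate list, which is the failure liuyao's relative ran into with 成, in miniature.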
AntC said,

May 6, 2017 @ 6:22 am

@Thorin Anyone know if the Northampton accent generally sounds like that?

Absolutely not! I'm gobsmacked he's British. The Northampton accent is 'refined' Midlands (go from Brummie to Coventry, then a bit further east towards East Anglia, although it's also dangerously close to the Home Counties).

Victor says he's been working on Mandarin for 9 years. I guess he has globe-trotting parents, so has been at International schools (maybe for longer than 9 years). He's picked up a mostly 'mid-Atlantic'/anywhere/nowhere kind of accent.
PJ said,

May 6, 2017 @ 11:37 am

@Thorin
Audio of speakers from Northampton born in the '80s here.

(I've got no preview for some reason, so I hope I've formatted that link right; additionally, the audio at the link won't play for me in Firefox but I tried it in Edge, where it's ok.)
Andrew Usher said,

May 6, 2017 @ 11:51 am

I'm amazed that you would say that! I could tell he was British from the first sentence, which showed non-rhoticity and t-glottaling (not normally found in other non-rhotic areas that way). He also has a clear southern British LOT vowel.

But, I guess, we all hear the differences from our own accent. So Americans like me hear British features, while Brits hear American features, from the same speaker; that's a well-known phenomenon with 'mid-Atlantic' accents but I don't know if his can be classified as such if he's never lived in the US or Canada.

His comments at the end about English spelling also struck me as a bit apologetic (so as not to seem to be criticising especially the Chinese language or people) – I really can't imagine average Americans having this difficulty reading or even writing words that are not totally unfamiliar. Everyone recognises that English spelling has faults, but it's still alphabetic and phonemically based.

k_over_hbarc at yahoo.com
~flow said,

May 6, 2017 @ 12:39 pm

To go and use, of all characters, 龜 to try and see how good people are at stroke order and 'correct' character shapes is a bit, let us say, bold. 龜 is the hands-down gold medal winner in any proud-to-be-different contest; http://www.guoxuedashi.com/zidian/ytz_z23888u.html lists no fewer than 121 variant forms, and I can let you in on a dirty little secret: there are at least two variants missing in that table, as a quick glance over the relevant Unicode chart for U+9F9C (p524 of http://unicode.org/charts/PDF/U4E00.pdf) will readily show.

The shapes the Unicode consortium chose as representatives differ in whether the top is a single left-slanting stroke or a 'beak'/'knife', whether the 'legs' are drawn through the 'body' or not, whether the 'body' lines extend into the 'head', and whether the right 'body' line connects to the 'tail' or leaves a gap. These specific forms, it is true, were submitted by the groups from PRC, Taiwan, Japan, Korea, and Vietnam; but a quick glance over historic precedents (e.g. http://www.guoxuedashi.com/zixing/20893.html, http://shufa.guoxuedashi.com/9F9C/1/) demonstrates all these forms are well within the variation of character usage of China proper.

It's a different thing with the more obscure variants like , but certainly using any of the variants listed in the Unicode chart (or any one that uses a different combination of the distinctive features listed above) cannot be called wrong. In violation of government standards, perhaps, but not wrong.
~flow said,

May 6, 2017 @ 12:41 pm

Gotcha. Again: … more obscure variants like 𠁴𠃾𤕣𧑴黾龟亀𪚦𪚨𪚧, but certainly …
Ken said,

May 6, 2017 @ 12:57 pm

I wonder what the reaction would be if this guy visited an African-American neighborhood in the US and told people that they were pronouncing words wrong. "No you silly goose, it's 'ask' not 'axe'!"

Re: 血, I've grown up pronouncing this word as xue3, though maybe it's supposed to be xie3, and my whole family was overcorrecting our Taiwanese accents. I've never heard xue4.
Alexander said,

May 6, 2017 @ 1:03 pm

> Why does he expect Shanghai youngsters to recognise traditional characters? It would be like expecting English speakers to recognise Chaucerian spellings.

In my American high school (c. 2001), nearly everyone read out "draught" to rhyme with ought rather than draft.
Moa said,

May 6, 2017 @ 1:08 pm

I learned Mandarin as an adult, and I only learned the xie3 pronunciation for blood (or maybe xue3?). I don't think I ever noticed the xun4 pronunciation until my second or third year of studies! Probably from some CCTV programme, haha. And only today I learned it's xun4 with the fourth tone.

The fluent mix of English and Mandarin from Jayme is a delight to hear!
Alex said,

May 6, 2017 @ 1:13 pm

"I really can’t imagine average Americans having this difficulty reading or even writing words that are not totally unfamiliar. Everyone recognises that English spelling has faults, but it’s still alphabetic and phonemically based."

I found this to be especially true for the reading of proper nouns.
I was reading a book on Alexander by Peter Green. Quite often there would be a string of people and places.

"One of his staff officers, Amphoterus, accompanied Sisines back to Parmenio, with instructions that the Lyncestian was to be held under close arrest pending further investigations."

That was an example of a shorter set of proper nouns!

I think even if the reader is pronouncing it incorrectly in their minds, when reading it, it is relatively fluid reading.

The reader can also fluidly read a sentence of Latinized Chinese names and places, for example in the biography of Deng Xiaoping by Vogel:

"Once settled in their home in Jiangxi, each day Deng and Zhuo Lin rose at 6:30 a.m."

Some local colleagues who have read Chinese translations of English-language history books have said it's awkward at times when there is a string of people and places.

I suppose because the mind reads the characters individually and then has to combine it and the fact that there is no capitalization and the spacing is all equivalent.

Maybe for English the mind just sees proper noun / name and goes into glance mode when encountering unfamiliar proper nouns.

I was wondering if any locals can provide their perspective on this. Do locals go into glance mode when reading unfamiliar proper nouns translated into Chinese, so that it isn't awkward, or is it awkward?
Moa said,

May 6, 2017 @ 1:19 pm

Sorry, I mixed up the pronunciation for xue4 in my comment. I meant I learned the xie3 pronunciation, and later noticed xue3 and only today know it's supposed to be xue4.
flow said,

May 6, 2017 @ 2:58 pm

One more remark on 龜. "One thing that is essential about writing guī 龜, however, is that, no matter how you do it, you must end up with 16 total strokes! Why? Because guī 龜 itself is a radical (#213 out of 214) in the Kangxi system"

I understand that as a tongue-in-cheek remark. People who point out the Kangxi dictionary as a standard for *writing* (as opposed to printing) characters should be aware of the fact that only the preface of that work can be used for the purpose; the main part is strictly about the printed forms of characters.

Had the main part been meant to apply to hand-writing as well, we would have to concede that a great number of character forms that appear in the preface are in glaring violation of the forms stipulated in the main part (which is an interesting hypothesis in its own right, but nonetheless not the default option).

As concerns stroke counts, I seem to remember that there are definitely inconsistencies in the Kangxi; also, there are some characters for which the stroke count (while being consistent) is calligraphically (is that a word?) dubious (e.g. 了子 are counted as 2 and 3 strokes where I think it should really be 1 and 2 strokes). I'd have to scan a few books before I can give more details, tho.
Bathrobe said,

May 6, 2017 @ 7:19 pm

I love the great range of shapes for 龜 that ~flow dug up. When people speak of the "correct" or "standard" form, they are doing so under the influence of modern educational systems, which need to settle on a single form for teaching to children. Having learnt the standard form, learners then need to be 're-educated' as adults to realise that what they learnt as 'correct' is an arbitrary choice made by educators. It's a bit like prescriptivist grammar in English, which tries to standardise correct usage and, as a side effect, inculcates insecurities, intolerance, and peeves.

I copied all the variants from the Unicode chart for U+9F9C to a Word document in order to enlarge them and they all came out the same. Does this mean that even acceptable variation is likely to disappear as everyone sticks to standardised fonts? Or can we expect greater variety as new fonts are developed?
J K said,

May 6, 2017 @ 8:12 pm

It seems that many people agree that xue3 is a common pronunciation for 血, and that is what I was taught to say while in Beijing, but I have not yet seen any dictionaries that list xue3 as an acceptable pronunciation. Is there some explanation for this? It seems that this is not some sort of fringe pronunciation.
Bathrobe said,

May 6, 2017 @ 10:48 pm

The adoption of dual readings for 血, xiě and xuè, probably reflects the linguistic situation at the time readings were fixed for Modern Standard Mandarin. The reading xiě would have been used in the spoken language (presumably the demotic language of Beijing) when talking about 'blood', while xuè was used in 'literate' Chinese in more elevated contexts. Presumably the demotic pronunciation was considered ok for the single word 血 but not for other contexts, where it would sound comical or vulgar. Similarly, the literate reading was probably considered acceptable for formal or technical contexts but impossibly ridiculous or affected when referring to blood in everyday speech.

As to why xuě is so common in actual use, I can only suggest a couple of hypotheses. One is that xuě is a result of contamination from the literate reading. Another is that both xuě and xiě were current prior to standardisation and only xiě was chosen as the standard form. Xiě then eventually lost out to xuě in the ordinary speech of Beijing. Someone who knows the history of these terms would be able to give a more definitive answer.

As for why these phenomena exist, I draw your attention again to China's long-standing efforts at language planning, whereby conscious efforts have been made to rationalise character readings. Language-planning decisions are enforced in the education system and enshrined in dictionaries. This is much more far-reaching than most people are aware of. For instance, the traditional pronunciation of the 介词 ('preposition') 往 as wàng was abolished in favour of wǎng, which has now become the only accepted pronunciation. I believe the rationale for this was to reduce multiple readings. For similar reasons, the old reading shèng for 乘 was abolished in favour of the single reading chéng. I am quite impressed at how successful these language planning efforts have been, although some non-standard pronunciations do persist, such as the Beijing pronunciation of shǐ for 室.

Because these old readings disappear without a trace — dictionaries simply stop listing them — most people aren't even aware of what has been going on. I became aware of it when I found an insert in a Chinese-Japanese dictionary that I bought some decades ago listing all the changes.

When looking at Chinese dictionaries it's important to keep in mind that they are normative dictionaries and do not necessarily show pronunciations that are actually current (although, as I said, language planning efforts appear to have been remarkably successful).
flow said,

May 7, 2017 @ 4:16 am

@Bathrobe—"I copied all the variants from the Unicode chart for U+9F9C to a Word document in order to enlarge them and they all came out the same."

There is no way for this to work out. The reason is simple; in the PDFs, each glyph is encoded as a combination of a font (that's where the shapes are) plus a codepoint (a non-negative integer number). The variants were submitted by the representatives of each region in the form of a font that encodes the relevant portion of the Unicode code space; this works dandy since each region gets to submit at most one glyph shape per codepoint.

In order to reconstruct the PDF's appearance in detail, you would have to format each column with the appropriate font. Sadly, those are not distributed by the Unicode consortium (which, to make matters worse, puts up all kinds of copyright-related barbed wire / red tape / nonsense around their camp to make sure everyone has a hard time reproducing one of the most eminent standards of the digital age at home).

Let me warn you that even with the right fonts in hand, things are not going to be a pleasure ride when you want to be picky about glyph shapes. MS Word in particular, but also LibreOffice and almost every other modern application, uses a so-called font substitution mechanism. The theory is that when a user chooses font F to write a text that includes character C, and C happens not to be available in F, the least bad thing to do is to choose another font F2 that looks almost like F but has C encoded. A lot of brain time went into making this work; one of the moving parts of the mechanism is the so-called PANOSE classification, which is basically a feature-oriented fingerprint of what a given font looks like (what it tastes like, so to speak). So missing shapes from a given font F may be substituted with shapes from F2 if the PANOSE values of F and F2 are sufficiently close. In theory.

To give credit where credit is due, automatic font substitution somehow 'works', provided you are working with Latin script and your font is either Times New Roman or Arial / Helvetica. Except for the Euro sign, which always looks wrong if it's not in the font already.

The mechanism completely breaks down when you feed it CJK glyphs. Personally, I use two solutions I authored myself: one is HTML/CSS based and uses CSS Unicode Ranges; the other is a Markdown-to-PDF typesetter (https://github.com/loveencounterflow/mingkwai-typesetter; caveat: probably won't run out of the box) that uses XeLaTeX for the hard parts and a hell of a lot of code in front of that to ensure the right font is chosen for every single glyph. Either that, or I use Adobe InDesign (which explicitly does not use font substitution); otherwise I see a jumble of font faces in my CJK texts, no matter the OS, the software, or the time of day.
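The idea of choosing a font explicitly for every glyph, rather than trusting substitution, can be sketched roughly as follows. This is a minimal Python illustration assuming a simple table of Unicode block ranges; it is not taken from the mingkwai-typesetter code, and the font names are hypothetical placeholders. CSS's `@font-face` `unicode-range` descriptor performs a comparable range-based mapping inside the browser.

```python
# Hypothetical font names; substitute whatever fonts are actually installed.
# Each entry maps an inclusive code point range (a Unicode block) to one font.
RANGES = [
    (0x4E00, 0x9FFF, "ExampleMing-BMP"),     # CJK Unified Ideographs
    (0x3400, 0x4DBF, "ExampleMing-ExtA"),    # CJK Extension A
    (0x20000, 0x2A6DF, "ExampleMing-ExtB"),  # CJK Extension B
    (0x2A700, 0x2B73F, "ExampleMing-ExtC"),  # CJK Extension C
]
FALLBACK = "ExampleLatin"  # used for anything outside the listed ranges


def font_for(ch: str) -> str:
    """Return the font assigned to a single character by code point range."""
    cp = ord(ch)
    for lo, hi, font in RANGES:
        if lo <= cp <= hi:
            return font
    return FALLBACK


def runs(text: str) -> list[tuple[str, str]]:
    """Split text into (font, substring) runs of consecutive same-font characters."""
    out: list[tuple[str, str]] = []
    for ch in text:
        font = font_for(ch)
        if out and out[-1][0] == font:
            out[-1] = (font, out[-1][1] + ch)
        else:
            out.append((font, ch))
    return out


print(runs("A\u9f9c\U00020000"))
```

Because every character gets its font from an explicit table, the result cannot drift with the history of font choices the way substitution does; the cost is that the table has to be maintained by hand.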

Word and Open / LibreOffice in particular have to be called out for having made a mess of font substitution. Proof: copy-paste into a document a piece of CJK text that uses codepoints from various Unicode blocks (CJK Unified Ideographs, CJK Extensions B, C, …); then highlight the text and choose different fonts. Depending on the font chosen, you will see some characters change to the selected font, some change to substitution fonts, and some not change at all. Do that a few times and come back to your first font choice. Your text will now likely look subtly (or not so subtly) different from when you first pasted it, because its appearance now depends on the history of your choices. I know of no web browser or office application that tells me anywhere the name of the font actually being displayed, so you can only verify correctness visually.
Andrew Usher said,

May 7, 2017 @ 10:32 am

Alex:

I don't quite understand the point you're making in your reply. In English works, when one comes across unfamiliar proper nouns, one can say them in one's head in a way based on spelling (if any way is needed). In the case of the ancient Greek names, I'd use the Anglo-Latin system; for the Chinese names, I'd use an ad hoc spelling-based system that I know doesn't reflect real Mandarin (e.g. Pinyin 'zh'). But that's only for one's own benefit.

If words are to be used as ordinary language they must have recognized pronunciations that will not be confusing. That is where I think the 'errors' demonstrated in this video are supposed to come in. In English, the spelling-based guess will probably be understood by other English speakers; in Chinese – there is no 'spelling' as such. In fact one of the biggest mysteries for a Westerner is how to ask in Chinese 'how do you write [unfamiliar spoken word]' _because_ you can't orally spell out something as you can in all alphabetic languages.
Rumiko Sode said,

May 7, 2017 @ 5:16 pm

Good old prescriptivism in the mind of a youngster, deciding that a diverse speech community with various combinations of multi-dialectal competence speak "incorrectly". Has "Jayme" had any linguistic training?
Bathrobe said,

May 7, 2017 @ 5:38 pm

@AntC

Yes, I've come across these problems trying to use CJK on web pages. (I have no idea how to deal with them in Word.) I think I've discussed this both here and at LanguageHat. I didn't realise that the problems were the result of an institutional failure on the part of both the consortium and software developers.

I find that a web page generally works ok as long as you are only using one language, which you can declare in the html tag. But try mixing languages and the result is quite unlovely. I'm not tech-savvy enough to figure out a solution. Things that look like they should work, like specifying the language as "zh-Hant" or "zh-Hans", don't seem to do the trick. I assume that this is the issue that you solve with HTML/CSS and Unicode Ranges.
Alex said,

May 7, 2017 @ 6:45 pm

I guess my point is that English is a language that readily absorbs words from other languages/cultures and new words. Chinese does not seem to absorb words as easily.
Idran said,

May 8, 2017 @ 3:04 pm

@Rumiko Sode: Victor already showed repeatedly that these aren't dialectal differences; they're common production errors. Descriptivism doesn't mean that production errors don't exist.
Alyssa said,

May 8, 2017 @ 9:13 pm

I don't think it would be too difficult to find a list of English words that would give people difficulty in spelling or pronunciation. After all, as xkcd proved, *nobody* can spell "fuchsia" : https://blog.xkcd.com/2010/05/03/color-survey-results/

(And I know I personally mispronounce "often", "sherbet", and "espresso" just for a few examples. Oh and I type "excersize" about half the time…)
Alex said,

May 8, 2017 @ 10:19 pm

It's about efficiency.

If on average kids in the US spent the same amount of time studying English as the local kids spend learning Chinese, the US wouldn't have any inner-city illiteracy issues.

The only reason the US kids do not do well is because they don't spend as much time on education.

Conversely, if the local kids here spent the time they now spend learning Chinese and English on English alone, they would all be years ahead of their counterparts in the same grades in the US. Of this I have no doubt.
Victor Mair said,

May 9, 2017 @ 3:34 pm

From Jing Wen:

This is a very interesting video. I am not surprised that people cannot read 齉. Words like this (such as 齈, 齁, 搋, etc.) seldom appear in publications or mass media. People usually don't use them in daily life either.

For other simple words like 血, many people pronounce it xue3, though we were taught in primary school that it is pronounced xue4 or xie3. For other words like 酗, people may mistakenly pronounce it because they take 凶 as the phonogram. The same happens to 犷. Although 广 is pronounced guang3, other words that contain 广 are pronounced kuang4, such as 矿 and 旷.

These words appear again and again in primary school textbooks and exams. In fact, I think kids are more likely to pronounce them correctly. Adults, however, seem to have forgotten what their Chinese teachers taught them when they were nine years old.
