Words in Mandarin: twin kle twin kle lit tle star


Randy Alexander sent me the following photograph and asked how long it would take for me to identify the text in the background:

My instantaneous reply:

about 2 seconds

Twinkle Twinkle Little Star

I love the pinyin with tones, but I wish that the little girl’s teacher would help her link up the syllables into words, e.g., xǔduō 许多.

Here’s one Mandarin translation of “Twinkle Twinkle Little Star” (not exactly the same one as that on the greenboard in the photograph, but I like this version because it gives pinyin with tones and orthographically correct linkages of syllables into words, plus some lexicographical notes and example sentences, as well as recordings for each line).  There are plenty of other sites that provide the Mandarin lyrics for “Twinkle Twinkle Little Star”, but they mostly have pinyin without tones and single syllable by single syllable transcription of characters without proper aggregation into words.

We have a Startalk Mandarin program at Penn this summer, and the teachers have posted these signs on the toilet doors:

nán cè suǒ
男 厕 所

nǚ cè suǒ
女 厕 所

(The first means “men’s toilet” and the second means “women’s toilet”.)

I’ve gone around with a magic marker and connected the cè suǒ –> cèsuǒ. Sooooooo aggravating!! Writing “cè suǒ” instead of “cèsuǒ” is like writing “toi let” instead of “toilet” or “Jane ‘s jack et” instead of “Jane’s jacket”. So far as I know, all the languages of the world have words as well as syllables and morphemes, so there’s no reason why Chinese languages should be treated any differently because the writing system traditionally used to record them is largely, but not exclusively, monosyllabic.

Even Classical Chinese (or Literary Sinitic) had polysyllabic words, copious examples of which are listed in one of my favorite old dictionaries: Cí tōng 辭通 (Comprehensive Phrases; 1934).

Update —


  1. S. Tsow said,

    August 14, 2012 @ 11:36 pm

    In Thailand, we have a hybrid form of English and Thai called Tinglish. You have to know Thai pronunciation patterns to appreciate this, but a Tinglish version of “Twinkle, Twinkle Little Star” would go like this:

    Ta-winken, Ta-winken, Ritten Sa-tar!
    How I wonda what you are!
    Up abou da wornd so high,
    Rike a tea-tlay in da sa-ky!


    I’m not sure that last line is accurate, having never heard the song for the past 60 years or so.

  2. Nik said,

    August 15, 2012 @ 12:39 am

    A little concatenation goes a long way; overzealous concatenation can produce words like Donaudampfschiffahrtsgesellschaftskapitän or Elektronenspeicherringgesellschaft, as one calls it in Germany, or simply synchrotron at Berkeley.

  3. Brendan said,

    August 15, 2012 @ 1:30 am

    I wonder if there’s some cosmic rule dictating that every single Romanization system for Chinese will either be rendered unusable by human error — spacing issues seem to be to Hanyu Pinyin what improper apostrophe use was to Wade-Giles — or bulletproof in design but not learnable by mere mortals (Gwoyeu Romatzyh).

    Presumably a lot of this is caused by the general confusion between 字 and 词 in Chinese, and the unbelievably powerful ability of Chinese characters to wag the dog when it comes to making people think of the spoken language as being made of characters. Most of my Chinese friends have simply not been aware that there are rules you know (as I have occasionally snapped) when it comes to Pinyin spacing, apostrophes, capitalization, etc. — though then again, I’ve seen plenty of non-native speakers of Chinese doing the “in cor rect spa cing” thing, or the “ObNoXiousCaPiTaLiZaTion” thing despite coming from first languages that indicate word boundaries with spaces, so who knows.

  4. Don Sample said,

    August 15, 2012 @ 1:43 am

    “Rike a tea-tlay in da sa-ky!” owes more to the Lewis Carroll version, “Twinkle Twinkle, Little Bat,” than the original.

  5. maidhc said,

    August 15, 2012 @ 3:40 am

    Wikipedia says the earliest known version was a French song, Ah! vous dirai-je, Maman.

    Twinkle, twinkle, little star has five verses. How many people know the other four?

    More at

  6. Anon said,

    August 15, 2012 @ 5:01 am

    I have never understood the real reason for the “envy” and admiration expressed by the layman towards Chinese speaking western people. There is nothing hard or admirable about learning Chinese, in fact, more than 10^9 people can do it, and reasonably talented kids learn how to do it properly by the time they are 10 and sometimes much less than that.

    Essentially anyone with normal IQ can learn Chinese from scratch within a few years, given that he/she has enough time on his/her hands.

    On the other hand, there are intellectual challenges that only a handful in our society can properly address, and these are the real things that should be regarded with deep respect.

    These are things that no matter how many years an average person puts into, this person will never be able to surmount the challenge. As an example, one can take various subjects in mathematics or physics.

  7. Paolo said,

    August 15, 2012 @ 5:10 am

    Apologies as this is OT, but I cannot help pointing out that in Italian cesso means “toilet”.

  8. Jonathan said,

    August 15, 2012 @ 5:43 am

    Excuse the frivolous comment, but I am reminded of Kurt Vonnegut’s version in San Lorenzo creole:
    “Tsvent-kiul, tsvent-kiul, lett-pool store
    Ko jy tsvantoor bat voo yore
    Put-shinik on lo shee zo brath
    Kam oon teetron on lo nath
    Tsvent-kiul, tsvent-kiul, lett-pool store
    Ko jy tsvantoor bat voo yore”.

  9. richard howland-bolton said,

    August 15, 2012 @ 6:11 am

    And when it comes to Mother Goose there is the incomparable Mots d’Heures: Gousses, Rames.

  10. Brett said,

    August 15, 2012 @ 7:49 am

    @Nik: I work with German physicists, and none of them use any word but “synchrotron” for a relativistic ring accelerator. The name of the DESY facility is short for “Deutsches Elektronen-Synchrotron.” If one claims that the word in German is excessively long, one must also admit that the name of such a device in English is “synchrocyclotron.”

  11. Lydia said,

    August 15, 2012 @ 7:50 am

    I disagree with what you’re suggesting (and ‘sooooo aggravatingly’ correcting) – that is, to link two words together. As a native speaker of Mandarin Chinese, I find that you cannot truly treat words like 厕 所 as a polysyllabic word like English, as they tend to contain equal stress. In fact, if I saw a sign which showed ‘cèsuǒ’ instead of ‘cè suǒ’, I’d have a good mind to split them up! x

  12. APOLLO WU said,

    August 15, 2012 @ 7:51 am

    Chinese people link all the characters in a sentence but treat the characters within a word as separate. It is most unfortunate that they are not aware of the problem. Eight-character poems have been rare since ancient times, as the ancient Chinese felt the problem but didn’t know why. Today we know that the human brain can efficiently handle 7 units at a time, similar to a 7-lane highway (just try to memorize an eight-digit phone number). The unit on each channel can be a binary number, a numeral, an alphabetic letter, a syllable, a word or an acronym such as DNA or AIDS, with increasing information content. The optimum use of the information channels is to put larger and larger units on them (like double-decker buses on the highway). I believe English did not have acronyms in Shakespeare’s time, but now they are everywhere. Hopefully Chinese will pick up the idea of optimal channel usage and connect characters together into larger word chunks.

    The convention of not separating characters into meaningful chunks is an ineffective extreme. Words containing several characters should be joined together to form meaningful units, as in most languages of the world. 烘 手 机 can be misunderstood as 烘 手机 (heat-dry the cellphone) or 烘手机 (hand dryer). In machine translation, automatic Chinese word-separation is an added step, whereas other languages have words already separated. I hope to introduce word-separation in Chinese writing itself, e.g. 让 我们 大家 一起来 唱歌 跳舞。 which maps to Pinyin as: rang women dajia yiqilai changge tiaowu.

    By the way, it is better to drop the initial capitalization of a Pinyin sentence, which leads to ambiguity, as in: Beijing shi buzhidao de. The first word may actually be 背景。 As Pinyin has far fewer syllables, you can imagine that the problem crops up much more frequently, as with Shanghai 伤害, Chengdu 程度 etc. In English machine translation, all initial capitalization is converted to lower case before dictionary look-up. – Apollo
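A minimal sketch of the "added step" Apollo describes, assuming a simple greedy forward maximum-matching approach (one common textbook technique, not necessarily what any particular machine translation system uses). The function name `segment` and the toy lexicon are hypothetical and exist only for illustration:

```python
def segment(text, lexicon, max_len=4):
    """Greedy forward maximum matching.

    At each position, take the longest lexicon entry that starts there;
    fall back to a single character when nothing longer matches.
    """
    words = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words


# Toy lexicon (an assumption for this example, not a real dictionary).
TOY_LEXICON = {"我们", "大家", "一起来", "唱歌", "跳舞", "手机", "烘手机"}

print(" ".join(segment("让我们大家一起来唱歌跳舞", TOY_LEXICON)))
print(segment("烘手机", TOY_LEXICON))   # lexicon knows the whole word
print(segment("烘手机", {"手机"}))      # lexicon only knows "cellphone"
```

With the toy lexicon, the first call reproduces the spaced sentence quoted in the comment, while the last two calls show how the reading of 烘手机 depends entirely on what the segmenter's lexicon happens to contain. Text written with word spaces would not need this step at all.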

  13. Boris J. said,

    August 15, 2012 @ 9:46 am

    The full text of the (widely popular among French speakers) original version :

    Ah ! vous dirai-je, maman,
    Ce qui cause mon tourment.
    Papa veut que je raisonne,
    Comme une grande personne.
    Moi, je dis que les bonbons
    Valent mieux que la raison.

  14. Matt McIrvin said,

    August 15, 2012 @ 10:19 am

    Mozart is occasionally credited with writing the tune, but he didn’t; he just made use of it.

    “These are things that no matter how many years an average person puts into, this person will never be able to surmount the challenge. As an example, one can take various subjects in mathematics or physics.”

    I don’t think these subjects are really any more inherently impenetrable to the average person than picking up new languages. In either case, it’s a question of having the will and ability to devote oneself to full-time study, which not everyone has. Children who learn their native language are effectively devoting a large portion of their waking hours to the task, because they have to.

    Now, one thing that really is different about advanced topics in physics and mathematics is that they have all sorts of prerequisites which are themselves topics requiring long study. So you can’t even begin until you’ve already done all this other stuff. But there’s no need to be a particularly special person just to start the journey.

  15. Andy Averill said,

    August 15, 2012 @ 10:34 am

    Reminds me of the Texas lady who went into a toilet in Germany marked Herren. She said, I thought they must be His’n and Her’n.

  16. Vasha said,

    August 15, 2012 @ 11:01 am

    Ah ! vous dirai-je, maman…

    Wikipedia says this was an 18th-century parody of a poem, which explains why the variety of French is oddly literary for a small child. Maybe this adds additional humor to today’s listeners, to have a child expound the childishness of their reasoning while talking like a book?

  17. richard said,

    August 15, 2012 @ 11:30 am

    @Brendan’s comment on the romanization of Chinese is just as true–if not ever so much more so–for the romanization of Korean, which has the added joy of being rejiggered every few years by (mostly) well-meaning people on both sides of the DMZ. For a sense of a relatively recent (2007 or so) romanization kerfuffle, just google the phrase “dog rib moon” (in quotation marks). Part of the issue–OK, most of the issue, in my opinion–is lack of agreement on what the function of romanization should be, and who the audience should be. Represent the han’gul spelling? Represent the sounds (and if so, in which dialect)? Scrub “foreign practices” from the language? Help out native speakers? Help out Korean-speaking foreigners? Help out non-Korean-speaking foreigners? Strike a blow against neo-colonization by global English? All of the above?

  18. michael farris said,

    August 15, 2012 @ 11:48 am

    Korean’s an interesting case because as far as I can tell, it’s not hard to come up with a romanization that either represents Han’gul spelling consistently or represents the surface phonemics of the standard language – but it’s impossible to do both and simultaneously come up with a system that is in harmony with English spelling practices either to familiarize Korean speakers with the English writing system or to help out monoglot English speakers who can’t be bothered to learn Han’gul.

    My own preference would be a more transliteration-based McCune-Reischauer (I just wish they’d thought of frequency and had reversed the values of u and ŭ (and maybe those of o and ŏ)).

  19. Victor Mair said,

    August 15, 2012 @ 2:50 pm

    from Michael Carr:


    The OED (s.v. toilet) gives

    1884 W. S. Gilbert Princess Ida ii. 69 He grew moustachios, and he took his tub, And he paid a gui-nea to a *toi-let club—And he paid a gui-nea to a toi-let club.


    VHM: At least they put in a hyphen.

  20. Victor Mair said,

    August 15, 2012 @ 3:23 pm

    Until about 10-15 years ago, Chinese instructional material used to provide properly segmented pinyin. But the nation took a big leap backward when hanziphiles in the educational bureaucracy became alarmed that word-based pinyin was becoming a de facto alternative to Chinese characters as a script for writing Mandarin and demanded that all pinyin syllables be written separately.

  21. Army1987 said,

    August 15, 2012 @ 3:25 pm

    IIRC, Vietnamese spelling puts spaces at the end of each syllable.

  22. maidhc said,

    August 15, 2012 @ 4:04 pm

    According to http://fr.wikipedia.org/wiki/Ah_!_vous_dirai-je,_maman

    The words of the children’s song are a parody of an anonymous love poem, La Confidence:

    Ah ! vous dirai-je, maman,
    Ce qui cause mon tourment ?
    Depuis que j’ai vu Clitandre,
    Me regarder d’un air tendre ;
    Mon cœur dit à chaque instant :
    « Peut-on vivre sans amant ? »

  23. michael farris said,

    August 15, 2012 @ 4:31 pm

    Victor Mair: “the educational bureaucracy became alarmed that word-based pinyin was becoming a de facto alternative to Chinese characters”

    I hope you don’t think you can dangle this little tidbit in front of us without going into more details (or at least as many as you dare…). I don’t doubt you for a second but I’m really curious about the details.

    Army 1987: Yes the Vietnamese practice is to write each syllable separately, as can be seen here (link to newspaper)


    There are a number of reasons for this, including centuries of conditioning with Chinese characters to regard the syllable as the unit of writing and the fact that AFAICT no one has a good handle on how to divide words in Vietnamese, there were times when I could easily understand a written sentence yet have no idea on how it would best be divided into words. Apparently clearer criteria for word divison can/have been established for Mandarin.

    It doesn’t seem to cause any special difficulties for native speakers but it does make learning to read harder than it maybe needs to be.

  24. Sabio Lantz said,

    August 15, 2012 @ 5:13 pm

    Ya know what I like to do with Chinese and Japanese transliterations to show characters, is write:
    nán cè suǒ –> Nán CèSuǒ (capitals show characters)
    I don’t know if this is anyone else’s convention but it is one I often use.

  25. Victor Mair said,

    August 15, 2012 @ 5:27 pm

    from Matt Anderson:


    This reminds me of a menu I saw just today for a new Xi’an style restaurant in Flushing, Queens (even the name of the restaurant – Biáng, a word I think you’ve written about on LL – carries a tone mark). The menu is I think the first I’ve ever seen which includes pinyin with tone marks for every item (even neutral tones are done correctly), but it still writes the syl la bles of each word se pa rate ly:


  26. Victor Mair said,

    August 15, 2012 @ 6:14 pm

    @Michael Farris

    While I am not at liberty to divulge all that I know about this subject, let me just say that it happened in the context of the changeover from the Wénzì gǎigé wěiyuánhuì 文字改革委员会 (Script Reform Committee) as an independent and powerful bureau under the Guówùyuàn 国务院 (State Council) to the Guójiā yǔyán wénzì gōngzuò wěiyuánhuì 国家语言文字工作委员会 (State Language Commission) under the Ministry of Education.


    The change of name is indicative: anything with the morpheme gé 革 in it became suspect, because it smacked of gémìng 革命 (“revolution”). By the mid-80s, when this happened, the Chinese Communist Party, which had once been an exponent of revolution, had begun to fear revolution (both “revolution” and “jasmine” have recently been censored on the Chinese internet), and even gǎigé 改革 (“reform”) was studiously avoided.

    In any event, after the transformation of the Script Reform Committee to the State Language Commission, the new leaders of the latter body saw fit to insist on the splitting of all pinyin words into single syllables. All of the old reformers were pushed aside, and one of the most distinguished among them quite literally qìsǐle 气死了 (“died of anger”). I remember visiting him several times in this post-changeover period and seeing him fuming over this very matter of the splitting up of words into syllables in pinyin transcriptions.

    Peter Hessler has written about some of the main personalities involved in Oracle Bones: A Journey Between China’s Past and Present. The only one who survives is the centenarian, Zhou Youguang, for which see the last paragraph here


    and Sino-Platonic Papers #226


  27. Thom said,

    August 15, 2012 @ 10:04 pm

    What I have not seen anyone really comment upon is the irrelevance of the issue based on the fact that pinyinization is merely a technique used to aid in teaching pronunciation. The characters are the actual language; pinyin is only a supportive tool–essentially, it is the Chinese version of IPA pronunciations that one might find in any dictionary (just as it is used in Chinese dictionaries). Along the same lines as what Apollo was saying: a word in pinyin can be highly misleading. Looking up shì [shi4] on the MDBG Chinese-English dictionary displays nearly 50 individual characters ( http://www.mdbg.net/chindict/chindict.php?page=worddict&wdrst=0&wdqb=shi4 ). Therefore, the pinyin–other than providing pronunciation–tells the reader nothing. Provided greater context, one could be able to differentiate. The point is that pinyin is not really Chinese any more than /mit/ = meat/meet. Would you argue that /ˈkʌbərd/ is right and /ˈkʌ bərd/ is incorrect?

  28. Geoff Wade said,

    August 15, 2012 @ 10:08 pm

    “Sooooooo aggravating!! Writing “cè suǒ” instead of “cèsuǒ” is like writing “toi let” instead of “toilet” or “Jane ‘s jack et” instead of “Jane’s jacket”. So far as I know, all the languages of the world have words as well as syllables and morphemes, so there’s no reason why Chinese languages should be treated any differently because the writing system traditionally used to record them is largely, but not exclusively, monosyllabic.” –Victor

    I know that this will rattle the shutters again. Surely, Victor, given that each of the components of a Chinese 辭 is in itself a 字 and, in most cases, has a meaning which contributes to the meaning of the resultant 辭, there is some basis for representing them separately. You speak of rendering “cèsuǒ” as “cè suǒ” as being equivalent to writing “Jane ‘s jack et”. But this is not a valid correlate. Rather, something like “Jane ‘s out house” rather than “Jane’s outhouse” is a better comparison. The separation of the component terms does no violence to the meaning and in fact reveals the components which constitute the final word.
    Representing the 字 (they are not just syllables) separately, but with some method of indicating their connectedness (above Sabio Lantz prefers an initial capital for each 字, while I prefer a hyphen between them, giving “cè-suǒ”) shows that the respective 字 are part of a 辭. That adds to clarity and assists in pronunciation.

  29. Mark Mandel said,

    August 15, 2012 @ 10:56 pm

    “I’ve gone around with a magic marker and connected the cè suǒ –> cèsuǒ.”

    Ah, a pedant after my own heart. I’m not being sarcastic: I carry a marker for much the same purpose, e.g., crossing out the apostrophe in “TROLLEY’S WILL STOP HERE”.

  30. Victor Mair said,

    August 15, 2012 @ 10:57 pm


    The characters are NOT “the actual language”. The language consists of phonemes and morphemes, which are arguably better represented in pinyin than in characters.

    By linking up syllables into words, through capitalization and italicization, etc., pinyin actually tells us a lot that the characters fail to convey.

    You completely misunderstand the points that Apollo, who is an ardent supporter of pinyin, is making.

    The ambiguity of the 50 shi4 to which you allude disappears when these morphosyllables are joined into words, as they are in natural language. If Chinese languages were so highly ambiguous as you seem to allege, no one would be able to hold an intelligible, intelligent conversation. As I’ve said on countless occasions, no one ever spoke a Chinese character (except in a cartoon balloon). What people speak are sounds, and pinyin is more effective in conveying those sounds than characters are.

  31. Jay Sekora said,

    August 15, 2012 @ 11:22 pm

    I don’t see why spacing between syllables but not between words in Pinyin is really something to get agitated about. There are (generally very fuzzy) phoneme boundaries, syllable boundaries, morpheme boundaries, phrase boundaries, sentence boundaries, and so on, and some writing systems indicate some of those, some indicate others, and some (like Classical Latin, for instance) don’t indicate any. I mean, yes, slightly more information is arguably¹ communicated by “Tã shuõbushuõ Zhõngwén ma?” than by “tã shuõ bu shuõ zhõng wén?”, (and more information is definitely communicated by the latter than by “ta shuo bu shuo zhong wen ma” or by “ta shuo pu shuo chung wên ma”, ignoring aspiration as well as tone).

    But I don’t think it was wrong for Caesar to write “GALLIAESTOMNISDIVISAINPARTESTRES” (or he might have written “CALLIAEST…”; I’m not sure); he was just using a slightly different writing system than mediaeval or modern writers of Latin use.

    People who write Pinyin this way (just like people who write in Chinese characters) are using a writing system which leaves out information about certain kinds of boundaries. They’re also leaving out information about intonation, etymology, and all sorts of other things about the utterance that are interesting (and many of which some writing systems do indicate in some way). Similarly, a standard English writer who writes “I don’t know who to refer that guy I talked to you about to” is marking word boundaries, but leaving out information about phrase boundaries (some but not all of which is reflected in pronunciation). And the fact that somebody writing “homeland security” indicates a kind of boundary that somebody writing “Heimatschutz” doesn’t indicate does not mean that one or the other of them is doing something wrong, it just means that the two writing systems (and in this case to some extent the two languages, although it’s easy to imagine English or German being written with the other’s conventions) are somewhat different.

    (In this case, actually, it looks to me like the Pinyin is being used basically like furigana, to annotate the pronunciation of each character individually, rather than being intended as a parallel independent transcription in a different writing system. But I don’t know enough about Chinese pedagogy to know if that’s likely to be the case.²)
    ¹ It’s not clear to me, for instance, that “shuõbushuõ” actually communicates more information than “shuõ bu shuõ”. It certainly doesn’t to somebody who’s not already at least a little bit familiar with Chinese; you wouldn’t be able to look it up as a word in a dictionary.

    ² Or, really, anything at all about Chinese pedagogy.

  32. Jay Sekora said,

    August 15, 2012 @ 11:27 pm

    Er, remove the “ma” from the examples. I started with the various transcriptions of “Tã shuõ Zhõngwén ma?” and then realized I could make things even more confusing (and buy myself an extra footnote) using “shuõbushuõ”, but neglected to edit them all consistently. And my decades-old Chinese is terrible in any case.

  33. Thom said,

    August 16, 2012 @ 12:06 am


    I understand what you mean about “the actual language”–in which case, I should have said “the technically true, current written form of the language”.

    However convenient pinyin seems, we cannot forget that even those symbols are rather arbitrary. Furthermore, the characters provide a subtlety of meaning in their own right (i.e. the radicals indicating animal or mineral).

    Also, I am curious, is it of your opinion that 迈克尔杰克逊 is incorrectly spaced (in the fact that it is not spaced) in comparison to 迈克尔-杰克逊? (Google uses no spaces, while various websites are mixed about the separation. The first 5 pages of results are approximately 60% hyphenated/40% non-hyphenated.) Additionally, I should ask, in your opinion, is pinyin a native-user or non-native-user controlled invention? Are these rules codified in a manual?

    It’s not that I disagree with you or wish to be argumentative. Honestly, I feel that the argument is fallacious in that it ignores the fact that a written language with a long history already existed for this mode of communication–that does not follow the rules that you claim should exist in the convenient Westernized form. IME’s do not need spaces to provide correct characterizations of the pinyin that is typed into computers after all.

  34. Kevin said,

    August 16, 2012 @ 12:20 am

    May i ask, how many people here are native speakers of Mandarin? Only Lydia has indicated that she is.

    Where i come from native bilingualism in English and Mandarin is widespread. I honestly cannot recall ever having seen anybody here write pinyin sans spacing between syllables (unless it’s used in a manner that suggests it’s been anglicised, like in “pinyin”, or in anglicised Chinese names). Frankly, i don’t see why this should be a problem. I understand it fine. Presumably people who write that way do as well. And really, i personally find pinyin _without_ spacing between every syllable exceptionally difficult to read. At times i have to consciously separate the characters in my mind to read them properly.

  35. Chaon said,

    August 16, 2012 @ 2:00 am

    我非 常不 同意。

  36. Faldone said,

    August 16, 2012 @ 6:02 am

    “Reminds me of the Texas lady who went into a toilet in Germany marked Herren. She said, I thought they must be His’n and Her’n.”

    Well, the other room said “Da men”. They left the space between the words but the Germans are always doing that.

  37. APOLLO WU said,

    August 16, 2012 @ 6:19 am

    Latin took 4 centuries to change from no word separation to the existing mode with word separation. This fact simply shows word separation is not obvious, and may vary between languages or even in one language at different times. I pick up the following from a website by Shandong University in PRC. In it an argument is made for Pinyin word separation:

    人人都来维护环境卫生 (Character text with no word separation)

    一种写法是 (one way of writing the same in Pinyin):


    另一种写法是: (another way of writing in Pinyin):


    正确的写法应该是:(the correct way of writing in Pinyin)


    这样拼写叫做“分词连写”。 This latter way is writing Pinyin with word separation.

  38. Tom Bishop said,

    August 16, 2012 @ 10:39 am

    Thom asked, “is pinyin a native-user or non-native-user controlled invention? Are these rules codified in a manual?”

    Pinyin orthography is controlled by official Chinese national standards, and it was invented by Zhou Youguang and other Chinese experts. The most essential standards are

    “汉语拼音方案 (Scheme for the Chinese Phonetic Alphabet)” (1958)


    “汉语拼音正词法基本规则 (Basic Rules for Hanyu Pinyin Orthography)” (GB/T 16159-1996)

  39. Victor Mair said,

    August 16, 2012 @ 2:17 pm

    The official orthographic rules that Tom Bishop mentioned are available in an English translation as an appendix at the back of the ABC Chinese dictionaries published by the University of Hawaii Press.

  40. Victor Mair said,

    August 16, 2012 @ 3:45 pm

    From an advanced student in Chinese linguistics:


    I didn’t know about this (the leap backwards from writing words together to writing syllables individually), but of course I should have noticed it. The PRC-published textbooks I used in Beijing in 1995 (Xiàndài Hànyǔ jiàochéng 现代汉语教程 A Course in Contemporary Chinese), which I see are from the 1993 printing of an edition first published in 1988, join syllables together when necessary (a randomly chosen sentence: “Xuéxiào jiàoxuélóu pángbiān yǒu yízuò xiǎo qiáo” – even tone sandhi is noted). The bureaucratic changeover certainly explains it, but it’s pretty depressing.

  41. Kevin said,

    August 16, 2012 @ 7:55 pm

    So am I to understand that the native-speaker judgements of Lydia, Chaon and I are to be disregarded because of the opinions of some experts and language authorities?

  42. Kevin said,

    August 16, 2012 @ 7:57 pm

    I’m not arguing from principle or theory here. I’m saying it’s incredibly difficult to understand.

  43. Victor Mair said,

    August 16, 2012 @ 11:16 pm


    Apollo Wu is a native speaker, Zhou Youguang is a native speaker, Feng Zhiwei is a native speaker, Fang Shizeng is a native speaker, Lu Xun was a native speaker, Ni Haishu was a native speaker, Wang Jun was a native speaker, Yin Binyong was a native speaker, Zhang Liqing was a native speaker, the authors of the “汉语拼音正词法基本规则 (Basic Rules for Hanyu Pinyin Orthography)” (GB/T 16159-1996) were native speakers, the authors of the (Xiàndài Hànyǔ jiàochéng 现代汉语教程 A Course in Contemporary Chinese) were native speakers, and there are lots of other proponents of word-segmented pinyin who are / were native speakers. All of them were also language authorities, and all of them had cogent reasons for their support of word-segmented pinyin.

    Since, as you say, you are “not arguing from principle or theory”, perhaps this information will make it easier for you to understand.

  44. MC said,

    August 16, 2012 @ 11:49 pm

    @ Thom – you completely misunderstood the main purpose of this post – no one is arguing about the myriad sounds/syllables that a single Chinese character can produce. However, Victor Mair and others are talking about the relationship between words. And also, the discussion is not so much about the ease of pinyin reading, but about the syntax and semantics that reflect the grammatical characteristics of the Chinese language as rendered in the Latin script.

    Why do you think that Linear B is considered a “(Mycenaean) Greek” language, despite its glyphs being written in a non-Greek script? And why is Coptic not considered a Greek language, despite being written in the Greek alphabet and using a great deal of Greek vocabulary? Or why isn’t Uyghur Arabic? What makes a language unique is not just the characters or alphabets that it uses, but the syntax and morphology.

    Bear in mind that I’m not a linguist/philologist and I have zero training in the disciplines (I can’t even read pinyin tone marks…) so what I said could be way off, but I know that getting the correct transliteration (is this the right word to use?) and word grouping are particularly important for learning “glyph” based languages. For example, students do a lot of transliteration exercises (into Latin script) in Egyptian hieroglyphics class. One obvious reason is that it’s faster than drawing the glyphs, but they also do this type of exercise because the Latin transliteration often reflects the syntax in a much clearer way than the glyphs themselves. At least this is what I was told by the professor with whom I took a year-long course in Egyptian hieroglyphics in Cairo.

    I do not understand your Michael Jackson analogy.

    About Jay Sekora’s example of “shuõbushuõ” and whether it actually communicates more information than “shuõ bu shuõ.” I personally think not, but that’s just because of the example he is using: there is no other alternative way (that is logical) to read “shuõ bu shuõ.” But more importantly, “shuõ bu shuõ” cannot be joined together based on grammatical reasoning. Any way you look at it, X 不 X in pinyin would have to be separated.

    But consider, for example, the sentence that Apollo Wu cited:


    Without proper grouping of words, how do you know that REN REN DOU LAI WEI in the sentence cannot be rendered as 人人都來为…? This was my initial “guess” without looking at the Chinese characters, but then I have to figure out 都來为 what? HU HUAN 呼唤/互换?? But then HU HUAN what??

    I’m sure you know that there is a huge difference between 都來为… and 都来维护… Inserting the tone marks would help “decoding” the sentence, but you still cannot resolve the fundamental issue of syntactic/semantic ambiguity. I will eventually figure out that 都來为 makes no sense context-wise, but this realization isn’t “apparent” to me because of the word separation. Maybe it has to do with the fact that I’m not a native Mandarin speaker (but I understand and speak Mandarin without difficulty), but even if you rework the sentence and split it up into Cantonese jyutping (my native tongue), I still have to take a moment to figure out what it means. But if you group related words together, like this: RENREN DOU LAI WEIHU HUANJING WEISHENG, then the context will become immediately apparent to me.
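The garden-path reading described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `TOY_LEXICON` is a hypothetical, hand-picked word list (toneless, not drawn from any real dictionary), and `segmentations` is a made-up helper that lists every way an unspaced syllable stream can be cut into lexicon words.

```python
# Hypothetical toy lexicon (toneless pinyin); a real dictionary would be far larger.
TOY_LEXICON = {
    "renren", "dou", "lai",
    "wei", "weihu",        # 为 vs. 维护
    "huhuan", "huanjing",  # 呼唤/互换 vs. 环境
    "jing", "weisheng",
}

def segmentations(syllables, lexicon):
    """Return every way to group the syllable list into words found in lexicon."""
    if not syllables:
        return [[]]  # one way to segment nothing: the empty segmentation
    results = []
    for end in range(1, len(syllables) + 1):
        word = "".join(syllables[:end])
        if word in lexicon:
            for rest in segmentations(syllables[end:], lexicon):
                results.append([word] + rest)
    return results

syllables = "ren ren dou lai wei hu huan jing wei sheng".split()
for parse in segmentations(syllables, TOY_LEXICON):
    print(" ".join(parse).upper())
```

With this toy lexicon the syllable-by-syllable string admits both the WEI + HUHUAN reading and the WEIHU + HUANJING reading, so the reader has to choose between them. A writer who spells RENREN DOU LAI WEIHU HUANJING WEISHENG has already made that choice on the page.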

  45. MC said,

    August 17, 2012 @ 12:01 am

    @ Kevin – Chaon did not say who or what s/he’s disagreeing with. S/he just says s/he disagrees… maybe s/he disagrees with you. How can you be so certain?

    …unless, you are him????

  46. Bob Violence said,

    August 17, 2012 @ 12:16 am

    The state of pinyin education in China today is such that students are apparently not even taught how to write their own names. When I taught at a Chinese university a couple of years ago, I asked all of my students to write their names in pinyin on the class roll. Of the hundreds of students I taught, maybe three or four used the proper pinyin forms, e.g. “Wáng Xiùyīng.” Everyone else separated out every syllable and used no capital letters, e.g. “wáng xiù yīng.” I even had some students tell me that capital letters don’t exist in pinyin and that using them is flat-out wrong.
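The difference between “wáng xiù yīng” and “Wáng Xiùyīng” is mechanical enough to sketch in Python. This is a minimal illustration, not a full implementation of the orthography rules: `format_name` is a hypothetical helper, and it assumes the first syllable is a one-syllable surname and that no given-name syllable begins with a, o, or e (which would call for an apostrophe).

```python
def format_name(syllables):
    """Join per-syllable pinyin into 'Surname Givenname' form.

    Assumes a one-syllable surname followed by the given-name syllables.
    """
    if not syllables:
        raise ValueError("expected at least one syllable")
    surname = syllables[0].capitalize()
    given = "".join(syllables[1:]).capitalize()
    return f"{surname} {given}".strip()

print(format_name(["wáng", "xiù", "yīng"]))
```

Two-syllable surnames and the apostrophe rule would need extra handling that is deliberately left out here.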

  47. Chaon said,

    August 17, 2012 @ 5:00 am

    I am neither Kevin nor a native speaker. The “strongly disagree” was not directed at anyone. I thought that by appearing to leap into the argument while using incorrect spacing in my sentence, you would all laugh uproariously and think me quite witty.

    It appears I should try to keep my day job.

  48. Gene Buckley said,

    August 17, 2012 @ 8:27 am

    A central issue here is the different purposes that people have in mind for pinyin. Although it is often used simply as a way of transcribing Chinese characters (and we’ve seen here that the recent trend in China is to use the system only in this manner), it also has the ability to serve as an alternate means of representing the Chinese language.
    The first approach takes the character writing system as basic; the second takes the language itself as basic. Under this second, linguistically standard view, one can represent more (or less) than what the characters represent, such as the difference between a sequence of two monosyllabic words and a disyllabic compound. This distinction is a real part of the language, even if the character-based writing system has historically ignored it. (But even that is by no means necessary; one could decide to add spaces or word dividers to a character text.)
    For example, characters disambiguate most homophonous morphemes, although not all of them; e.g. in simplified characters, 后 represents both hòu “empress” and hòu “behind” (traditional 後). But pinyin transcription with word boundaries can disambiguate sequences that characters treat the same. One example I’ve seen mentioned recently is 在公車站立牌 “to put up a sign at the bus stop”. Here the sequence 公車站 is one word, gōngchēzhàn “bus stop”. But the string of characters 公車站立 could also represent two disyllabic words, gōngchē “bus” and zhànlì “stand up straight”.
    In this overall context, that interpretation doesn’t make sense; but the same is true for most of the homophony that is “resolved” by characters: in normal communication, whether spoken or written, these ambiguities are eliminated by taking into account the surrounding material and the larger situation. Thus 王后 is easily recognized as wánghòu “queen” despite the ambiguity of 后 as a character; and the pinyin orthography that writes those two syllables together makes the ambiguity of the syllable hòu even less relevant, since the reader doesn’t have to consider (however briefly or unconsciously) whether it is functioning as a preposition “behind” to be grouped with a following word.

  49. Victor Mair said,

    August 17, 2012 @ 9:22 am

    For those who read Chinese, this article on the Shandong University “Language and Script” page puts the matter succinctly, clearly, and persuasively. The title of the article is “Wèishénme hànyǔ pīnyīn tíchàng fēncí liánxiě?” 为什么汉语拼音提倡分词连写? (Why does Hanyu Pinyin promote word segmentation?). If someone has the time and energy, it would be great to have an English translation of this short but information-filled article.


    (sent to me by Apollo Wu)

  50. Victor Mair said,

    August 17, 2012 @ 9:50 am

    This is the form in which Apollo’s message came to me and a group of other individuals who are interested in Chinese language and script reform:



    请 看看 上面 山东大学 题为 ‘ 为什么汉语拼音提倡分词连写?’的 网页。如此 清晰的 说明,还需要 开会 推广吗?中国的 教育家们 是不是 低能 呢? 还是 另有 原因 呢? 大家 能否 做一点‘研究”(再找寻 research)吗? 外国 有 学者 说,中国的 语文 保守品派 害怕 拼音 变成 文字,因而 拆散了 词化 拼音。 真是这样的吗? -老吴


    I hope that the word segmentation shows up when I submit this comment (sometimes what one enters in the compose window isn’t exactly what appears when the comment gets submitted). In any event, even for those who cannot read Chinese, at least some of the word spacing should be evident in the character text as typed by Apollo. This matches perfectly with what Gene Buckley said in his comment.

    One of the most learned Sinologists of the 20th century, Tse-Tsung Chow of the University of Wisconsin, advocated exactly this method for purposes of lexical, grammatical, and syntactical clarity, and he did so long before the age of computers.

  51. Victor Mair said,

    August 17, 2012 @ 1:27 pm

    Here’s another article in Chinese, this one by Fang Shizeng, arguing for the restoration of word segmentation in pinyin instructional materials:


  52. arthur waldron said,

    August 17, 2012 @ 4:54 pm

    We are nigh on a hundred years of “reforming” Chinese and as far as I can see it is a bust. Everyone would be better off with traditional characters and zhuyinfuhao taught in serious schools for all. The language is never going to be “simplified.” Education has to be improved.

  53. Lillian said,

    August 17, 2012 @ 5:08 pm

    The idea that (Mandarin) Chinese is made of monosyllabic words is such a pervasive notion that even my Chinese teacher (from Shanghai) told it to us (along with the “dialects, not languages” idea). I was just reading Stephen Fry’s The Ode Less Travelled and there the monosyllabic meme was again, although I think he managed to write “the Chinese languages.”

  54. Victor Mair said,

    August 17, 2012 @ 5:33 pm

    @Arthur Waldron

    But zhuyin fuhao IS a kind of language reform, isn’t it? If you think that language reform is “a bust” or you are opposed to it for one reason or another, then you should be against zhuyin fuhao. If you believe that the characters as they existed a hundred years ago were perfectly good and that all that is necessary is for China to have “improved” education, what need would there be for zhuyin fuhao (phonetic annotation of the characters, which is basically what pinyin is all about)? And what would your “improved” education look like? Of what would it consist?

  55. arthur waldron said,

    August 17, 2012 @ 7:20 pm


    I suppose I should be arguing for jiaguwen and urging that classrooms be well stocked with tortoise shells and scapulas. Somehow the architectural strength of written language as unifier cannot easily be converted into the standard nationalistic force of spoken language, written down, as unifier. It turns out that the inclusive language is inadequate while the spoken languages lack inclusivity and indeed give rise to the same fissiparous tendencies we all know. When I read about all the really smart people who attempted to figure out what to do about Chinese, and look at the high intellectual level even of this blog, I guess I am forced to sigh and reflect, “it must be impossible.” Of course zhuyinfuhao is an alphabet and guoyu is an artificial language, but I thought they deserved a favorable mention as they come close to working. OT, but what about Japanese? They could romanize without so many problems. I remember Edwin Reischauer strongly advocated doing so.

  56. Qitong said,

    August 17, 2012 @ 9:38 pm

    The official standard (汉语拼音正词法基本规则) prescribes the use of segmented-word pinyin. This well-designed standard was laid down in 1996 and will be replaced by a newer one in October, which, it seems, will continue to insist on this practice. So throughout the years the single “technically correct” way to Romanize Chinese has always been to insert spaces when necessary. As a native speaker, I find it reasonable for two reasons. First, most ordinary Chinese people are not used to reading long paragraphs of pinyin, and therefore proper segmentation will help them grasp the idea promptly. Second, and more importantly, most pinyin content is prepared for non-native speakers. A word-by-word form is more familiar to them than a character-by-character one and hence makes more sense.
    The problem, however, is that few people actually write according to the official standard. Nowadays Chinese children tend to learn pinyin in kindergarten and later in elementary school. You can’t expect those teachers who themselves do not know a single word about such a standard to somehow teach their students to write properly. Besides, pinyin is considered too fundamental to be covered in 高考(the college entrance exam), so few people bother to care about it at all.
    When I came to the US this year I noticed that 汉语拼音正词法基本规则 is included in a basic Chinese textbook. The student from my host family uses that textbook. Naturally he adheres to the correct respelling from the very beginning. It is so interesting, and somewhat sardonic, that Americans are in some respect better at hanyu pinyin orthography.

    BTW, your point about 改革 sounds riveting but it seems to me a bit overanalytical. And how do you like the new standard, as per which capital Ü is to be replaced by LU?

  57. APOLLO WU said,

    August 17, 2012 @ 10:20 pm

    Taking a cue from the IBM keyboard, which retained the QWERTY layout and simply added new keys, Chinese language reform can start by adding the necessary new ‘keys’, such as stipulating that all publications must have Pinyin titles alongside their titles in Chinese characters. A reference website should be set up to provide correct word-based titles in case publishers have questions about them. All bank branches should be listed based on Pinyin. Btw, MS WORD can sort Chinese text in Pinyin order. All street names should be written in accordance with the word-based Pinyin standard. Steps like these will go a long way toward reducing the existing chaos created by the Chinese writing system.

  58. Victor Mair said,

    August 18, 2012 @ 7:52 am


    The letter “v”, which is not used in the standard Hanyu Pinyin alphabet, has become the de facto way to write “ü” for those who find it difficult, annoying, or time-consuming to add an umlaut. Thus “lv” for “lü” and “nv” for “nü”, which is how “ü” is entered in many widespread inputting schemes, such as that used for Google Translate, Microsoft Word, etc.
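Because “v” has no other role in the standard Hanyu Pinyin alphabet, text typed this way can be normalized with a plain substitution. The following Python sketch assumes the input contains only pinyin (no English words or other text using “v”); `v_to_umlaut` is a hypothetical helper name, not part of any input method.

```python
def v_to_umlaut(text):
    """Replace the input-method stand-in 'v' with 'ü' (and 'V' with 'Ü').

    Assumes the text is pure pinyin, where 'v' never occurs otherwise.
    """
    return text.replace("v", "ü").replace("V", "Ü")

print(v_to_umlaut("lv"), v_to_umlaut("nv"))
```

Mixed text containing English or other Latin-script words would need a smarter rule, since a blanket replacement would corrupt any word that legitimately contains “v”.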

    As for the sensitivity to gé 革 (“reform”), as it occurs in gémìng 革命 (“revolution”) and other words, I do not think I am being overanalytical. You characterize as “riveting” my account of the changeover in name of the committee charged with language and script reform, but I think what I tell you next will be even more stunning. Once, when I was walking around the Unnamed Lake 未名湖 at Peking University with Yin Binyong, a dear friend and outstanding applied linguist, I used the word gǎigé 改革 (“reform”), with intentionally strong emphasis on the second syllable. He trembled and said, “We don’t use that kind of language anymore.” During the second half of the 80s, I had similar experiences with others when I used words like yāpò 压迫 (“oppress”), dòuzhēng 斗争 (“struggle”), and other terms associated with the Great Proletarian Cultural Revolution and the tumultuous 50s.

    @Geoffrey Wade

    There are plenty of Sinitic words in which the constituent syllables do not mean anything by themselves: qílín 麒麟 (“kirin” [not “unicorn”]), fènghuáng 凤凰 (so-called “phoenix”), zhīzhū 蜘蛛 (“spider”), shānhú 珊瑚 (“coral”), qiūyǐn 蚯蚓 (“earthworm”), xīshuài 蟋蟀 (“cricket”), húdié 蝴蝶 (“butterfly”; of mythical proportions, deftly dissected by George Kennedy, the brilliant Yale professor), pàngdūdū 胖嘟嘟 (“pudgy, plump”), shǎbùlèngdēngde 傻不愣登的 (“bewildered, dorky, doltish, daffy”), and so forth. Even where the constituent morphemes do contain some sort of individual meaning, they are more often than not *bound*, i.e., they cannot be used alone, so their alleged independent meaning doesn’t really exist (it exists only in the context of words). Perhaps the best and easiest way to acquaint yourself with the concept of bound and unbound is to look into the wonderful little Concise Dictionary of Spoken Chinese by Yuen Ren Chao and Lien Sheng Yang. Furthermore, the countless hyphens you advocate look really ungainly (almost as bad as InTerCapITalIZaTion) and take up unnecessary space. Finally, you overlook the IT aspects / advantages of properly segmented pinyin, which have been pointed out by many of the commenters in this thread and in the sources that I and others have provided.

  59. arthur waldron said,

    August 18, 2012 @ 2:26 pm

    I thought feng and huang were male and female as in feng qiu huang.

  60. Victor Mair said,

    August 18, 2012 @ 3:18 pm

    @arthur waldron

    That’s just an ex post facto false etymology. The fènghuáng 凤凰 (so-called “phoenix”) was originally a single bird.

    It’s sort of like the word for lute (or “balloon guitar”) in Chinese, which is pípá 琵琶 (there are several different ways to write it in characters; probably from the Iranian word “barbat”). The false ex post facto etymology, which you can find in even well-known Chinese dictionaries, is that pí means “down stroke” and pá means “up stroke” (or vice versa). Pútáo 葡萄 (“grape”) is another of these disyllabic, monomorphemic words written with two graphs, and is probably also of Iranian origin.

  61. Jonathan Benney said,

    August 20, 2012 @ 9:11 pm

    Readers may be interested in the paper below (written by James Wu 吴坚立, one of my high-school Chinese teachers and an extremely significant figure in secondary school Chinese teaching in Australia), on teaching secondary school students pinyin.


  62. Victor Mair said,

    August 26, 2012 @ 7:07 am

    Just went back and reread Chaon’s first comment. That WAS clever, but the first couple of times I read it, the misspacing escaped me, and I think that it escaped everyone else.
