The uses of Hanyu pinyin

Hànyǔ pīnyīn 汉语拼音 ("Sinitic Spelling") is the official romanization of the PRC.  It also comes with an official orthography which provides guidelines for word separation, punctuation, and how to deal with grammatical constructions.  An English translation of the basic orthographical rules by John Rohsenow can be found at the back of the various editions of the ABC Chinese-English Dictionary from the University of Hawai'i Press.

Despite the fact that pinyin has often been invoked, both by opponents and proponents of Chinese script reform, especially in the firestorm of discussions that has taken place on Language Log and in other venues during the past week, most people probably do not realize to what extent pinyin has already become an essential part of life in China.  To remedy that lack of understanding about what pinyin is actually used for every day, I will simply list a few of its applications in education, commerce, science, manufacturing, architecture, construction, and countless other fields.  There is no particular significance to the order in which I list these applications.

1. to teach children and illiterate adults the basics of reading and writing

2. archeologists use pinyin to designate cemeteries, tombs, houses, waste pits, and other elements of the sites they work on

3. in museum labels and catalogs, pinyin is used to annotate the pronunciation of very obscure terms for bronze vessels, weapons, etc.

4. to annotate the sounds of unfamiliar characters in written materials of all sorts

5. in advertising and on packaging

6. braille

7. semaphore

8. road signs

9. teaching Mandarin to non-native speakers

10. designating components and parts of items to be assembled

11. mathematics, physics, and chemistry

12. in dialectology, phonology, and other sub-fields of linguistics

13. inputting texts in computers, cellphones, and other electronic devices

14. to write down expressions for which there are no known characters or for special effect, particularly on the internet

15. book titles, publication data, and cataloging

16. indices of books

17. ordering of dictionary and encyclopedia entries

18. spelling the names of Chinese citizens on passports and other official and unofficial documents

19. for retrieving passports of citizens who have applied for exit permits

See "Passport pickup by pinyin" (3/02/12)

20. ordering names in lists of people

When I first started studying Chinese 50 years ago, some of these latter functions did not yet use alphabetical ordering.  Instead they relied on radical plus residual stroke count, total stroke count, category, etc.  Increasingly, however, and especially in recent years, such ordering functions are being taken over by pinyin.  Naturally, pinyin's role as a device for transcription has only grown with time.
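To make the contrast between these ordering schemes concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the five surnames, their stroke counts, and the helper names (`pinyin_key`, `by_pinyin`, `by_stroke_count`) are my own choices for the example, not part of any official standard or library. It also simplifies real dictionary practice, which treats ü separately from u and uses tones to break ties.

```python
import unicodedata

# Hypothetical sample data: (character, pinyin with tone mark, total stroke count)
SURNAMES = [
    ("王", "Wáng", 4),
    ("李", "Lǐ", 7),
    ("刘", "Liú", 6),
    ("马", "Mǎ", 3),
    ("丁", "Dīng", 2),
]


def pinyin_key(pinyin: str) -> str:
    """Strip tone diacritics and lowercase, so that 'Lǐ' sorts as 'li'.

    Simplification: this also folds ü into u, which real pinyin
    ordering does not do.
    """
    decomposed = unicodedata.normalize("NFD", pinyin)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower()


def by_pinyin(entries):
    """Return the characters in alphabetical order of their pinyin."""
    return [char for char, pinyin, _ in sorted(entries, key=lambda e: pinyin_key(e[1]))]


def by_stroke_count(entries):
    """Return the characters in order of total stroke count."""
    return [char for char, _, strokes in sorted(entries, key=lambda e: e[2])]


if __name__ == "__main__":
    print("pinyin order:", by_pinyin(SURNAMES))
    print("stroke order:", by_stroke_count(SURNAMES))
```

The point of the sketch is that pinyin order needs nothing beyond the spelling itself, whereas stroke-count order requires knowing (or counting) the strokes of every character, and many characters share the same count.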

This is just to get the ball rolling.  I'm sure that Language Log readers can think of many other ways in which pinyin is used.

Incidentally, all of these applications contribute to the growth of digraphia in China.

[Thanks to Neil Kubler]


  1. Jon Forrest said,

    May 22, 2016 @ 8:15 pm

    According to, Vietnamese once used Chinese characters. From what I've read (I don't claim to be an expert), Vietnamese phonetics are just as complicated as Mandarin. Yet, somehow Vietnamese made the jump to roman characters.

    The fact that this happened should be strong evidence that linguistically, Chinese could make the same jump. As always, there are many non-linguistic reasons why this will never happen, but many Chinese people pooh-pooh the idea of switching to roman characters because they think romanized Chinese would be unreadable because of the similarity of many Chinese words. Vietnamese proves that this is untrue.

    (I'd enjoy hearing from a Vietnamese expert about why this worked in Vietnam. Were there similar arguments against it?)

    Jon Forrest

  2. Jim Breen said,

    May 22, 2016 @ 8:44 pm

    Re "ordering of dictionary and encyclopedia entries", I recall some years ago showing an ex-PRC colleague my kanji dictionary database, which has a large number of kanji indexing methods including the traditional radical+stroke-count, four-corners, etc. as well as more recent Japanese-oriented methods such as Jack Halpern's SKIP. He remarked that he knew of the older methods because his parents used them with dictionaries, but his generation usually used pinyin.

  3. julie lee said,

    May 22, 2016 @ 8:58 pm

    I live in the San Francisco Bay Area and sometimes go to Chinese bookstores to look for Chinese books. They are never arranged in alphabetical order on shelves, so it is frustrating. Likewise in a Chinese bookstore in Taipei a few years ago. I wonder if Chinese libraries in Taiwan or China arrange their books in alphabetical order using Pinyin or some other alphabetical system.

  4. Jim Breen said,

    May 22, 2016 @ 9:14 pm

    Getting off the topic a little, when I was living in Tokyo some years back I joined the local municipal library, mainly to borrow CDs. They were (understandably) ordered by composer name using the syllabary (五十音) order. Not a problem but it did feel a little odd that Shostakovich was towards the front, and Bach and Beethoven were down the back.

  5. maidhc said,

    May 23, 2016 @ 1:01 am

    Jon Forrest: In Vietnamese, Chinese characters were basically used for their phonetic value. So in order to write Vietnamese you first had to learn to speak and write Chinese. Even with the later Vietnamese characters a fair knowledge of Chinese was required. Plus official documents were written in classical Chinese. This meant that literacy was confined to an elite who could afford to have their children taught a foreign language. The literacy rate in modern Vietnam is 94.5%.

    Chinese children learning to write have the advantage that they already speak Chinese. The literacy rate in China is 96.4%. (CIA Factbook)

    The comparative advantage of changing to a romanized script would have been much higher in Vietnam than in China. Since China already has a high literacy rate, changing to a Romanized script would not make much difference in literacy, according to these numbers.

    The disadvantage of changing is that most Vietnamese are now unable to read historical documents.

  6. John said,

    May 23, 2016 @ 4:14 am

    @julie lee: In Taiwanese bookstores books are either arranged in order of the number of strokes in the author's last name, or by zhuyin order (b-p-m-f-d-t-n-l etc.)

  7. Victor Mair said,

    May 23, 2016 @ 6:42 am

    In Asia's Orthographic Dilemma, Bill Hannas lists a dozen reasons (pp. 87-92) why this worked in Vietnam.

  8. Victor Mair said,

    May 23, 2016 @ 6:44 am

    The literacy rate in China is 96.4%. (CIA Factbook)

    The CIA got snookered — badly.

  9. Vanya said,

    May 23, 2016 @ 6:59 am

    Despite the fact that pinyin has often been invoked, both by opponents and proponents of Chinese script reform, especially in the firestorm of discussions that has taken place on Language Log and in other venues during the past week, most people probably do not realize to what extent pinyin has already become an essential part of life in China.

    My impression is that most of the people involved in the discussions on Language Log live or have spent significant time in the PRC. I fail to see what is particularly interesting in 2016 about discussing pinyin. The odds of pinyin being replaced at this point by a superior and more foreigner-friendly system of romanization are remote, and I see no chance of zhuyin fuhao, a superior system for native speakers, challenging pinyin either. Pinyin seems to be the default solution that is good enough, and pushed by the PRC government. Real Chinese writing reform would mean replacing pinyin with a syllabary, but for the same entrenched reasons and vested interests that characters won't disappear, pinyin is probably not going anywhere either.

    The fact that digraphia is an irreversible trend in China should also be fairly obvious to anyone with even a casual acquaintance with the culture. Maybe because I came to Mandarin from Japanese, where digraphia has a long tradition, and because I have never had to learn Mandarin in an environment where pinyin was not the default method of teaching to foreigners, I don't experience Chinese digraphia as anything particularly paradigm shaking. This is why I don't get talk about "language reform" in 2016. What else needs to happen? Is there any urgency? I don't see any reason why characters and pinyin can't coexist for a long time to come.

  10. Victor Mair said,

    May 23, 2016 @ 7:35 am

    …pinyin is probably not going anywhere either.

    You need to go back and reread the o.p. Pinyin is going somewhere. Year by year, its use is expanding as part of the emerging digraphia we have talked about so often on Language Log.

  11. Anonymous Coward said,

    May 23, 2016 @ 9:08 am

    Note that Classical Chinese in Vietnamese pronunciation is much more readable than in Mandarin pronunciation.

  12. julie lee said,

    May 23, 2016 @ 11:50 am

    "In Taiwanese bookstores books are either arranged in order of the number of strokes in the author's last name, or by zhuyin order (b-p-m-f-d-t-n-l etc.)."

    Thanks John. Horrors, if the books are arranged by the number of strokes of author's last name. And arrangement by zhuyin order is far inferior to alphabetic (abc) order, in my opinion, even though I know the zhuyin signs. I think far more Chinese native speakers from Taiwan and China use pinyin for e-mail than zhuyin, which indicates that the alphabet-based pinyin is handier. I would think alphabetical order indispensable for all filing systems in Taiwan and China: medical and hospital files, corporate files, government files.

    Incidentally, an anecdote to illustrate how cumbersome the number-of-strokes order is: In his autobiography "Storm Clouds Clear over China" (English translation, Hoover Press), Ch'en Li-fu (who was Chiang Kai-shek's closest confidant and advisor throughout Chiang's career) said that in the late 1920's when he was Chiang's private secretary and in charge of all Kuomintang personnel files, he had to invent his own system of filing. He probably didn't know the various alphabetical systems of romanization. He said he invented his own system based on the characters which he said was very quick and handy. He was very proud of this invention. I don't remember if he gave all the details. Obviously he had to invent his own system because the traditional number-of-strokes-order was too cumbersome and slow.

  13. Eric said,

    May 23, 2016 @ 12:05 pm

    Pinyin order for bookstores is a great blessing. Stroke count is very cumbersome, but at least it is an order. Both are far superior to something I've seen occasionally in medium-sized bookstores in both PRC and Taiwan, namely, order (as best I can divine) by publishing house. Looking at the shelves it almost gives the impression of order by color and is about as helpful as that would be.

  14. Guy said,

    May 23, 2016 @ 12:16 pm


    Roughly what percentage of Chinese children are fluent in Mandarin before learning characters?

  15. Guy_H said,

    May 23, 2016 @ 2:06 pm

    Oh yes, it's VERY annoying how bookstores in Taiwan (and Hong Kong too for that matter) like to use stroke order. Who on earth has the time to count strokes in their head? It's the same reason why I dislike stroke order dictionaries.

    Having said that, I'm confused why an alphabetical order would be superior to zhuyin? In my mind, a-b-c-d is as arbitrary a sequence as bo-po-mo-fo.

  16. Victor Mair said,

    May 23, 2016 @ 2:45 pm

    From Stephen O'Harrow:

    First of all, read our friend John DeFrancis's Colonialism and language policy in Viet Nam (The Hague: Mouton, 1977) where he goes into the question at length – we went to France together in 1975 and that's where he wrote the book (I added a pinch of salt here and there). And, as you know, John was a proponent of Chinese romanization (he even invented a very good one himself, the Yale method, used by all of us who learned from the Yale Mirror Series [during the late T'ang when I was in school, as I vaguely remember]).

    John thought exactly as you stated, that the Vietnamese example proved conclusively that it was indeed quite possible to employ romanization for achieving mass literacy in China. And, as you state, there are plenty of homophones in Vietnamese and the quốc ngữ system works very well, nonetheless. The reasons that some folks opposed the Vietnamese romanization at first were mainly political, as are nearly all the reasons why the PRC government refuses to act like decent Communists and do the right thing for the masses of its own people.

    As is not infrequently the case, the Vietnamese provide a good "mirror of proper conduct" for the Chinese. Need I say more?

  17. J.W. Brewer said,

    May 23, 2016 @ 3:46 pm

    Re maidhc's point that "The disadvantage of changing is that most Vietnamese are now unable to read historical documents." If you were running a sufficiently brutal and despotic government (whether the French colonial regime in Indochina, the Maoist regime in mainland China, the Ataturk regime in Turkey, etc etc) you might well view obliterating the legibility of historical texts as a feature rather than a bug.

  18. JS said,

    May 23, 2016 @ 4:32 pm

    I don't have a horse in this race but have found the discussion engaging. At least, from a linguistic perspective, it should be obvious that (1) writing Chinese with a (more) phonetic script like an alphabet is perfectly possible; and that (2) learning to read and write is (to some degree) harder for native users of the Chinese script than for their alphabetic counterparts.

    I'm interested in the statement that pinyin is used to "teach young children … the basics of reading and writing." This is worth thinking about in detail since it's clear that, as far as simply the nature of the relationship between script and speech, the Chinese writing system is the more intuitive, with alphabets requiring more advanced analysis.

  19. Guy said,

    May 23, 2016 @ 5:48 pm


    I wonder how much of the reason is to teach standard pronunciation, especially with unfamiliar vocabulary items, as opposed to the relative ease of learning all the letters compared to characters?

    ēvən ăz ə nātĭv spēker ī stĭl rēmĕmber lernĭng tōō rēd ĭn ə rītĭng sĭstəm thăt lŏŏkt līk THĭs bēfor rēdĭng wĭth stănderd spĕlĭng.

    Except the diacritics over double-o's were one big diacritic. I also remember about half of the class being confused at the distinction between ä and ŏ, which were used to represent the two vowels involved in the cot-caught merger.

  20. julie lee said,

    May 23, 2016 @ 7:02 pm

    @Guy_H says:

    " I'm confused why an alphabetical order would be superior to zhuyin? In my mind, a-b-c-d is as arbitrary a sequence as bo-po-mo-fo."

    Yes, theoretically, both are arbitrary systems and as such should be equally good. However, the zhuyin (bo-po-mo-fo) system has 37 symbols whereas the alphabet has only 26. Also, I believe most Chinese first learn the alphabet and use it more widely (in math and science, for example) than bopomofo, so the alphabet is more familiar to everyone than bopomofo. I don't think the millions of Chinese in China learn the bopomofo, though almost all learn the alphabet. Many or most Chinese of an older (my own and my parents') generation in Taiwan have never learned bopomofo, though we all know the alphabet. Also, the alphabet is known around the world, which is far from the case with bopomofo. I myself can read and write bopomofo fast but I am still more comfortable with the alphabet. (Handwriting-wise, the alphabet can be written cursively, but not the bopomofo.) In a nutshell, the alphabet is more familiar and more universal.

  21. Victor Mair said,

    May 23, 2016 @ 7:03 pm


    …as far as simply the nature of the relationship between script and speech, the Chinese writing system is the more intuitive, with alphabets requiring more advanced analysis.

    This is true — up to a certain point, about thirty characters. In support of this view is the well-known article of Penn psychology professor Paul Rozin and his students (?), Susan Poritsky, and Raina Sotsky, "American children with reading problems can easily learn to read English represented by Chinese characters," Science, 171 (1971), 1264-67. When it was first published, this article caused quite a sensation. It quickly became known in China and was widely viewed by supporters of hanzi as proof that Chinese characters were superior to the alphabet. Rozin's article with Poritsky and Sotsky was followed by additional publications touching upon the same theme, some of them written with other co-authors.

    The results were said to be miraculous, with children having dyslexia and other reading disorders being able to read "overnight".

    All of this was watched very intently in China throughout the 70s, with the major findings of Rozin et al.'s experiments being made available in Chinese.

    When I came to Penn in 1979, I knew that sooner or later I'd have to talk to my new colleague Paul Rozin about his experiments and publications, but that didn't happen till after 1981 when I made my first trip to China and became closely affiliated with the Script Reform Committee of China (Zhōngguó wénzì gǎigé wěiyuánhuì 中国文字改革委员会), which at that time was directly linked to the State Council (Guówùyuàn 国务院); this means that it had a lot of clout (that is why they could push through Hanyu Pinyin and character simplification).

    In the early days, they also paid a lot of attention to orthography, with the intention of phasing in Hanyu Pinyin as an alternative script. Later, the power of the Script Reform Committee was greatly weakened when its name was changed to the State Language Commission (Guójiā yǔyán wénzì gōngzuò wěiyuánhuì 国家语言文字工作委员会) and it was placed under the Ministry of Education (Jiàoyù bù 教育部). It is no accident that, after that happened, educational materials no longer adhered to the orthographical rules, but phonetically annotated the characters one syllable at a time. This vexed Wang Jun, one of the leading language planners and reformers of China, so greatly that he died shortly thereafter (he himself told me not long before his death that he was "so angry he could die" (qì sǐle 气死了)).

    During the early 80s, while the Script Reform Committee was still very powerful, they too decided that they needed to talk to Paul Rozin. So when I showed up in their offices in 1981, they were overjoyed to learn that I taught at Penn. Zhou Youguang himself came to stay in my home for a while, but we weren't able to arrange for a meeting with Prof. Rozin at that time. Subsequently, my late, dear friend, Yin Binyong, who was the leading authority on Pinyin orthography, came to stay for a longer time and we (Yin, my wife, and I) were able to meet with Paul; we had an excellent, productive conversation.

    What emerged from all of this was the acknowledgement that it is very easy to recognize a handful of simple characters (such as that for "car"), that it is possible to string a few of them together into short "sentences", and that it is even possible to "read" such sentences. Unfortunately, the whole system quickly begins to break down above about 30 characters, because the children start to confuse the characters and can't remember which character to link to which words.

    In the early 80s, it was not yet very easy for Chinese to travel abroad, but Zhou Youguang and Yin Binyong both came to Philadelphia because of the high position of the Script Reform Committee within the Chinese government. One of Yin's chief missions was to meet with Paul Rozin and to inform him of the severe skepticism about and deep concerns over the conclusions that were being drawn from the limited results of his experiments. Paul understood. Yin went back to China, and people there no longer made the earlier extreme claims that were being made for the superiority of the characters over alphabetical writing with regard to learning to read and write. Zhou Youguang, Ni Haishu, Wang Jun, Yin Binyong, and their colleagues at the Script Reform Committee, with their vast experience dealing with the acquisition of character literacy in China, knew that the conclusions being drawn from the Rozin experiments could not possibly be right.

    The gist of the controversy is covered in these three works that are undoubtedly known to many readers of Language Log:

    John DeFrancis, in The Chinese Language: Fact and Fantasy, pp. 171ff., discusses the premises, methodology, and weaknesses of the experiment by Rozin, et al.

    William C. Hannas, in The Writing on the Wall and in Asia's Orthographic Dilemma

    Hannas reviews the implications of the Rozin experiments with regard to the psychology of reading and especially emphasizes the inverse relationship between the acquisition of reading skills with alphabets and characters:

    Chinese characters: easy for the very beginning stages, but becoming excruciatingly more difficult for higher levels.

    Alphabetical writing: difficult for the initial stages, but progressively easier to build huge vocabularies for higher level reading skills.

    Thus, JS is right to say that "alphabets requir[e] more advanced analysis"; the principles of alphabetical reading are conceptually more demanding, but once they are mastered, their power to augment vocabulary for reading and writing is almost magical.

  22. John said,

    May 23, 2016 @ 10:33 pm

    @julie lee: People in Taiwan know the alphabet, not hanyu pinyin. As such alphabetical ordering would be pretty opaque to most Taiwanese people, especially the infamously frustrating j/q/x/z/c/s sequence.

  23. julie lee said,

    May 23, 2016 @ 11:22 pm

    @Eric said:
    " something I've seen occasionally in medium-sized bookstores in both PRC and Taiwan, namely, order (as best I can divine) by publishing house. Looking at the shelves it almost gives the impression of order by color …."

    Exactly, that's what I've seen here in Chinese bookstores (all small) here in the San Francisco Bay Area. If there's any order to the books, it's by publishing house or series, and that ends up as arrangement by color or appearance (shape, size, and color).

    @John said: "People in Taiwan know the alphabet, not hanyu pinyin. As such alphabetical ordering would be pretty opaque to most Taiwanese people…."

    Yes indeed. But people in Taiwan can easily learn the pinyin alphabetical spelling of Chinese characters (or Wade-Giles alphabetical spelling of characters). Each can be learned very quickly, an hour? A few hours? There's a Chinese proverb, "Yi lao yong yi" (一勞永逸), meaning "One labor forever free"–spend one hour (or a few hours) memorizing Pinyin or Wade-Giles romanization and then you're forever free (to enjoy the fruits of your labor). If millions of people in Mainland China can learn pinyin, people on Taiwan can certainly learn pinyin or Wade-Giles romanization. Pinyin has been resisted on Taiwan because it's seen as "Communist". I think the Taiwan post office has used the Wade-Giles system, which dates to pre-Communist times.

  24. JS said,

    May 24, 2016 @ 1:00 am

    Incredible, Prof. Mair; thank you. I recall having seen Rozin's articles when reading DeFrancis's Fact and Fantasy with you; the one from 1970-71 is here.

    Speaking forthrightly as a member of the younger generation, my main takeaway from DeFrancis is his absolutely extraordinary and thoroughly praiseworthy level of commitment to and emotional investment in the question of the script at a point in time when China's full participation in the modern world seemed to hinge on the direction chosen. I think this is a point we youngins would do well to reflect on before chirping too much about "what ain't broke." At the time, a lot seemed broke; some of the present debate, of course, revolves around how much is still broke, and exactly how broke.

    At any rate, precisely because the matter was so close to his heart, I think, DeFrancis wound up on the wrong side of debates with, for instance, Geoffrey Sampson, as well as (indirectly) with Rozin, in a way that is a bit painful to look back on.

    Re: the question of Rozin on reading specifically, DeFrancis simply overinterprets (just like the Chinese politicos, I've now learned from you) a perfectly innocuous point — specifically, that phonemic representation is "highly abstract" and that "some unit intermediate between the morpheme and the phoneme — for example, the syllable — might be more suitable as a vehicle for introducing reading" (Rozin et al. 1970-71:1267). (For instance, DeFrancis [171] quotes Rozin as saying that "Chinese orthography maps directly onto meaning," a serious misrepresentation given that the latter actually says, not wrongly, that Chinese characters "map into language at the morphemic (word) level rather than at the phonemic level" [1264]; "[map] directly into the meaningful units [of language]" [1265], etc.)

    As is stated very clearly at Hannas, Asia's Orthographic Dilemma, p. 144, Chinese characters are an entirely incidental part of Rozin's experimental design. If the body of the article does not make this clear enough, he concludes by saying that "[a]n efficient orthography must satisfy only two requirements. It must be easy to learn and it must be productive in the sense that, after mastery, new words can be read without learning new symbols. […] The syllabary may meet these requirements." Such a description more or less excludes the Chinese script, of course.

    This means Rozin had no reason to remark on the question of the "burden" presented by the Chinese script: the experimenters invested three or four hours in teaching functionally illiterate American children twenty-something symbol-to-word mappings; the authors' remarking that the kids got "bored" and "ran into some difficulty" in the latter stages of the experiment isn't direct commentary on the possibility of investing 1000x that amount of time in learning to read and write 100x that number of symbols. We can deduce this is not something Rozin would necessarily recommend, but not that he would consider it impossible or even unreasonably difficult, whatever that might mean.

    Finally, the reason I bring all this up is my second-hand take on learning to read as a Chinese: very early, children become aware of a symbol-to-syllable~word~morpheme mapping, and learn to read dozens to hundreds of characters this way before being exposed to pinyin — which presents a novel and non-intuitive kind of decoding task that many or most Chinese never become truly proficient at. In this sense, the name "pinyin" is sadly appropriate — in the contexts in which it's presently used, people can rather clumsily put the given puzzle pieces together to "decode" a syllable ("puh" … "een" … "[ adds tone]"), but aren't really reading in the proper sense of the word; ironically, due simply to exposure, Chinese learners of English quickly get much better at reading a foreign language represented alphabetically. So, trying to stick here to objective description of the current situation as I see it, pinyin in the wild remains at present a tool for access to characters, and indeed an increasingly indispensable one, and not a script. (So second-graders, for instance, read age-appropriate character text highly fluently and the same pinyin text not-at-all-fluently.) As you know, I remain very interested in seeing if and under what circumstances this state of affairs might change.

  25. Kevin McCready said,

    May 24, 2016 @ 7:46 am

    I was standing on the streets of Lhasa with a Tibetan friend in about 1994 and I could make no sense of the Chinese shop signs. So my friend asked me to read them out in Mandarin. He couldn't speak a word of Mandarin, having escaped to India and migrated to Australia long before. He told me I was speaking Tibetan. The characters were used for their phonetic value.

  26. Victor Mair said,

    May 24, 2016 @ 9:50 am


    Thank you for your long and thoughtful comment.

    Yes, Hanyu Pinyin is not yet a script; it remains primarily a tool for accessing and ordering the characters, and secondarily it has all of those many other purposes (and more) that I listed above in the o.p., which are derived from its power as a transcriptional device. On the other hand, in tandem with its demonstrable growth as an increasingly important part of the emerging digraphia I have so often described, it may be in process of becoming a script.

    A lot depends upon how Chinese view and interact with Pinyin emotionally, cognitively, psychologically, and functionally. Functionally, hundreds of millions use Pinyin many times a day to input Chinese characters, so they are completely comfortable with it at that level. Emotionally, there are not many people who are still looking at Pinyin and calling it English (as I described in a previous comment).

    Thus, emotionally and functionally, Pinyin has become an integral part of Chinese life. What I don't really have a clear sense of is whether Pinyin is also developing script-like roles in the minds of a broad spectrum of Chinese individuals. For people like my wife and me, Pinyin long since had become as much a script as English alphabetic writing (we could read and write it as easily as we did English). Sometimes when I pick up something in Pinyin, open to any random page, and start reading aloud before my class (most of whom are usually from Sinitic-speaking countries), they are stunned when I pronounce it as effortlessly and smoothly as though it were English. I described such an instance of this with a story about Afanti in this comment.

    But it's not quite that way yet with most Chinese, because they simply do not have the practice of reading Pinyin passages as integral texts.

    On the other hand, my wife used to go to rural villages in places like Guangxi (outside Nanning) and Henan (in the area around Loyang) and teach elementary school children and adult illiterates to read integral Pinyin texts. With an old (it was new then!) Super 8 camera, she filmed her students fluently and triumphantly reading aloud. The experience was exhilarating and inspiring! Those films, and others she made about language issues in China have great historical and scientific value, but I'm no longer sure where they are and whether they are still usable.

    There is some evidence indicating that Chinese individuals are gradually (and subconsciously) making a shift from treating Pinyin strictly as an accessory, transcriptional tool in service to the characters toward a state in which it possesses script-like characteristics. This exists in the facility and accuracy with which they use it. Thirty or forty years ago, in general those who used Pinyin regularly with ease and accuracy were mainly language workers (teachers, linguists, etc.) and students in elementary school who were exposed to it every day for learning purposes. Now, of the many undergraduate and graduate students from mainland China whom I meet, almost all of them are highly adept at using Pinyin. That is, they can write it accurately and can read titles, names, etc. written in it without difficulty (no stumbling or halting). While they still do not have the occasion, model, or practice for reading integral Pinyin texts, many of them do have a fairly clear concept of how words and grammar should be written in Pinyin orthography. So I would say that they are slowly becoming accustomed to the notion that Pinyin can be used to represent speech, i.e., language, and not merely for transcribing and inputting characters.

    As I've been saying since the early 70s, the demands of modern science, technology, and commerce and the reality of IT and electronic communication will drive China increasingly in the direction of phoneticization / alphabetization, and that would be manifested primarily in one of two ways: 1. Hanyu Pinyin romanization of MSM, 2. English. When I made that observation back in the 70s and 80s, I honestly did not know and had no opinion about which would prevail: Pinyin MSM or English. I still don't know. What we can say for sure is that the role of English in China continues to expand, as does that of Pinyin MSM. But English is a real language (both spoken and written), and Chinese use it that way all the time. Pinyin MSM is not yet a real, full-blown written language, but it is slowly (snailish) acquiring some of the traits and usages of a functional writing system. Then, of course, there is the elephant in the room, MSM written in Chinese characters, which is still flourishing, albeit making various accommodations to the demands of IT and electronic communication.

    What we may have, then, for a considerable period of time and in varying degrees of mixture, is a complex combination of diglossia (MSM and English) and digraphia (Pinyin and English in Roman letters on the one hand and Chinese characters on the other hand). Here I'm speaking only of mainstream language trends. Naturally, there are all those Sinitic topolects out there, some of which are written to one degree or another and in one way or another, as well as all those non-Sinitic languages, many of which have their own writing systems. But the main contest is between MSM and English when it comes to spoken language and between Chinese characters and the Roman alphabet when it comes to writing. Linguistically, I'm sure that there will be plenty in China to keep us occupied and interested for decades to come. These are indeed exciting times for language specialists.

  27. JS said,

    May 24, 2016 @ 12:03 pm


    Thanks much; agreed.

    It's interesting that China seems to have to a large degree made it over the mountains presented by the challenges of economic modernization, etc., with the "millstone" of the script intact. Given this state of affairs, and what you say about the "snail's pace," I find myself increasingly doubtful that the activation energy necessary to bring about substantive change in this regard will be generated in the foreseeable future.

    That is, digraphia as such has yet to emerge in the P.R.C. — if we take this to mean proper pinyin used alone on some limited but detectable scale, simply and straightforwardly, to represent extended stretches of vernacular Mandarin. That is why I think it is significant that pinyin is not used to teach children to read, as kana is in Japan: if it were, I think the position of the characters-only script would quickly become much more precarious.

    Thus, I'm very interested in seeing examples of this (proto-digraphic) kind that might already exist "out there." What sorts of widely visible extended pinyin-only text are you aware of, however limited in length or scope, that announces to its would-be readers "I am here to speak to you in Chinese!" rather than "I am here to accompany the associated Chinese character text"?

  28. liuyao said,

    May 24, 2016 @ 4:26 pm

    Thanks for the long comments that provide some of the background and "evidence" that were being called for. I found them more convincing and informative than the original posts.

    It's somewhat ironic that pinyin is being used more to annotate obscure characters than to write the full-blown vernacular, which is where a true phonetic script is at its best. When the need arises to write down a colloquial expression, one would first try to come up with the "correct" characters, then other characters that mimic the pronunciation, before resorting to pinyin.

    To add to your list: the younger generation routinely need to rename digital files on their computers, and they would sometimes resort to pinyin or pinyin initials, either for technical reasons (characters in file names were not possible for a long time, and are still at risk of corruption when passed around) or practical ones (easy to type and sort).

    Speaking of sorting, I notice that iTunes still sorts songs or artists by some obscure scheme that involves radicals (of course their target users span the entire Sinosphere). I wonder if anyone has paid attention to how personal computer systems sort Chinese characters.

  29. liuyao said,

    May 24, 2016 @ 5:01 pm

    Coming from mathematics and physics, I'm a little puzzled by your item #11. The notations are all in Latin and Greek alphabets, same as the rest of the world, though I can't think of any instance that ostensibly stands for a pinyin word. (Li, the symbol for lithium, doesn't count as pinyin for 锂, right? Though it's very convenient.)

    If anything, the PRC seems to have pushed for translating and standardizing scientific terms into Chinese, except for eponymous terms which are left in their original script (maybe Anglicized; sometimes Russian names remain in Cyrillic).

  30. Jeff W said,

    May 24, 2016 @ 5:11 pm

    Thus, emotionally and functionally, Pinyin has become an integral part of Chinese life. What I don't really have a clear sense of is whether Pinyin is also developing script-like roles in the minds of a broad spectrum of Chinese individuals.

    That’s what I would want to know. But to me it seems like they’re using pinyin in some pretty specific circumstances, e.g., where the character script is really inefficient (ordering) or really can’t be used (i.e., no known character), or where the alphabetic script has some advantage, e.g., for non-native speakers (e.g., teaching Mandarin, road signs) or even for native speakers (e.g., advertising, which might reflect cachet), that the character script can’t address. So they’re developing more accuracy and more facility using pinyin and thinking of it as having a script-like role but that role might still be restricted to these types of circumstances (acknowledging that those circumstances are broadly distributed).

  31. Victor Mair said,

    May 24, 2016 @ 5:28 pm

    That's a good and fair summary, Jeff W.

  32. Jim Breen said,

    May 24, 2016 @ 6:06 pm

    Re: "I wonder if anyone has paid attention to how personal computer systems sort Chinese characters."

    Most computer systems these days use the Unicode character set. The CJK "ideographs" (as Unicode labels them) are unified into a single set. The default collating sequence is based on radical-plus-strokecount. See

  33. Victor Mair said,

    May 24, 2016 @ 6:39 pm

    …I can't think of any instance that ostensibly stands for a pinyin word.

    Who made that claim?

    I was talking about Pinyin / Latin letters, which are very much in evidence in Chinese science and technology publications.

  34. Jon Forrest said,

    May 24, 2016 @ 8:02 pm

    As I've mentioned, I like to talk to native Chinese speakers, who usually have no linguistic training, about some kind of script replacing characters. What usually results from such conversations is that the native speakers have a very hard time separating the concept of spoken Chinese from written Chinese. It's not hard to point out obvious examples of how Chinese can exist without writing, such as when people talk to each other in person or on the phone. Such discourse proceeds just fine without characters. But many native speakers feel that Chinese without writing simply isn't Chinese.

  35. liuyao said,

    May 24, 2016 @ 8:37 pm

    It was inferred from the other uses in your list, as well as the title of this post.

    Just want to be clear, you are talking about the x, y, and z in mathematical equations, and things like CO_2 and E=mc^2? They are all read as in English (as opposed to pinyin reading: a:, bo, ci, de, etc.), except the 2 and = sign. Interestingly I've heard more than one person pronouncing j as jie. Anyways, the most one could say is that knowing pinyin makes all those symbols very familiar even to those who don't speak any European languages (even though it's very hard to do any science without knowing/reading English).

  36. Victor Mair said,

    May 24, 2016 @ 8:54 pm

    @Jon Forrest

    Thanks for your worthy comment.

    Throughout history, there have been billions of Sinitic speakers who didn't know a single Chinese character or at most a mere handful of characters, yet they were able to converse volubly on all manner of subjects, and they could also tell very interesting stories and sing captivating songs!

    Even now, there are many speakers of Taiwanese, Shanghainese, Cantonese, Dungan, and other Sinitic topolects who do not correlate what they say with characters, and, as we have seen so many times on Language Log, there are often no characters for essential, favorite morphemes, words, phrases, and expressions in these spoken languages.

    All of this is proof that the characters are not necessary for spoken Sinitic languages, a corollary of which is that these languages can be written with phonetic scripts, just as Korean and Vietnamese, which used to be written in characters, are now written with phonetic scripts. Japanese likewise can be written solely in romanization (rōmaji) or kana, or with as many or as few kanji as one prefers.

    This is NOT intended as an argument for the replacement of characters by a phonetic script; it is merely a description of linguistic facts.

  37. Victor Mair said,

    May 24, 2016 @ 8:58 pm

    It was inferred from the other uses in your list, as well as the title of this post.

    Unwarranted, incorrect inferences.

    …knowing pinyin makes all those symbols very familiar even to those who don't speak any European languages….


  38. Victor Mair said,

    May 24, 2016 @ 9:34 pm

    Internal computer codes (Unicode) for characters may be sorted by radical-plus-strokecount, but that is not how computer users interact with / call up the characters. Nowadays that is customarily done via Pinyin (the same is generally true for looking up words in dictionaries). It is often the case that people cannot identify the radical of a character, and they also have trouble accurately counting the total / residual strokes.

  39. Jim Breen said,

    May 24, 2016 @ 10:46 pm

    The first kanji/hanzi computer character standard to be established was Japan's JIS C 6226 in 1978 (followed by the PRC's GB 2312 in 1981, Taiwan's Big 5, etc.). That JIS standard included about 6,300 kanji, plus kana, alphabetics (Roman, Greek, Russian), numerics, special symbols, etc. The kanji were divided into two groups, with the most common ~3,000 ordered by reading and the rest by radical+stroke-count. The ordering was done to help people find and enter characters, since input methods for Japanese text were in their infancy, and it was envisaged that people may have to look up tables of kanji and enter their numeric codes.

    When the "Han Unification" took place in the assembly of the first Unicode standard, with all the national standards combined into a single set, the radical+stroke-count ordering was really inevitable, as it was really the only measurable characteristic they had in common.

    Of course the character ordering is really for internal use only. As Victor says, the human interface, especially for looking up characters and words in dictionaries, is done using pinyin, kana, etc. This disconnect between the internal ordering and the user view is probably at its most obvious with the CJK characters, but in fact it exists in some form in most languages. Words with è, é, ê, etc. have to be ordered alongside their diacritic-free frères; Macintosh and McIntosh need to be adjacent in phone lists, etc. etc.
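
    The gap between internal code order and the order a reader expects can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the PINYIN table is a hypothetical, hand-made stand-in for a real reading dictionary, and the sample words and surnames are chosen just to make the difference visible.

```python
import unicodedata


def fold_diacritics(word: str) -> str:
    """Return the word with combining marks removed, for use as a sort key."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Hypothetical, hand-made reading table (an assumption for this sketch).
# A real system would need a full dictionary and a way to choose among
# multiple readings of the same character.
PINYIN = {"张": "zhang", "李": "li", "王": "wang", "赵": "zhao"}

# "\u00e9lan" is "élan" written with a precomposed e-acute.
words = ["zone", "\u00e9lan", "eland"]
by_code_point = sorted(words)  # raw code order: e-acute sorts after "z"
by_folded = sorted(words, key=lambda w: (fold_diacritics(w), w))

surnames = ["张", "李", "王", "赵"]
surnames_by_code_point = sorted(surnames)  # order of the internal codes
surnames_by_pinyin = sorted(surnames, key=lambda s: PINYIN[s])

print(by_code_point, by_folded)
print(surnames_by_code_point, surnames_by_pinyin)
```

    The point of the sketch is only that the user-facing order comes from a separate key (a folded spelling, a reading), not from the stored codes themselves.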

  40. Alastair said,

    May 25, 2016 @ 5:35 am

    How many of those uses of pinyin listed above usually come with tone marks?

  41. Victor Mair said,

    May 25, 2016 @ 7:31 am


    Good question!

    The use of tone marks varies. When people want to be very precise about the pronunciation of a particular character or for pedagogical purposes, the tones are usually included. But for more casual purposes, the tones are often not added.

    Somewhere (I can't find it now, but I thought it was on Language Log), I once described how, when my wife and I were publishing Xin Tang (New China), our journal of romanized Mandarin, we started out with tonal spelling (GR), but then moved to simplified GR (Lin Yutang's system) because few readers could cope with the complicated spelling rules for full-blown GR. Even simplified GR was not welcomed by our readers, so we went through experimental phases using other romanizations. Finally, because most of our readers were in China or were already familiar with Hanyu Pinyin, we switched to that and stuck with it for the duration, and we usually included tonal marks. Occasionally, however, we would omit the tonal marks, and we found that — so long as correct orthography (words properly separated, grammatical constructions indicated, etc.) was employed — native speakers and non-native speakers with near-native fluency would automatically add the tones (just as we add accents when we read English, even though they are not marked, and Russian readers make similar adjustments with texts that are normally not marked).
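
    For anyone who wants to produce the tone-free form of a Pinyin text mechanically, here is a minimal Python sketch. It assumes the tones are written with the four standard diacritics (macron, acute, caron, grave) and that the two dots of ü should be kept; the function name strip_tones is just an illustrative choice, not something from any existing tool.

```python
import unicodedata

# Combining marks used for the four tones: macron, acute, caron, grave.
TONE_MARKS = {"\u0304", "\u0301", "\u030c", "\u0300"}


def strip_tones(pinyin: str) -> str:
    """Remove tone diacritics but keep the diaeresis of u-umlaut."""
    decomposed = unicodedata.normalize("NFD", pinyin)
    kept = "".join(ch for ch in decomposed if ch not in TONE_MARKS)
    # Recompose so that u + combining diaeresis becomes a single character.
    return unicodedata.normalize("NFC", kept)


print(strip_tones("H\u00e0ny\u01d4 p\u012bny\u012bn"))
```

    Word separation and the rest of the orthography are left untouched, which matters because, as described above, it is the orthography that lets fluent readers supply the tones themselves.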

  42. Victor Mair said,

    May 25, 2016 @ 8:52 am

    Continuing the previous comment:

    I finally found the passage that I must have been thinking of (at least one iteration of it). It occurs on a Calvin College Linguistics website, and I have no idea how it got there. The post begins thus:

    Thursday, November 10, 2005
    ‘On Language’ 11/9/05: A crisis in China isn’t an ‘opportunity’

    Debunking misconceptions about Chinese characters
    ‘On Language‘
    Chicago Tribune
    November 9, 2005
    By Nathan Bierma

    The post goes through all sorts of random topics related to Chinese language and script, including an exchange between a Peking University professor and me written entirely in Pinyin without tones, but with careful attention to orthography. Then comes this passage, replying to someone named David:


    Not really, David.

    When native speakers and foreigners who are advanced in Mandarin (beyond about the 4th-year level) read such texts aloud, they automatically add the appropriate tones. During the past twenty-five years, I’ve done a lot of experimentation with romanized texts using various types of tonal indication (GR, simplified GR, numbers, diacriticals) and no tones at all. Shin Tarng / Xin Tang went through all of those stages, and we found that the tone marks simply were not necessary (and even got in the way) for people having native fluency. It’s similar to the marking of stress for children in Russian textbooks, but it’s very seldom done for adults. Ditto for accent marks in English. Of course, the marking of tones—by whatever means—is very important during the early stages of studying Mandarin and other tonal languages. GR is an elegant, easily typable means of indicating tones. Unfortunately, **for whatever reason(s)**, pedagogues—even at Princeton!—have largely abandoned it.

    After a discussion of Unicode, the above passage is repeated, but with this added:


    Here’s the abstract of a 1997 study of GR vs. Hanyu Pinyin in terms of the teaching of Mandarin Chinese pronunciation:

    “This study presents results from a 2-year investigation of the comparative efficacy of tonal spelling and diacritics in the teaching of Mandarin Chinese pronunciation. The research site was the elementary level Chinese language course at the University of Oregon. During the 1991-92 academic year, the course was taught using a romanization system with diacritics, hanyu pinyin (PY); during the 1992-93 academic year, the course was taught using a tonal spelling system, gwoyeu romatzyh (GR). The analytical mechanism of this study calculates student tonal error rates in identical (save for the romanization system used) reading tasks at identical points in each year’s course. Native speakers of Chinese served as assessors. ***The results clearly indicated that GR did not lead to significantly greater accuracy in tonal production. Indeed, the use of GR reflected slightly lower rates of tonal production accuracy for native speakers of both American English and Japanese.***”

    (My emphasis.)

    The complete article is here:



    I suppose that the "Mark" who added the latter section is Mark Swofford. If he is listening in, perhaps he can tell us where these things originally appeared, and maybe even how they got on the Calvin College blog.

  43. Hong Zhang said,

    May 25, 2016 @ 10:42 am

    Smartphones did help with the popularity of pinyin. It was not until the emergence of smartphones that people in China switched to pinyin in writing text messages. When cellphones only had limited keys on the keyboard, the fastest and easiest way to type Chinese characters for most people was to type by some rules of writing strokes of the characters out. I think even nowadays writing characters out on a keyboard is still faster than typing by pinyin, if you've mastered the rules and the correspondence on the keyboard.

  44. liuyao said,

    May 25, 2016 @ 1:01 pm

    Cellphones do play a major role, but it started with "dumb" phones in the first decade of the present century. Pinyin was used on the number pad, and many who never had to use pinyin in their adult life suddenly found themselves having to use it daily. My father got so good at typing on the number pad that he insisted on having it (called 九宫格) on his smartphone (though now he uses handwriting exclusively from what I observe; he never got too fast with the QWERTY keyboard). That could be a whole discussion by itself.
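
    To give a rough idea of why number-pad Pinyin needs a candidate list, here is a small Python sketch. It uses the usual letter grouping printed on phone keys (2 = abc, 3 = def, and so on); the SYLLABLES list is a hypothetical five-item sample, and nothing here is meant to describe how any actual 九宫格 input method is implemented.

```python
# Usual letter grouping printed on phone number keys.
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}
LETTER_TO_DIGIT = {
    letter: digit for digit, letters in KEYPAD.items() for letter in letters
}

# Hypothetical tiny syllable list, only for illustration.
SYLLABLES = ["ni", "mi", "hao", "ma", "na"]


def to_digits(syllable: str) -> str:
    """Return the key presses needed to enter a toneless Pinyin syllable."""
    return "".join(LETTER_TO_DIGIT[ch] for ch in syllable.lower())


def candidates(digits: str) -> list[str]:
    """Return every known syllable that the same key presses could mean."""
    return [s for s in SYLLABLES if to_digits(s) == digits]


print(to_digits("hao"), candidates("64"), candidates("62"))
```

    Because several letters share a key, one key sequence can stand for more than one syllable, so the user first picks the syllable and then the character.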

    Sorting is more important on cellphones than on PCs, and there it's uniformly pinyin (in the mainland).

  45. APOLLO WU said,

    May 25, 2016 @ 9:20 pm

    I sympathize with the frustration experienced by Julie lee on the lack of alphabetic order on book shelves in Chinese bookstores. I did make an effort to convince some bookstore owners in Hong Kong to use Pinyin order for putting books on the shelves. In case the customers don't know Pinyin, they only need one employee at the bookstore to provide immediate help. As time goes on, more and more customers would be able to find what they need by themselves. The lack of order appears everywhere in Chinese endeavours. All I can say is that, the cultural imprint of confusion and messiness as a result of the continuous use of the Chinese characters has long become the hallmark of Chinese society.

  46. julie lee said,

    May 25, 2016 @ 11:05 pm

    Thank you, Apollo Wu, for this bold and blunt statement.

  47. Matt Anderson said,

    May 26, 2016 @ 10:00 am

    liuyao (in regards to one of your comments from a couple of days ago)—

    The default sort order in itunes (by "radical" & stroke, on my computer at least) really is difficult to use. I don't prefer this kind of ordering in any situation, but I can cope with it in a dictionary or other reference work; somehow it's much more difficult for me to process in itunes or in a list of computer files.

    I'm not sure about other operating systems, but on OS X, at least, you can change everything (not just itunes, but also finder, etc.) to alphabetical order by pinyin (though this inevitably still gets confusing on occasion, as when the os picks the wrong reading for a graph with multiple readings and when it lists Japanese words under their characters' Chinese readings).

    As long as you have at least one kind of Chinese as one of your "Preferred languages", the List Sort Order/列表排序 option is under the Language & Region/語言與地區 section of System Preferences/系統偏好設定, and it includes a few other options, too, including purely by stroke (this would make it pretty difficult to find things, I think) and zhuyin, among others. Unfortunately, I can't figure out how to change this setting on my iphone.

  48. Jacob said,

    May 26, 2016 @ 11:47 am

    Kevin's comment and the discussion on Rozin's articles bring to mind some of the humorous (with varying degrees of success) definitions or etymologies of English words, e.g.:

    经济 economy 依靠农民
    海关 custom 卡死他们
    地主 landlord 懒得劳动
    救护车 ambulance 俺不能死
    律师 lawyer 捞呀
    怀孕 pregnant 扑来个男的

    And it took way more time and effort than I would have liked to convince a former boss that the etymology of "casino" was not Chinese immigrants shouting "开始啰" to start some backroom gambling.

  49. Jacob said,

    May 26, 2016 @ 11:49 am

    I almost forgot the "learn English with Chinese characters" trope, e.g.:


    etc. etc.

  50. julie lee said,

    May 26, 2016 @ 12:44 pm


    I enjoyed that. San-ke-you (三克油)。

  51. liuyao said,

    May 26, 2016 @ 5:00 pm

    Thanks, Matt Anderson! I made the switch on my Mac. I guess I had bigger problem with English sorting (by Last Name) and never bothered to look for a solution all these years.

    FYI, the options (with English and both Simplified and Traditional Chinese as "Preferred languages") are:

    – Universal
    – Chinese (Pinyin Sort Order)
    – Chinese (Radical-Stroke Sort Order)
    – Chinese (Simplified Chinese Sort Order – GB2312)
    – Chinese (Stroke Sort Order)
    – Chinese (Traditional Chinese Sort Order – Big5)
    – Chinese (Zhuyin Sort Order)

  52. Victor Mair said,

    May 26, 2016 @ 8:44 pm

    I guess I had bigger problem with English sorting (by Last Name) and never bothered to look for a solution all these years.

    Please explain what you mean by this.

  53. liuyao said,

    May 26, 2016 @ 11:26 pm

    Sorry about the confusing remark, which has more to do with my personal preference in music. By default, iTunes sorts artists and composers alphabetically as entered (omitting initial "the" from band names), so if one wants to sort by last name, which I prefer because I mostly listen to Classical music, one would need to edit everything into "Last Name, First Name" format, or specify a Sort As name that iTunes provides to override the name that appears up front. I still prefer seeing my composer column with last names all lined up just like in a name index.

    What I meant was that I've probably spent more time editing composers of Classical music than I have looking for Chinese songs under radical sort. No question that alphabetical sort is superior, just to be clear.

  54. JS said,

    May 27, 2016 @ 11:11 am

    As a lover of pinyin-as-script, I was intrigued by this (mildly insensitive) ad from a year ago, pointed out by a friend — reminded me of hours spent adding pinyin pages to Chinese children's books a few years back. My view is that this is qualitatively different from the uses of pinyin currently seen in the P.R.C. Who knows what the future will bring, though…
