Easy versus exact


Ever since people started inputting Chinese characters into computers, I've had an intense interest in how they do it, which systems are more efficient, and why they choose the particular ones they adopt.  For the first few decades, because all inputting systems presented significant obstacles and challenges, I remained pretty much an onlooker because I didn't want to waste my time struggling with cumbersome methods.  It's only after I discovered how simple and fast it is to use Google Translate as my chief inputting method that I became very active in entering Chinese character texts.

Because of the above considerations, during the last three to four decades, I have developed the habit of closely and carefully scrutinizing friends, colleagues, students, and others as they enter Chinese characters in their computers, cell phones, tablets, and other digital tools.  I have written about my observations in many Language Log posts, including the following:

"Chinese character inputting" (10/17/15)

"Stroke order inputting" (10/30/11)

"Cantonese input methods" (1/20/15)

"Google Translate Chinese inputting" (1/27/13)

"Creeping Romanization in Chinese" (8/30/12)

"Chinese Typewriter" (6/30/09)

"Chinese typewriter, part 2" (4/17/11)

"Zhou Youguang, Father of Pinyin" (1/14/14)

"Zhou Youguang, 109 and going strong " (1/13/15)

"Swype and Voice Recognition for mobile device inputting" (1/22/14) — esp. ¶¶ 3-5

"Language notes from Macao and Hong Kong" (6/22/14) — search for "Starbucks"

Usually I just watched what people did as they entered characters and drew my own conclusions from what I saw, not wanting to interrupt their typing.  Lately, however, as in the last post in the above list, I've had more opportunities to ask people how they choose from among the many inputting methods that are available to them.  The answers I've been receiving are quite revealing.

I shan't go through all possible methods, but will focus only on the two most popular means for inputting characters.  By far the most common method for inputting Chinese characters — especially for people who are around forty or younger — is Hanyu Pinyin.  The next most common method — particularly for those who are over forty or so — is to write the characters with the tip of one's finger on a glass touch screen or pad.  In several of the above posts, I have described the frantic flailing one witnesses when people input Chinese characters this way.

From my earlier observations, I noticed that people who entered Chinese characters via the tip of their index finger (less often with a stylus) frequently seemed frustrated and aborted the effort to produce a desired character because what they wanted was not showing up in the list of characters displayed.  Some would try again and again till they got what they wanted, or they would shift to Pinyin to call up the character they were after.

Recently, I have asked some of the people who were switching back and forth between writing the characters with their fingertip and typing them via Pinyin why they didn't just use Pinyin all of the time if they often had to resort to it anyway.  The usual answer was that they would start out writing with their fingertip on the glass screen or pad of their electronic device because, especially for very simple and common characters like nǐ 你 ("you") and hǎo 好 ("good"), they felt it was the path of least resistance, but would switch to Pinyin when they were frustrated at calling up more complex and difficult characters such as lài 癞 / 癩 ("scabies") and pēntì 喷嚏 / 噴嚏 ("sneeze").

As I watched some of these individuals inputting a variety of characters and being stymied when their software proved incapable of quickly retrieving recalcitrant characters, I asked them precisely why they would change over to Pinyin.  The answer was that the fingertip writing offered too many possibilities for them to have to choose from (and many times none of the proffered characters was the one they were after), whereas when they switched over to Pinyin and typed by words in context, the choices presented by the software were much fewer, and, in many cases, were narrowed down to precisely the exact combinations they were after.
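
The narrowing effect those users describe can be sketched in a few lines of Python. This is only a toy illustration, not how any real input method works: the two lookup tables are tiny hand-made samples, and the names SYLLABLES, WORDS, and candidates are made up for this sketch.

```python
# Toy sketch: typing whole words in pinyin narrows the candidate list.
# The tables and names here are hypothetical samples, not data or APIs
# from any real input method.

# Toneless single syllables map to many characters.
SYLLABLES = {
    "pen": ["喷", "盆", "湓"],
    "ti": ["提", "体", "题", "替", "踢", "梯", "剃", "蹄", "嚏"],
    "lai": ["来", "赖", "莱", "睐", "癞"],
}

# Whole words typed as one unit map to far fewer candidates.
WORDS = {
    "penti": ["喷嚏"],
    "nihao": ["你好"],
}


def candidates(typed: str) -> list[str]:
    """Return word-level matches if any, else single-syllable matches."""
    return WORDS.get(typed) or SYLLABLES.get(typed, [])


if __name__ == "__main__":
    for typed in ("ti", "penti"):
        print(typed, candidates(typed))
```

In this sample, typing ti alone offers nine characters with 嚏 at the very end, whereas typing penti as a unit offers only 喷嚏.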

I wish to emphasize that the majority of people who are inputting Chinese text do use Pinyin exclusively or nearly so for inputting characters, and they do so because it is faster, more convenient, more accurate, and more efficient than other methods, and above all it does not require them to learn any special codes, mnemonics, or non-intuitive techniques for decomposing the characters.


  1. Andreas Johansson said,

    October 14, 2017 @ 2:06 pm

    So the Chinese script is, bizarrely, becoming one where the dominant way of writing it is writing in another script entirely.

    (I guess the logical next step is reading hanzi with the help of little pinyin translations, furigana style?)

  2. John Rohsenow said,

    October 14, 2017 @ 3:43 pm

    Being (surprise, surprise) a HYPY input person, I know nothing about the finger tracing method, but I am surprised that the programs are not "smart" enough to "learn" or "remember" which character from the list the owner of the finger chose and make that the preferred choice after a while. When my aging father could no longer type, he got an input program by which he could dictate orally and the words came up on the screen with a fair amount of accuracy. As this was twenty some years ago, he first had to read a list of 200? some words into the program, presumably so that it could recognize his pronunciation. I am not suggesting this oral input program for Mandarin, primarily because of the many regional pronunciations, but rather by analogy, couldn't the program either "learn" the writer's individual "running" style as I outlined above, or else have a list of the most common characters like the program my father had, and first write them in so it could learn that particular inputter's style for each character? Of course, the simplest solution is simply just to start off with HYPY input in the first place, which – as Victor is suggesting – is probably what will inevitably happen. AND I suspect that people who have no, e.g., zh-, ch-, sh- vs. z-, c-, s- distinction in their speech will just arbitrarily learn to make the distinction in their "spelling" just as English typists have to deal with the arbitrary oddities of English spellings which do not match their pronunciations.

  3. J said,

    October 14, 2017 @ 4:38 pm

    I believe I’ve never needed to write ‘scabies’ in any language until now….

  4. David Morris said,

    October 14, 2017 @ 5:14 pm

    My first guess was that I have never written 'scabies'. A search of my files shows one occurrence – in a quotation about health conditions in the early colonial period of Australia.

  5. Y said,

    October 14, 2017 @ 7:47 pm

    Do speakers of non-Mandarin topolects who don't speak much Standard Chinese tend to rely less on Pinyin?

  6. Tom davidson said,

    October 14, 2017 @ 9:50 pm

    Am looking for a good software program that shows PY with proper diacritics. Suggestions welcome!

  7. B.Ma said,

    October 15, 2017 @ 3:16 am

    I use pinyin even when typing Cantonese, it's just a lot easier for me since the order I learned things in was spoken Cantonese > Mandarin via pinyin > "Chinese" characters (i.e. written Mandarin) > Cantonese characters. For the characters that I can't type or I don't know how to write at all, I just spell them using English orthography.

    I tried out CPIME, which is good mainly because there are fewer choices given the wider range of possible syllables in Cantonese, but I can't get my head around Cantonese romanizations, mainly because there are lots of vowels and the Latin alphabet is not really a good match (I wish there was a zhuyin equivalent for Cantonese)

    When I come across a character I don't know then I handwrite it to find out the pinyin via wiktionary/zdic etc. Once I know it then it will be typed in pinyin only.

  8. J K said,

    October 15, 2017 @ 8:32 am

    It would be interesting to see if there are any data on the usage of voice messages like those sent on WeChat vs. textual inputs. Is it possible that WeChat was so popular because it offered the voice messaging service, thereby eliminating the issues surrounding inputting Chinese characters? It seems to me like even though most smartphones in English-speaking countries also offer such voice messaging now, texting is still more popular (but I don't have the data to back that up).

  9. Victor Mair said,

    October 15, 2017 @ 8:50 am

    B. Ma's statement is important for several reasons. First of all, it shows how useful pinyin is even for typing Cantonese. At the same time, it reveals the need for a workable Cantonese romanization. Jyutping, while accepted and used by linguists who are thoroughly familiar with Cantonese phonology, has not caught on among the broader base of Cantonese speakers. Ma mentions the difficulty of there being a plethora of vowels, but I think another problem is that of tones, since Cantonese speakers themselves are often not clear about just how many tones there are and how to identify them with numbers or diacriticals.

    I know hundreds of Cantonese speakers, but I only know about a dozen who are comfortable with Jyutping. Most of my Cantonese friends and students have almost no clue about how to write their Mother Tongue in any kind of phonetic transcription. They can't even come up with ad hoc Roman letter transcriptions. I think this is a huge tragedy and one that those who are concerned about the future of Cantonese should consider seriously and take steps to remedy. If Cantonese speakers want to preserve their language and make it more widely accessible to non-native speakers, they need to come up with a workable Romanization, one that is taught in all schools where Cantonese is the mother tongue of the students and also to foreigners who want to learn Cantonese.

    As I remarked in a couple of earlier posts, I have met a number of native speakers of Cantonese who use English to input characters. They think of the equivalent English word for the Chinese term they want to type, enter the English word into their electronic device, and then select the character(s) they have in mind from among the translations offered by their software. Such a clumsy, roundabout way of typing Cantonese or Zhongwen / Chinese / Mandarin, but people actually do go to such lengths.

  10. Victor Mair said,

    October 15, 2017 @ 8:55 am

    @J K

    Bingo! Right on!

    The same point about relative avoidance of typing among the WeChat crowd versus preference for texting among English smartphone users has been made in earlier conversations on this topic.

  11. B.Ma said,

    October 15, 2017 @ 1:05 pm


    "I think another problem is that of tones"

    Well, not really with regards to typing, because tones are not required (which is also the case for pinyin typing). I agree that the tones can be a problem generally. Even my grandfather, who was involved in education under the KMT on the mainland was not sure about some Cantonese tones, and there are probably dialectal differences.

    The "using English to type Chinese" thing also amuses me.

    With regards to J K's comment, I don't know anyone who uses WeChat as my family / friends are all in HK or western countries, and we use WhatsApp which also offers voice messaging. But I can say that those people who are less comfortable in English tend to use voice messaging or handwriting (since even typing in pinyin requires knowing where the letters are on the keyboard).

  12. Michael Watts said,

    October 15, 2017 @ 2:36 pm

    As to number of vowels, English is very rich in vowels, with somewhere in the high teens or low twenties, and makes do with 5 or 6 vowel characters; I don't see why the Latin alphabet would be less suited to the number of vowels in Cantonese than it already is to the number of vowels in English?

  13. Michael Watts said,

    October 15, 2017 @ 2:41 pm

    For example, out of the 24 Wells lexical sets, I personally distinguish 18 of them: TRAP, LOT, KIT, FLEECE, DRESS, STRUT, FOOT, GOOSE, FACE, PRICE, CHOICE, GOAT, MOUTH, NURSE, START, FORCE, NEAR, SQUARE.

  14. Tom davidson said,

    October 15, 2017 @ 5:01 pm

    Please recommend an HYPY system that gives the pinyin along with the diacritic above the vowel. Thanks!

  15. Michael Watts said,

    October 15, 2017 @ 9:19 pm

    Tom davidson, a pinyin input system isn't meant for displaying pinyin. It's meant for using pinyin as an input method to select 汉子 for display.

    The problem of displaying characters has been largely solved by various unicode implementations, but you'll only see the characters that someone wrote. I use Pleco, which is a cell phone application, for my Chinese dictionary needs; it displays pinyin as a pronunciation guide in the manner you're looking for.

    If that's not what you were talking about, please be clearer.

  16. Michael Watts said,

    October 15, 2017 @ 9:21 pm

    Correction: characters are 汉字 ("characters"), not 汉子 ("guys", as far as I can tell).

  17. kwf3 said,

    October 15, 2017 @ 10:12 pm

    @Y: This doesn't answer your question, but years ago I noticed an input method that was very tolerant of, for example, mixing up -ng and -n, which I guess would find the approval of, to stick with the example, users of Shanghainese-coloured Mandarin. I guess you can implement any approach to text input – pronunciation-based, character-based, or other – user friendlily or user unfriendlily, as illustrated by John Rohsenow above.

    @J K: Recently someone commented on the rarity of seeing a text message in Chinese since everyone uses Wechat, but I didn't ask if they meant everyone uses voice messages, or that they use Wechat text messages as opposed to regular (pre-smartphone) texting, if that makes any sense.

    @Michael Watts
    >somewhere in the high teens or low twenties
    >very rich in vowels
    Why, you are lucky to have that FEW! In MY days, we used to DREAM of having to put up with only 24 vowels, even though my old Dad used to say a small vowel inventory doesn't buy you happiness.

  18. kwf3 said,

    October 15, 2017 @ 11:04 pm

    I think Tom davidson is asking about a system where you type without diacritics and get candidate lists where HYPY with diacritics is displayed next to each character / phrase in the list. This would be useful for people who are unsure about a word's characters (because traditional / simplified / rare character / proper name / character illiterate user) or its MSM tones (because non-native user / rare phrase / proper name). If the software is very smart and implemented very carefully, it might even allow people to look up or casually familiarize themselves with HYPY orthography; you would type "hanfu" and get not only a list of (character) candidates with tones (via accompanying HYPY), but also see whether the first letter is uppercase or lowercase in HYPY, and whether there is a space between the syllables.

  19. Victor Mair said,

    October 15, 2017 @ 11:41 pm

    hànzì 汉字

    Chinese character; sinograph


    hànzi 汉子

    1. man; fellow

    2. husband

    3. Historically, during the Northern Dynasties (386-577), hànzi 汉子 was a derogatory reference for Sinitic persons used by non-Sinitic peoples (who were rulers in the north at that time). Few people were aware of this in modern times till Sanping Chen pointed it out.


  20. Michael Watts said,

    October 16, 2017 @ 12:20 am

    Anything to say about the seemingly odd phrase 女汉子? I found it in a Chinese webcomic, asked my tutor at the time (who had immigrated to the US at a young age) about it, and was told "there's no such phrase".

  21. Michael Watts said,

    October 16, 2017 @ 12:31 am

    kwf3, what's your idea of a "high" number of vowels for a language? Wikipedia shows a paper from 1999 concluding that Cantonese has 20 vowels. That is more than a standard English inventory, but not a lot more.

  22. John Swindle said,

    October 16, 2017 @ 1:41 am

    I don't know whether this was what Tom davidson was asking, but how do you type Hanyu Pinyin text and get Hanyu Pinyin output, without copying and pasting?

  23. ohwilleke said,

    October 16, 2017 @ 5:03 am

    @Michael Watts

    It scares me that it is even possible for someone like me who knows nothing about Chinese and couldn't either draw or pronounce the word to find multiple thoughtful answers attacking different perspectives of the question as you can here:


  24. Guy_H said,

    October 16, 2017 @ 5:34 am

    Purely anecdotal but I'd estimate I input using pinyin 95% of the time, and only switch to bopomofo or handwriting input when I get the pronunciation "wrong". For some reason, my pinyin keyboard only recognizes mainland pronunciation, so if a character has a different pronunciation in Taiwan, it won't show up (and I have a HTC phone too!)

    I'd also add that touchscreen technology has really democratized internet access and social media usage for my parents' generation (they are both 60+). They are comfortable with neither pinyin nor bopomofo, and generally shy away from computers, but being able to input Chinese by hand on a smartphone opened up a whole new world for them. They can now happily surf the internet, send emails and text away at friends and family. I was astonished watching my mother casually cast a Youtube video to her TV screen with the flick of a finger!

  25. Jonathan Smith said,

    October 16, 2017 @ 10:21 am

    @Michael Watts I feel like I have only heard 女汉子 (take-charge/go-get-em thus "man-like" woman) within the past 5ish years but others' mileage may vary. Some parallels to earlier (?) 女強人 but the latter type arguably more feared by misogynists/men?

  26. R. Fenwick said,

    October 17, 2017 @ 1:55 am

    @Michael Watts:

    kwf3, what's your idea of a "high" number of vowels for a language? Wikipedia shows a paper from 1999 concluding that Cantonese has 20 vowels. That is more than a standard English inventory, but not a lot more.

    Define "standard". My Australian English, which is pretty close to standard for that variety, has 20 phonemic vowels (seven short, eight long, five diphthongs). The fact that English is widespread doesn't mean that it can't also be typologically unusual.

    In any case, as languages go 20 vowel phonemes is very high indeed. The most common system among the world's languages is the five-vowel system /i e a o u/, and there are plenty of languages with less (Malagasy has four, Quechua three, Abaza two). Even if one only counts vowel qualities (i.e. vowels specified for frontness, height and rounding), WALS gives the average at a touch under six, so if Cantonese exceeds that then it's above average. (Australian English has at least eleven distinct vowel qualities [i ɪ ɛ æ ɐ ɜ ə ɔ o ʊ ʉ], also very high among languages of the world, though not as high as German, Swedish, or Danish.)


  27. dainichi said,

    October 17, 2017 @ 5:13 am

    Personally I come to MSM from Japanese, so I know lots of hanzi whose pronunciation I don’t know or can’t remember. I also know some MSM words whose hanzi I can’t remember. The written and spoken language not connecting in my brain is one of the most annoying things about MSM.

    So for me, an optimal input system would be forgiving in its input, but reinforce the connection between the hanzi and the pronunciation. I think this is also what Tom davidson hints at.

    For example, the system could be able to switch easily between a “stroke mode” and a “pinyin mode”. In “stroke mode”, as soon as it knew what character(s) I wanted, it would also remind me of their pinyin including tone. Likewise, in “pinyin mode”, as soon as it knows what I want, it would remind me of the pinyin (including tone) of the whole thing, not just what I inputted (assuming it has prediction).

  28. kwf3 said,

    October 17, 2017 @ 8:53 pm

    @Michael Watts
    You are of course right, Englishes do have many vowels.
    I may or may not have more. I count 27 (or even 34) superficially distinct vowels (not vowel phonemes!):
    13 to 15 phonemes if counted the WALS way (linked by R. Fenwick; I'm unsure about the validity of my minimal pairs for two phoneme candidates) + 12 (or several more) non-phonemic diphthongs + (if you feel generous) 7 non-phonemic short–long distinctions.

  29. liuyao said,

    October 17, 2017 @ 10:07 pm

    On newer versions of Windows, the built-in Microsoft Pinyin input method makes it easy to switch to strokes mode and component mode simply by typing "u" at the start. Input by strokes is also influenced by pinyin, as the five strokes (horizontal, vertical, the two diagonals, and "bent") are h, s, p, n, z, respectively. Very easy to remember for Mandarin speakers. There also are other features that make inputting today better than pure pinyin. (The success is to a large degree thanks to Mandarin standardization and primary school education in the Mainland, often at the expense of local topolects.)

  30. julie lee said,

    October 18, 2017 @ 11:27 am

    Like @Jonathan Smith, I too saw the similarity of 女汉子 (nu hanzi, literally "female man") to the earlier 女強人 (nu qiangren, literally "female strongman"), with the difference that nu hanzi comes from a Chinese phrase hanzi ("son of Han", i.e., "man") and nu qiangren comes from the calque qiangren, from English "strongman" (meaning "a political leader who rules by force"), another example of the great influence of English on Chinese.

    Today, reading a summary in English of Xi Jinping's marathon speech to the Chinese Party Congress, I come to the phrase "carrot and stick". I wonder if Xi's Chinese for this phrase was a literal translation of the English.

  31. Dave Cragin said,

    October 18, 2017 @ 9:01 pm

    @Tom Davidson – Pinyinjoe's site has multiple options for typing pinyin in Microsoft Word. You may be looking for the macros he offers: https://www.pinyinjoe.com/pinyin/pinyin_macro.htm

    A superb teaching tool that is integral to Word is that Word can display the pinyin directly above characters. To enable this, you need to install a free Microsoft update called the MSPY 2010 update. Joe has it at: https://www.pinyinjoe.com/faq/mspy2010-pinyin-ime-update.htm

    This update also makes the software far more intelligent when typing characters. That is, it's much more likely to pick the right characters in context.

    I prefer Word over google for looking at the pinyin associated with each character because Word displays the pinyin directly above the character. Also, the text remains entirely in Chinese (i.e., both pinyin & characters). So if I'm trying to read a text for practice and get stymied by the characters, I then can try to read it in pinyin (without having English text on the page that would give the translation).
