Chinese character inputting


During my "Language, Script, and Society in China" class this past Thursday (10/15/15), I asked the students the following questions:

1. What is your primary method for inputting Chinese characters?

2. What percentage of the time do you use your primary method for inputting Chinese characters?

3. What is your secondary method for inputting Chinese characters?

4. What percentage of the time do you use your secondary method for inputting Chinese characters?

I asked about both primary and secondary methods because occasionally one has to deal with a rare character whose shape, meaning, and / or sound one is not familiar with, so one may have to resort to a different method to enter it into the text one is typing.

Before revealing the results of the survey, I need to say something about the composition of the class.  There were around 20 students in attendance that day.  Nearly all of them are familiar with Chinese characters; the one exception is a student who is studying Korean, so she only knows a few characters and did not participate in the survey.  About half the students in the class are M.A. and Ph.D. candidates from the PRC, so they are fully literate in Chinese.  All of the other students, who are from America or other countries, are advanced in the study of Chinese, so they regularly write Chinese for various purposes.

The results pointed overwhelmingly in one direction:  every single student in the class uses pinyin romanization as their primary inputting method, and nearly all of them said that they do so between 95% and 100% of the time.  Many of the students didn't even mention a secondary inputting method.  Of those who did mention a secondary inputting method, the only one they listed was handwriting, either on the touch screen / pad of their iPhone, iPad, Android device, etc., or with a mouse on their computer.  No one mentioned such shape-based systems as Cangjie and Wubi, not even as a secondary method for inputting.

No, beg your pardon; one other very different secondary method for inputting rare characters whose pronunciation and / or meaning are unknown was noted, viz., cutting and pasting from a pre-existing document or database.  One student said that he uses this method, and I have met other individuals who do so when confronting characters with which they are unfamiliar (I myself do it from time to time).

This overwhelming, virtually unanimous, preference for pinyin inputting is an interesting, but to me not at all surprising, development.  Five years ago, when I did a similar survey in the same course, a few students did mention Cangjie and comparable methods, and one mentioned Wubi for auxiliary purposes.  I have always said that, if students were not forced to learn Wubi [and some high schools did require it], no one except professional, full-time typists would struggle to master it.  See Rebecca Shuang Fu's revealing paper on this subject in Sino-Platonic Papers 224.

With every passing year of people texting on cell phones and composing on computers via pinyin, and with the simultaneous improvement of automatic conversion of running pinyin text to Chinese characters (it is astonishing how good these systems have become), the percentage of those who use romanization for character inputting approaches one hundred percent.
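For readers curious about what "automatic conversion of running pinyin text" involves at its simplest, here is a minimal sketch, not how any actual input method works. It assumes a tiny, hypothetical toneless lexicon (LEXICON) and a hypothetical convert function, and it splits an unspaced pinyin string into dictionary words by preferring the segmentation with the fewest words. Real systems rely on very large dictionaries and statistical language models to rank many candidates, so the fewest-words rule here is only a stand-in for that ranking.

```python
from typing import Optional

# Hypothetical toy lexicon: toneless pinyin spelling -> one chosen character string.
# Real input methods use far larger dictionaries and offer several ranked
# candidates for each spelling; this sketch keeps just one per spelling.
LEXICON = {
    "wo": "我",
    "ai": "爱",
    "zhong": "中",
    "guo": "国",
    "zhongguo": "中国",
    "pin": "拼",
    "yin": "音",
    "pinyin": "拼音",
    "shuru": "输入",
    "fa": "法",
    "shurufa": "输入法",
}
MAX_LEN = max(len(key) for key in LEXICON)


def convert(text: str) -> Optional[str]:
    """Segment unspaced pinyin into lexicon words, preferring the fewest words.

    Returns the converted characters, or None if no segmentation exists.
    """
    # best[i] holds (word_count, pieces) for the cheapest segmentation of text[:i].
    best: list[Optional[tuple[int, list[str]]]] = [None] * (len(text) + 1)
    best[0] = (0, [])
    for end in range(1, len(text) + 1):
        for start in range(max(0, end - MAX_LEN), end):
            prev = best[start]
            piece = text[start:end]
            if prev is None or piece not in LEXICON:
                continue
            count = prev[0] + 1
            # Fewer, longer words are taken as the more plausible reading,
            # e.g. "zhongguo" as 中国 rather than 中 + 国.
            current = best[end]
            if current is None or count < current[0]:
                best[end] = (count, prev[1] + [LEXICON[piece]])
    final = best[len(text)]
    return None if final is None else "".join(final[1])


if __name__ == "__main__":
    print(convert("woaizhongguo"))   # 我爱中国
    print(convert("pinyinshurufa"))  # 拼音输入法
```

The dynamic-programming table avoids re-examining every possible split of a long string; swapping the word count for a probability score from a language model is, roughly speaking, the direction in which the much better real-world systems go.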

The following are some earlier Language Log posts that are relevant to today's topic:

"Stroke order inputting" (10/30/11)

"Cantonese input methods" (1/20/15)

"Google Translate Chinese inputting" (1/27/13)

"Creeping Romanization in Chinese" (8/30/12)

"Chinese Typewriter" (6/30/09)

"Chinese typewriter, part 2" (4/17/11)

"Zhou Youguang, Father of Pinyin" (1/14/14)

"Zhou Youguang, 109 and going strong" (1/13/15)

My interest in the computerization of Chinese characters goes back to a conference I held at Penn in 1990, and beyond that to the 80s and 70s, when the very idea of inputting characters in computers was daunting.  For a summary of the early history of characters in computers, see Victor H. Mair and Yongquan Liu, eds., Characters and Computers (Amsterdam, Oxford, Washington, Tokyo:  IOS, 1991), which is based on the 1990 Penn conference.  From the very beginning, I have always maintained that the only efficient, user-friendly system for inputting Chinese characters for the bulk of the population would be phonetic (especially alphabetical).  It is gratifying, after witnessing the invention of hundreds, if not thousands, of inputting methods for Chinese characters, to see that pinyin is indeed turning out to be the choice of the vast majority of those who enter texts into electronic devices such as cell phones and computers.


  1. Rubrick said,

    October 17, 2015 @ 2:50 pm

    I'm a bit surprised no one mentioned voice dictation as a secondary method. I'm guessing voice recognition for tonal languages is tricky (too lazy to research that right now), but I'd imagine it's still good enough to be useful at times (say when driving).

  2. Michael Watts said,

    October 17, 2015 @ 7:20 pm

    I had a Chinese tutor who had come to the US from Guangzhou at around the age of 18; she always used Wubi (on a computer). For her it relieved the problem of thinking of characters in their Cantonese reading. I regularly correspond with a woman in Shanghai who says she uses 笔画 (where you input a sequence of strokes in order – the possibilities are 横, 竖, 撇, 点, and a catchall hooky stroke) (this is on a phone).

    I use pinyin, but I use 笔画 secondarily (on my phone) when I can't get the character I want through pinyin input. I've observed actual Chinese people to have the same problem, but they usually type in a phrase (using pinyin) that includes the character they want, select that, and then delete the extra characters.

  3. Derek said,

    October 17, 2015 @ 8:38 pm

    I don't use either, but no zhuyin or 速成 at all? That students from the PRC or America/other countries all used pinyin is no surprise to me, since I have yet to meet a single person my age from the PRC who actually knows Wubi, and character input methods aren't exactly standard curriculum for Chinese classes. But I've met young Hong Kongers who use 速成 and were required to learn 倉頡 in high school, and in Taiwan zhuyin is still predominant as far as I can tell. I use pinyin on the computer but use 倉頡 as an auxiliary input method for character lookup.

  4. K Chang said,

    October 18, 2015 @ 3:08 pm

    I use pinyin as well, even though I am originally from Taiwan. I left before computers became popular, and my handwriting has deteriorated to the point where I can barely write my name. :-( However, I do recognize characters and I have no problem reading Chinese. I have to fight with pinyin so often that I ended up using Google Translate to find the proper pinyin for a character I need.

  5. liuyao said,

    October 18, 2015 @ 4:07 pm

    Not surprising. If one really needs to get esoteric characters, none of the systems will be good enough. I'd go to zdic or (by radicals + strokes), and in worst cases the character is only in image form. That gets me thinking, why do we need to have a complete set of characters stored in each computer and smartphone? As we are always "online", why can't it send a query to a database "in the cloud" for esoteric characters to insert in the text? This will be very valuable to specialists in oracle-bone and bronze inscriptions (and other early forms of writings, not just Chinese), and to those who want to write in the style of famous calligraphers.

  6. Jenny Chu said,

    October 19, 2015 @ 12:44 am

    My kids (10 and 12) are currently being educated in a local school in Hong Kong where they speak Cantonese most of the school day, but where Putonghua is used as the medium of instruction for Chinese class. Although they have a total command of numerous types of input methods, I've never once seen them use pinyin – neither for Chinese class nor for any other class where written Chinese is required. At most, they might use the "Cantonese" input in Google Translate if they have no idea how something is written. Should I conclude that character writing is being slowly abandoned in [mainland] China, but preserved in the Special Administrative Region(s)?

  7. Eidolon said,

    October 19, 2015 @ 11:18 am

    I'd actually expect phonetic input via pinyin to not be the most efficient way of inputting Chinese characters for kids whose native languages aren't Mandarin, as pinyin was designed for Mandarin and does not necessarily match the internal phonetic vocabulary of Cantonese, Teochew, etc. speakers. To this end, one of the side effects of the pinyin monopoly in China might be further Mandarinization of the population, as Cantonese, Teochew, etc. speakers have to learn Mandarin pinyin to use popular electronic input devices.

  8. Nicholas Feinberg said,

    October 19, 2015 @ 3:31 pm

    @liuyao: The cost of storing a 'complete set of characters' on each device is very low, and going to 'the cloud' presents a whole host of reliability and functionality issues: what happens when you don't have a reliable internet connection, or any at all? What if you're on the wrong side of the Great Firewall? What if the servers hosting your characters go down? (As they inevitably will.)

    In general, imitating the style of 'famous calligraphers' is handled with fonts; oracle bone script doesn't currently have a representation in Unicode, but there's a place reserved for it.

  9. liuyao said,

    October 22, 2015 @ 9:50 pm

    @Nicholas, I agree. I'm very ignorant in this matter, of course. However, if you have a poor internet connection, then any text would have trouble getting through. I've only had the chance to read a few papers (in PDF) on bronze inscriptions, and what they do is include an image of the character in the middle of the text. I don't know if there's any need to make it work in other, more fluid formats such as a blog comment :)

    I remember that Microsoft Word used to be able to compose nonexistent characters from their components, but since I never used it I can't say more about how it works. I believe I have seen some results of it, and they clearly stood out from the rest of the text.
