Google Translate Chinese inputting


Google Translate is so incredibly good, especially for typing Chinese and producing Pinyin (Romanization) with tones, that I rely on it a lot. I am always afraid that, like so many software developers (e.g., Microsoft), Google will add some unwanted bells and whistles or take away some basic features. So today, when I opened Google Translate and saw a new wrinkle in the bottom left corner of the box into which you input Chinese, I was worried that it would lose the features that make it so easy for me to enter text.

What I saw was a little button marked just pīn 拼. Of course, I knew that must refer to Pinyin Romanization producing simplified characters, but I was concerned that the system wouldn't work the same way it used to. When I clicked on the button, a drop-down menu offered choices for Wubi (PRC shape-based inputting system), Bopomofo (Taiwan Phonetic Symbols), and Pinyin producing traditional characters.

I thought to myself that the Wubi and the Bopomofo must have been added at the request of partisans of those systems, but that they are unnecessary for 99% of potential users of Google Translate. In the whole world, there are probably no more than thirty million people who would be likely to use Bopomofo to enter Chinese, and the demanding, difficult Wubi ("Five Stroke") method is used proficiently almost entirely by professional typists in the PRC, so there are maybe another thirty million potential users of Google Translate's Wubi function. (Of course, that is just a very rough guesstimate, but of the hundreds of Chinese people I know who regularly input Chinese, only one can use Wubi fairly well and another can use it rather poorly; all the rest use Pinyin, except for a tiny percentage who write the characters with a stylus or their fingertip. A relatively small number of people in Hong Kong [population around seven million] use Cangjie or other shape-based inputting systems.)

In contrast to the Wubi and Bopomofo entry methods, which are useless to me and to most people, I am very happy now to have the capability to enter traditional characters directly with Pinyin. Before, I could produce traditional characters with Google Translate, but it required a somewhat time-consuming, roundabout, cut-and-paste procedure.

[N.B.:  I actually learned much of my Chinese by relying on Bopomofo, but that was long ago in Taiwan.  Nowadays there are very few people outside the island of Taiwan who are familiar with it.  I now am an ardent advocate of Pinyin-annotated character texts, which would fulfill the same function as the widely available Bopomofo-annotated character texts that I relied on so heavily during my first five years of learning Chinese.]

All in all, the changes in the bottom left corner of the Google Translate Chinese inputting box are welcome.  There's probably no harm done by adding the Wubi and Bopomofo (so long as including them hasn't increased the complexity of the overall system to the point that it is less efficient than before) and doing so has likely made several million people happy.  Having direct Pinyin access to the traditional characters is definitely a boon for those who favor the full forms of the characters.



28 Comments

  1. Carl said,

    January 27, 2013 @ 8:25 pm

    What would you say are the advantages of pinyin over bopomofo? Just the fact that knowing the Roman alphabet is useful in the globalized world, or anything linguistically relevant?

  2. Alyssa said,

    January 27, 2013 @ 9:19 pm

    It's strange to hear someone talk about thirty million people as if that were a small number. Dutch, Greek, and Swedish are all supported by Google Translate, and they all have well under 30 million native speakers.

  3. Marcos said,

    January 28, 2013 @ 1:12 am

    I agree with Alyssa. I understand that Chinese languages are spoken by over a billion people, but to make thirty million people happy seems like a pretty big deal to me, especially considering the fact that Google Translate supports Icelandic, Maltese, Irish, Welsh and Basque, all of which have only a fraction of 30 million speakers.

  4. Marcos said,

    January 28, 2013 @ 1:15 am

    Also, a question: How do non-Mandarin speakers input characters? I realize that in the context of computing, this group probably consists of about 90% Cantonese speakers or more, but perhaps speakers of other languages, like Minnan in Penang, as well. What are the most widely used systems?

  5. Observation said,

    January 28, 2013 @ 3:31 am

    Pinyin for traditional characters is a great new feature for Google Translate! I think it's still no match for the Zhuyin input provided by Windows, which can be set to Hanyu Pinyin in the settings. Microsoft allows the user to enter tones, has a built-in set of words, and can learn words that you type often. For example, when I type the syllable yue4, it comes out as 月, but once I type the next syllable (yu3), the 月 is automatically changed to 粵, giving 粵語.

    In Hong Kong, I don't think there really is any dominant input method. Many older people refuse to learn romanisation systems or input methods, and use writing pads as an input device for Chinese characters. Many younger people who have learnt Putonghua at school (myself included) use pinyin, while others use input methods based on character shape, such as cangjie, sucheng 速成 (a simplified form of cangjie) and 九宮. There are also some who type with Cantonese romanisation methods; Microsoft offers a Jyutping input method. This is not to say that people are necessarily good at the input method they use: Hong Kong singer Stephy Tang wrote a book using the pinyin input method, which resulted in hundreds of wrong characters. Some of the mistakes seem to be caused by her poor Putonghua, e.g. 朦朧 -> 濛濃 or 一併 -> 一拼, while others seem to be caused by homophones, e.g. 克服 -> 刻服.
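Observation's 月 → 粵語 example reflects word-level conversion: the input method revisits an earlier character once the following syllable forms a known word. Below is a minimal sketch, assuming a tiny hypothetical lexicon (SINGLE, WORDS) and a simple greedy longest-match rule; this is not Microsoft's actual algorithm, and real input methods use much larger dictionaries plus statistical ranking that this toy does not model.

```python
# Hypothetical toy lexicon: tone-numbered pinyin syllables -> characters.
# Real input methods use large dictionaries with frequency data; these entries are illustrative only.
SINGLE = {"yue4": "月", "yu3": "語"}
WORDS = {("yue4", "yu3"): "粵語"}


def convert(syllables):
    """Greedy longest-match conversion: prefer multi-syllable words over single characters."""
    out = []
    i = 0
    while i < len(syllables):
        # Try the longest multi-syllable word starting at position i first.
        for j in range(len(syllables), i, -1):
            key = tuple(syllables[i:j])
            if len(key) > 1 and key in WORDS:
                out.append(WORDS[key])
                i = j
                break
        else:
            # No word matched: fall back to the single-character reading
            # (or echo the syllable unchanged if it is not in the toy lexicon).
            out.append(SINGLE.get(syllables[i], syllables[i]))
            i += 1
    return "".join(out)


print(convert(["yue4"]))         # 月
print(convert(["yue4", "yu3"]))  # 粵語
```

Re-running the conversion over the whole syllable buffer after each keystroke is what makes the earlier 月 "change" into 粵 once yu3 arrives.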

  6. Vicki said,

    January 28, 2013 @ 7:58 am

    My reaction was the same as Alyssa's, that 30 million people is a lot. (That's close to the population of the country I'm typing this from, and a lot more than the number of Francophones here.)

  7. Victor Mair said,

    January 28, 2013 @ 8:25 am

    @Carl

    There are hundreds of millions (maybe even over a billion) people who know and use Pinyin. There are fewer than 30 million people who use Bopomofo. As to some of the linguistic advantages of Pinyin, several things that immediately spring to mind are word division, capitalization, italicization, and punctuation, all of which are highly developed and standardized for Pinyin, but not regularly used or entirely absent for Bopomofo.

    @Alyssa, Marcos, and Vicki

    As you can see from my post, Chinese is very well supported by Google Translate, with FOUR different systems for inputting offered. I doubt that there is any other language for which Google Translate or other major software offers multiple inputting systems. Wouldn't you say that is a luxury (special treatment) enjoyed only by Chinese?

    During the last three decades and more, there have been thousands of Chinese inputting systems devised and employed by smaller and larger numbers of people. They tend to come and go, mostly go, but there are still hundreds of Chinese inputting systems around, several of which have hundreds of thousands and in some cases millions of users. Should Google Translate accommodate all of these more or less ephemeral inputting systems for Chinese?

    Bear in mind that Chinese IS generously represented by Google Translate, in fact more so than any other writing system I know of.

    @Marcos

    All around the world, most people use Pinyin to input Chinese characters, even if Mandarin is not their first Chinese language. As mentioned in this post and many other Language Log posts, together with the comments to them, there are also shape-based inputting systems employed by a small number of individuals, mostly older folk who did not learn Pinyin. A tiny group of people use Jyutping (one of the available Cantonese Romanizations) to input characters. For many Chinese languages, such as Shanghainese and Taiwanese, there are no standardized Romanizations, so it is hardly to be expected that there would be phonetic inputting of characters available for them.

    @Observation

    "…the Zhuyin input provided by Windows, which can be set to Hanyu Pinyin in the settings."

    I've tried the Zhuyin input system of Windows, and I myself do not find it to be as versatile and fast as Google Translate's Chinese inputting system.

  8. mollymooly said,

    January 28, 2013 @ 9:38 am

    Victor Mair

    For many Chinese languages, such as Shanghainese and Taiwanese, there are no standardized Romanizations, so it is hardly to be expected that there would be phonetic inputting of characters available for them.

    Do the Wubi and Bopomofo entry methods work for Shanghainese and Taiwanese? If so, then the OP's rather grudging acceptance of these methods would seem regrettably Putonghua-centric. Google might as well remove the Swedish and Dutch translators, since almost all Swedes and Dutch people speak excellent English.

  9. Alyssa said,

    January 28, 2013 @ 12:15 pm

    @Victor Mair

    I guess I just don't find it surprising (much less "special treatment") that Chinese would be better supported than other languages, considering how many speakers it has.

    Of course they're not going to offer support for every single inputting method out there, just like they don't support every language out there. Instead they weigh the effort it'll take to implement vs the number of users affected, and decide if the benefit is worth the cost. A feature which benefits 30 million users is almost certainly worth implementing – that's a lot of people.

  10. Theodore said,

    January 28, 2013 @ 1:39 pm

    Victor Mair said,

    I doubt that there is any other language for which Google Translate or other major software offers multiple inputting systems. Wouldn't you say that is a luxury (special treatment) enjoyed only by Chinese?

    I just checked Google Translate for Russian and it allows two methods: Cyrillic and Romanized. While there I noticed other Cyrillic-written languages (e.g. Bulgarian) also support this, as well as Urdu, Hindi, Arabic…

    So it's not necessarily the population of speakers or favoritism but it does seem to require at least existing translation support combined with a non-roman orthography.

    They apparently haven't gotten around to Romanized Armenian input. Maybe there are not enough Armenian-speaking developers at Google?

  11. Coby Lubliner said,

    January 28, 2013 @ 2:56 pm

    What about someone who wants to translate written Chinese text and doesn't necessarily know the pronunciation of the characters? With my printed dictionary I use the radical/stroke number method. Is there anything like that on Google?

  12. Victor Mair said,

    January 28, 2013 @ 3:28 pm

    @Coby Lubliner:

    Do you know the translation of any characters, or do you just translate character by character without knowing the language?

  13. Victor Mair said,

    January 28, 2013 @ 3:30 pm

    I think that several of the commenters to this post and to many other Language Log posts are confusing language and script.

  14. michael farris said,

    January 28, 2013 @ 4:55 pm

    "I think that several of the commenters to this post and to many other Language Log posts are confusing language and script"

    But isn't one of your points that the entire Sinophone world does so on a regular basis (and that their entire written tradition is based on confusing the two?)

  15. leoboiko said,

    January 28, 2013 @ 4:59 pm

    Yay pinyin for traditional!

    Nothing against guesstimates, but it would be nice if we could somehow have real data on usage of Chinese input methods.

    I agree with the other commenters that it’s very commendable to support the typing habits of millions of users, even if the millions are a tiny percentage of Chinese people (broadly defined). A tiny percentage of Chinese people is bigger than most countries…

    @Coby: I don't think so. You'd have to resort to other online sites, which support the traditional lookup method ("radical"/stroke count; I dislike the translation "radical") as well as more sophisticated lookups (there are many such sites). If you already have the text in electronic format you can just cut and paste it into Google Translate; if the text is on a web site, I recommend the excellent Zhongwen browser plugin, available for Chrome and Firefox, which gives you pīnyīn and character info when you hover the mouse cursor over the text (but it's still useful to paste into Translate to see language-level analysis, rather than character-level).

    It would be cool to have a little drawing pad in Google Translate, as a convenience to input characters you don't know, though I find drawing the characters with the mouse to be super awkward (the touchscreens of smartphones work much better, and pen-based touch screens even more). It would be even cooler if it could leverage Translate's linguistic context to help in guessing the character (e.g. if the user entered 15 unrecognizable squiggles after inputting 蝙, chances are they mean 蝠…)

  16. Victor Mair said,

    January 28, 2013 @ 8:50 pm

    @michael farris

    Yep, sadly.

  17. Matt said,

    January 28, 2013 @ 9:12 pm

    I'm sure I'm missing something important, but wasn't it already possible to enter traditional characters using pinyin, via the OS? Apple products have a "Pinyin – Traditional" mode, and according to a web search it's not hard to set up an equivalent IME in Windows.

  18. Apollo Wu said,

    January 28, 2013 @ 9:40 pm

    Sogou Pinyin, a free PRC Pinyin input method, appears to be the most popular for inputting Chinese in the PRC. It can toggle to output traditional Chinese with CTRL+SHIFT+F. Input by Pinyin initials is very efficient indeed. For example, sqsj → 山穷水尽, mmhc → 明眸皓齿, dht → 大会堂. It contains many specialized dictionaries, ranging from medicine to Chinese martial arts novels.
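The Pinyin-initials shortcut Apollo Wu describes can be pictured as matching the typed letters against the first letter of each syllable of every word in a lexicon. This is a toy illustration under stated assumptions, not Sogou's implementation: the LEXICON contents and the initials/candidates helpers are hypothetical, tones are omitted, and a real input method would rank the (often numerous) matches by frequency.

```python
# Hypothetical toy lexicon mapping words to their pinyin syllables (tones omitted).
LEXICON = {
    "山穷水尽": ["shan", "qiong", "shui", "jin"],
    "明眸皓齿": ["ming", "mou", "hao", "chi"],
    "大会堂": ["da", "hui", "tang"],
    "大会": ["da", "hui"],
}


def initials(syllables):
    """Abbreviate a word to the first letter of each of its syllables."""
    return "".join(s[0] for s in syllables)


def candidates(abbrev):
    """Return every lexicon word whose syllable initials equal the typed abbreviation."""
    return [word for word, syls in LEXICON.items() if initials(syls) == abbrev]


print(candidates("sqsj"))  # ['山穷水尽']
print(candidates("dht"))   # ['大会堂']
```

Because a short abbreviation such as "dh" can match many words in a full-size dictionary, the ordering of the candidate list is where most of the real engineering effort would go.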

  19. Victor Mair said,

    January 28, 2013 @ 10:07 pm

    @Matt

    Of course, there were already tons of ways for entering traditional characters. Here we are celebrating the new, direct way to input them with Pinyin via the super-easy Google Translate inputting system.

  20. Matt said,

    January 29, 2013 @ 12:29 am

    I'm not trying to rain on the celebration, I promise! I'm just trying to figure out why Google Translate's input functionality is worth celebrating, since it doesn't seem any better (or worse) than any other Pinyin-based IME. Are we just celebrating the fact that a high-profile site like Google Translate is showcasing the fact that Pinyin can be used to enter simplified or traditional characters, i.e. that Pinyin isn't linked to a particular character set/political entity but rather to the (spoken) language itself?

  21. Marcos said,

    January 29, 2013 @ 1:03 am

    Thank you for the enlightening reply, Observation. I especially appreciate your genuine desire to share knowledge and answer my question in a straightforward and helpful way, avoiding any unnecessary condescension. :-)

  22. leoboiko said,

    January 29, 2013 @ 7:13 am

    @Matt: Personally I'm celebrating the fact that I can stop fighting with scim and uim to get them to input pīnyīn → Chinese Traditional as well as Japanese and IPA on my Linux Eee PC (if I don't use the exact precise values for locales and XMODIFIERS and GTK_INPUT_METHOD etc. it just doesn't work, and what works for some apps and languages doesn't work for others, and when I finally get a working configuration it breaks in the next update…). Whereas with the web IM I don't have to configure anything; it just works. I don't even use it to translate much, more as a kind of learner's scratchpad-dictionary. Now if only there was a keyboard shortcut to play the text-to-speech audio…

    Also, online services are super convenient as a quick solution when you're at your mom’s computer, an Internet cafe, a lousy conference room workstation, etc.

  23. Victor Mair said,

    January 29, 2013 @ 7:18 am

    @mollymooly

    Rather than yet another clunky, cumbersome, slow shape-based inputting system for single CHARACTERS (we already have hundreds of these), what it would be wonderful to have for languages like Shanghainese, Taiwanese, and so forth are regularized Romanized input systems tied directly to their WORDS.

  24. B.Ma said,

    January 29, 2013 @ 8:27 pm

    I still don't see any cause for celebration. I've been inputting Trad and Simp in words (as opposed to characters) for over 10 years with the Windows XP and newer IMEs.

    I mean, it's pretty neat that they've managed to write the IME in JavaScript, which means that it's now usable on all computers that have a recent browser, but everyone knows pinyin input is the way forward for regular people. Wubi and Cangjie are faster but only if one is able to type fast enough. Most people still type with 2 fingers.

    On phones people seem to prefer written input, but I have yet to see a program that can interpret scribbles in the vein of caoshu, which would probably be the fastest input barring speech.

    For Cantonese the most intuitive IME is probably cpime.hk, as it allows you to choose between several romanizations (Jyutping is crap for an English speaker who likes to spell things with English orthography, but perhaps less so for Germans) and if you can't find the Cantonese word, you can type an English translation and it will suggest characters.

  25. Ellen K. said,

    January 30, 2013 @ 8:51 am

    Noting something as a welcome change is not the same as celebrating.

  26. Andy Averill said,

    January 30, 2013 @ 9:29 am

    I'm assuming you can still use cut and paste if you have a character whose pronunciation is unknown to you.

  27. Eric said,

    February 1, 2013 @ 12:51 am

    Anecdotally, a Filipina friend knows zhuyin from Chinese school. I get the impression many Tsinoy learn it with Zhongwen due to the widespread use of Taiwan-originated materials in the Philippines.

  28. CW said,

    February 18, 2013 @ 11:01 am

    I'm a little late to the game, but I'll add something anyway. I don't know how many people use wubi, but my wife and her friends mostly use wubi. And they aren't professional typists. They went to class to learn how to use it.

    Why do they use it instead of pinyin? Because normally when they IM each other (i.e., on QQ), they write in their vernacular language, a Yue language or something similar. When they type to each other, they normally think of what to type in their Yue language and not in putonghua. Therefore, it's more natural to use wubi than putonghua-based pinyin, because the characters in putonghua might have a different meaning than in the Yue language. In fact, my wife sometimes can't even remember how a Chinese character should be pronounced in putonghua, let alone how to write it in pinyin. Also, my wife and some of her friends don't find it natural to write Chinese in pinyin.

    As for myself, I don't know how to type in wubi, but I do know how to type in Cangjie, and it's really not that hard to understand the concept once you get used to it, although the typing might still be pretty slow. Cangjie input is nothing more than a recursive algorithm, to use programming/math terminology. In fact, Cangjie very naturally generalizes to inputting the specifically Cantonese Chinese characters, e.g. the Cantonese-specific characters with a mouth radical on the left and normal Chinese characters on the right.
