Love <–> hate


Baidu ("the Chinese Google") is a popular search engine in China.  The web services company (registered in the Cayman Islands) and its name are discussed in "Soon to be lost in translation," which I posted a little over a year ago.

Now Baidu has launched a new machine translation service.  A friend of mine in China impishly suggested that I give Baidu Fanyi a whirl by typing in 我恨中国.  Language Log readers are invited to try it themselves and see what they get.

Between 5:30 and 7:30 a.m. today, I entered 我恨中国 into the online Baidu translation system half a dozen times, and each time I got back "I love China."  As anyone who has studied first-year Chinese knows, Wǒ hèn Zhōngguó 我恨中国 means "I hate China."

If you type in Wǒ ài Zhōngguó 我爱中国, you also get "I love China," which is what you'd expect.

If you enter Wǒ hèn Měiguó 我恨美国, you get "I hate the United States of America," which is as it should be.

Likewise, if you enter Wǒ ài Měiguó 我爱美国, Baidu Fanyi returns "I love the United States of America," an acceptable translation.

One wonders who is responsible for this glitch (?) in the system, and how long it will take before it is corrected.

Overall, Baidu Fanyi is an exceedingly pale imitation of Google Translate, lacking many of the useful features of the latter.  One of Google Translate's most outstanding functions, so far as I'm concerned, is its automatic transcription of Chinese characters into Pinyin with tones.  The audio production capability for whatever is displayed is also impressive (sounds quite natural).

Moreover, Google Translate can handle dozens of languages.  Just for fun, I started to play around with it and entered "What is your name?"  Effortlessly and instantaneously I could find out how to say and write that question in German, Spanish, Japanese, and many other languages.  Here's the Hindi:  तुम्हारा नाम क्या है? Google Translate transcribes that as Tumhārā nāma kyā hai?  The transcription follows the script closely, hence नाम is rendered as nāma, whereas the audio recording pronounces that as nām, which is the way I hear it in real life.  No matter, Google Translate is already a phenomenal tool, and is getting better all the time.
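The gap between nāma and nām can be illustrated with a small sketch. It assumes the common rule that Hindi leaves a word-final inherent "a" unpronounced; the function name `transliterate`, its `drop_final_schwa` flag, and the tiny letter tables are hypothetical, cover only the words quoted here, and are not a description of how Google Translate works.

```python
# Minimal Devanagari-to-Latin sketch (assumption: only the letters needed
# for the examples in this post are covered).
CONSONANTS = {"न": "n", "म": "m", "र": "r", "घ": "gh",
              "त": "t", "ह": "h", "क": "k", "य": "y"}
VOWEL_SIGNS = {
    "\u093e": "ā",   # sign AA
    "\u0941": "u",   # sign U
    "\u0948": "ai",  # sign AI
}
VIRAMA = "\u094d"    # suppresses the inherent vowel of a consonant


def transliterate(word, drop_final_schwa=False):
    """Transcribe a word letter by letter.

    With drop_final_schwa=False the output follows the script, so every
    bare consonant keeps its inherent "a". With drop_final_schwa=True the
    inherent "a" of a word-final consonant is omitted (hypothetical flag,
    handling word-final position only).
    """
    out = []
    for i, ch in enumerate(word):
        nxt = word[i + 1] if i + 1 < len(word) else None
        if ch in CONSONANTS:
            out.append(CONSONANTS[ch])
            if nxt in VOWEL_SIGNS or nxt == VIRAMA:
                continue  # an explicit vowel sign or virama replaces the inherent "a"
            if nxt is None and drop_final_schwa:
                continue  # word-final inherent "a" is not pronounced
            out.append("a")
        elif ch in VOWEL_SIGNS:
            out.append(VOWEL_SIGNS[ch])
        elif ch == VIRAMA:
            continue
        else:
            raise ValueError(f"letter not covered by this sketch: {ch!r}")
    return "".join(out)


print(transliterate("नाम"))                         # script-following form
print(transliterate("नाम", drop_final_schwa=True))  # closer to the spoken form
```

The first call yields the script-following form, the second the form without the final vowel; a fuller treatment would also need the word-internal cases, which this sketch ignores.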

So far, it seems that Baidu Fanyi can only handle English and Chinese, and it's got a lot of catching up to do even there.



21 Comments

  1. David said,

    July 22, 2011 @ 7:08 am

    When I tried the phrase a few minutes ago, I got "I hate China". It seems as if the error has been corrected.

  2. Victor Mair said,

    July 22, 2011 @ 7:12 am

    Hah! I just tried at 7:57, and it has indeed now been corrected.

  3. Stuart said,

    July 22, 2011 @ 7:25 am

    Google Translate IS a great tool, but the better my Hindi gets, the more I appreciate the limits of its translation abilities. I love the pronunciation feature, but still get amused by glitches like this one:
    http://maxqnzs.com/Englisch.jpg

  4. crturang said,

    July 22, 2011 @ 9:04 am

    The Hindi display would not be an error in Hindi (but would be one in Sanskrit). It is understood that in words written as नाम (and in words like राम, घर) the terminal अ sound is not pronounced.

  5. D said,

    July 22, 2011 @ 9:16 am

    At first it sounded like the most petty application of Chinese censorship yet, but if it has been fixed, maybe that's not the case after all.

    Google Translate's usability varies a lot depending on which language pair you are using (I assume both languages matter, since I think it uses bilingual corpora for translation. Is this correct?). I use it a lot for Latvian, for which it presumably doesn't have the amount of data that it has for, say, German. While it definitely has its uses, it's wrong so often that I've stopped trusting it completely, even for single-word translation.

    And sometimes on the other hand, it impresses me by translating an entire semi-complicated sentence perfectly.

  6. Ellen K. said,

    July 22, 2011 @ 9:59 am

    @crturang, I'm pretty sure it's the English translation that Stuart is referring to as an amusing glitch.

  7. Dan T. said,

    July 22, 2011 @ 10:05 am

    It's kind of a shame it's probably just a technical glitch… the idea of the Chinese government trying to impose its own form of "Newspeak" on its country is amusing.

  8. Victor Mair said,

    July 22, 2011 @ 10:22 am

    I tried out a few sensitive terms, first in Google Translate, then in Baidu Fanyi.

    ****Google Translate:

    héxié shèhuì 和谐社会 ("Harmonious Society")

    héxiè 河蟹 ("crab")

    héxiè shèhuì 河蟹社会 ("crab community")

    cǎo ní mǎ 草泥马 ("mud horse")

    cāo nǐ mā 操你妈 ("fuck")

    cào nǐ mā 肏你妈 ("肏 your mother")

    gānguǒ 干果 ("dried fruit")

    ****Baidu Fanyi:

    héxié shèhuì 和谐社会 ("Harmonious Society")

    héxiè 河蟹 ("river crab")

    héxiè shèhuì 河蟹社会 ("crab society")

    cǎo ní mǎ 草泥马 ("grass mud horse")

    cāo nǐ mā 操你妈 ("fuck your mother")

    cào nǐ mā 肏你妈 ("fuck your mother")

    gānguǒ 干果 ("dried fruit")

  9. Barry Brenesal said,

    July 22, 2011 @ 10:23 am

    I just tried:

    Hogy milyen volt nekünk 4 nap a Téka együttes ölében?

    and got back:

    What we had four days together in the Library's lap?

    …which I suppose makes a certain kind of sense.

  10. quodlibet said,

    July 22, 2011 @ 1:13 pm

    @stuart – Herzlichen Glückwunsch zum Geburtstag!

  11. LDavidH said,

    July 22, 2011 @ 3:20 pm

    I find that Google Translate is not very good with Albanian grammar. Almost every time I try it out (admittedly in order to "catch it out", as I speak Albanian fluently), it gets things wrong. Sometimes it "only" uses the wrong case ending, but often it's more than that. I just tried "Who is that girl?", a fairly simple sentence, and it treated "that" as a conjunction, messing up the whole sentence. It doesn't distinguish between "know" (about something, Fr. savoir) and "know" (somebody, Fr. connaître), a distinction as important in Albanian as in French. So it might be OK with big languages, but not with a small one like Albanian.

  12. Victor Mair said,

    July 22, 2011 @ 3:44 pm

    Brendan O'Kane writes (his computer is in the shop, so he couldn't post this himself):

    =====

    A friend at Baidu told me that (a) it was an accident of their statistical translation system, and (b) that people there had known about it for a while, and had a good laugh about it. The story broke on Weibo today, where Joel [Martinsen] saw it and sent it to me. For what it's worth, I think it could plausibly be an accident of SMT – certainly statistical translation is capable of this sort of weirdness – but the error presents in strange ways.

    我恨中國 is "I love China," but 他恨中國 is "he hates China." 我恨(province) becomes "I love (province)" for all Chinese provinces, but this doesn't hold up with other personal pronouns. Meanwhile, it becomes inconsistent with country names: 我恨加拿大 is "I love Canada," but 我恨 美國 and 日本 translate correctly. 我恨高麗棒子 is "I hate Korea," and 我恨愛爾蘭 is "I hate love Ireland." (My fellow citizens mostly feel the same.) Finally, 我恨美帝 is "I hate imperialism."

    (At least, this was all true as of 4:30 Beijing time this afternoon [VHM: 4:30 a.m. Philadelphia time; the situation continued till around 7:30 a.m. this morning]. Haven't tested since.)

    =====

    I just checked these items again now (4:20 p.m.), and they all seem to be translating correctly, except Wǒ hèn Ài'ěrlán 我恨愛爾蘭 ("I hate love Ireland"): Baidu Fanyi double-counts the ài 愛, treating it as both "love" and the first syllable of "Ireland." Google Translate gets it right: "I hate Ireland."

    Incidentally, this morning in my original post, I forgot to mention one spectacular new feature of Google Translate (they keep adding new and useful goodies), namely, you can do fairly sophisticated Pinyin (without tones) entry of not just characters, but also words. Google Translate just keeps getting better and better, easier and easier to use.

  13. Victor Mair said,

    July 22, 2011 @ 3:53 pm

    One more Baidu-related news item, to be filed under "Innovation with Chinese characteristics": the company has just brought out an internet browser to compete with Google's Chrome and Microsoft's Internet Explorer. Just as Baidu Fanyi is a pale imitation of Google Translate, Baidu's browser (still to be named) is obviously a poor copy of Chrome.

    http://blogs.wsj.com/chinarealtime/2011/07/19/baidu%E2%80%99s-new-browser-looks-strikingly-familiar-google-chrome/?mod=WSJBlog

  14. Stuart said,

    July 22, 2011 @ 7:20 pm

    @quodlibet – thanks! It wasn't actually my janamdin/Geburtstag on the day I took that shot, but the fact that one of Google Translate's own sample pages made such an unusual translation from Hindi into "English" made me smile.

  15. Bathrobe said,

    July 22, 2011 @ 10:21 pm

    Google Translate might be getting better and better, but it still has a long way to go.

    For instance, translating English-language news reports starting with "Reuters" into Chinese always gets a superfluous 新华社 at the start. As though all news reports had to be via Xinhua.

    The same passage will come up with two different renditions of the same word (one I encountered recently was Alberta rendered as 阿尔伯塔省 in one place, and 艾伯特省 a few sentences further down).

    Numbers are often (but not always) garbled. "11 million" might turn up as 11万 (but sometimes 1100万), "1.1 billion" as 1.1亿 (other times 11亿).

    The worst thing is when Google finds a word it can't make sense of and just throws the translated equivalent to the end of the sentence.

    I could go on and on, but first let me go and try out Baidu…
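The number examples in the comment above are easier to follow with the arithmetic spelled out: English groups large numbers by thousands (million, billion), while Chinese groups them by 万 (10^4) and 亿 (10^8). The sketch below shows the conversion that the correct renderings imply; the function name `to_chinese_units` and its string-based interface are hypothetical, and nothing here describes how any translation system actually handles numbers.

```python
from fractions import Fraction

# English scale words and the Chinese grouping units, largest first.
ENGLISH_UNITS = {"thousand": 10**3, "million": 10**6, "billion": 10**9}
CHINESE_UNITS = [("亿", 10**8), ("万", 10**4)]


def _format(value):
    """Render a Fraction as an integer when exact, else as a decimal."""
    return str(value.numerator) if value.denominator == 1 else str(float(value))


def to_chinese_units(amount, unit):
    """Convert e.g. ("11", "million") into a figure counted in 万 or 亿.

    Fractions are used so that inputs like "1.1" are scaled exactly.
    """
    value = Fraction(amount) * ENGLISH_UNITS[unit]
    for symbol, size in CHINESE_UNITS:
        if value >= size:
            return _format(value / size) + symbol
    return _format(value)


print(to_chinese_units("11", "million"))   # 11,000,000 = 1100 x 10^4
print(to_chinese_units("1.1", "billion"))  # 1,100,000,000 = 11 x 10^8
```

Copying the digits across unchanged and swapping only the unit word gives 11万 (110,000) or 1.1亿 (110,000,000), which matches the garbled outputs quoted in the comment.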

  16. Bathrobe said,

    July 22, 2011 @ 10:23 pm

    The other scary thing about Baidu getting an automatic translation system is that the Chinese government's hackers will now have an excuse to start blocking (or harassing) Google Translate, as they do with other Google services, since there is now a "Chinese alternative".

  17. Dakota said,

    July 23, 2011 @ 11:50 am

    Another weird thing about Google Translate — and I have been seeing this at work, where there is a proxy server — is that the instant translation doesn't work. There is an option to "turn off instant translation" that I was hoping would lead to another option to enable it, but clicking on it seems to do nothing. I just now tried it with an ordinary Wi-Fi connection and it seems to be working all right, so either they fixed it, or it doesn't work with proxies, or one of the numerous local viruses has eaten some vital thing in my work computer.

    But the most annoying thing about Google Translate is that the option for translating entire web pages is blocked in KSA. Go figure.

  18. Evan said,

    July 24, 2011 @ 1:02 am

    Although not mentioned by any of the major news outlets or Baidu itself, the Baidu browser is clearly a fork of Chromium. Judging by the placement of the controls on the toolbar, it's not even a recent fork. (More recent versions of Chromium/Chrome have removed the Go button, moved the star to the right, and combined the page and app menus.)

  19. mira said,

    July 24, 2011 @ 3:42 pm

    Weird. There was a similar problem with Google Translate in Czech that became very famous — if you entered "Miluji Česko" (I love the Czech Republic) and translated to English, it would come back with "I love Africa", and if you entered "Miluji česko", it would give you "I love Ireland". Apparently they've fixed it now, though.

  20. Matt said,

    July 25, 2011 @ 7:44 pm

    There's a well-known but possibly apocryphal tale about Arakawa Naruhisa (a screenwriter) who, in an early display of creative character-centric ingenuity, responded to a training exercise requiring him to say "I love you" (the English phrase) in Japanese 100 ways with a list that began "アンタなんか大嫌い!" ("I hate you!"). Two sides of the same coin, etc.

  21. Eric said,

    July 29, 2011 @ 9:28 am

    More probability-based Google Translation fun:

    http://i1126.photobucket.com/albums/l612/vinyl27/en-la_alpha.gif
    http://i1126.photobucket.com/albums/l612/vinyl27/en-la_alpha2.gif

    (It is in alpha…)
