Scowling = career?
Inspired by "Garfield Lost in Translation", where the text of cartoons is "automatically translated from English to Chinese and back using Yahoo or Google", I decided to try the round-trip translation technique on this morning's Stone Soup:
Putting the first panel's balloon through Google's English to (Simplified) Chinese and back, I get:
Holly, I'm tired of your scowling. Either put on a happy face or walk the rest of the way.
冬青,我已经厌倦了你的任职期间。无论是提上一个愉快的面对或步行其余的方式。
Holly, I'm tired of your career. Both put on a happy face or walk the rest of the way.
In some ways, this is terrific. There are only two words wrong out of 19.
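The two-out-of-19 tally is easy to check mechanically. Here is a minimal Python sketch using the standard library's difflib. The tokenization (split on whitespace, strip surrounding punctuation) and the function names are my own assumptions for illustration, not anything Google does:

```python
from difflib import SequenceMatcher

def tokens(sentence):
    # Split on whitespace and strip surrounding punctuation from each word.
    return [w.strip(".,!?") for w in sentence.split()]

def changed_words(original, round_trip):
    # Align the two token lists and collect every span that is not identical.
    a, b = tokens(original), tokens(round_trip)
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    changes = [
        (a[i1:i2], b[j1:j2])
        for op, i1, i2, j1, j2 in matcher.get_opcodes()
        if op != "equal"
    ]
    return changes, len(a)

original = ("Holly, I'm tired of your scowling. "
            "Either put on a happy face or walk the rest of the way.")
round_trip = ("Holly, I'm tired of your career. "
              "Both put on a happy face or walk the rest of the way.")

changes, total = changed_words(original, round_trip)
wrong = sum(max(len(old), len(new)) for old, new in changes)
print(f"{wrong} of {total} words differ: {changes}")
```

The two substitutions are adjacent (scowling/career and Either/Both), so the aligner reports them as a single two-word span.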
In another way, it's disappointing. Google's statistical translation engine has somehow formed the opinion that scowling ought to be translated as 任职 rèn zhí "hold an office or post" — that's its story, and it's sticking to it, even when scowling is presented in isolation. In contrast, CEDict translates scowling as 怒容满面 nù róng mǎn miàn "rage written across one's face" (which Google translates back to English as "Nurongmanmian", but never mind that…), and offers a variety of options for scowl.
There's undoubtedly a logical explanation. But statistical MT seems to have a very long tail of stubborn oddities like this, in cases where an ordinary bilingual dictionary would do something more sensible. I'm not suggesting a return to the bad old days of hand-crafted transfer rules, but there must be a way to avoid the endless recapitulation of problems like this one.
[Let me make it clear that all I know about Chinese is what I see in dictionaries, so I may be missing something here. But I believe that the general point about statistical MT is a valid one.]
matt m said,
August 4, 2008 @ 9:12 am
I think that's an interesting point about Bayesian MT: it's hard to accept in some ways, because we can always see where we could add a rule to make it better, but we don't necessarily see what is causing the errors in the first place. Peter Norvig's talk at Google Developer Day [ http://www.youtube.com/watch?v=nU8DcBF-qo4 ] really made me see that the answer to this is to provide more data to the learning algorithm. I wonder if better data wouldn't help even more. Much as PageRank was a great innovation on the search problem, perhaps weighting things like dictionaries higher than other pages makes sense. Just a thought…
T.I. said,
August 4, 2008 @ 10:01 am
And in some cases, you understand perfectly where the error occurred with the online translation program: http://adweek.blogs.com/adfreak/2008/07/then-well-grab.html.
Bruce Rusk said,
August 4, 2008 @ 10:17 am
Small point: the translation for “scowling” Google offers is actually rènzhí qījiān 任职期间, “period of officeholding,” hence career, rather than just rènzhí. And Google Translate gets it right in the other direction: “I scowl” becomes 我任职 wǒ rènzhí, which it rightly renders back into English as “I work.”
Pekka said,
August 4, 2008 @ 10:38 am
It looks like Google Translate does not translate scowling at all when going from English to some other languages. For example, to Spanish: Holly, estoy cansado de su scowling.
Pekka said,
August 4, 2008 @ 10:44 am
If I may add: GT does seem to get scowling right when the text goes on a round-trip through Arabic (but I can't read the Arabic version, so I can't tell). Perhaps this indicates something about the corpora in various languages that were used to train the translator? Not enough scowling going on in all of them?
matt m said,
August 4, 2008 @ 10:44 am
Another interesting point is that "tired of your scowling" is a novel phrase in the Google index. There are 990 results for "your scowling", but many of these use scowling as an adjective as in "your scowling face".
Bill Poser said,
August 4, 2008 @ 1:26 pm
A point that doesn't come out in the English back-translation is that Google translated the personal name "Holly" as 冬青 dong¹ qing¹, which is the plant "holly", literally "winter green". It did not render it phonologically, which is the usual practice, indicating most likely that it did not recognize it as a name.
David said,
August 4, 2008 @ 7:18 pm
The strangest aspect for me is that after the process, the English still makes sense, but the Chinese mid-step would be completely unintelligible to non-English speakers (or even English speakers, without knowledge of the source sentence).
Justin L said,
August 4, 2008 @ 8:16 pm
"Scowling" doesn't translate in Google's French translator, either. Interestingly, in the Spanish version, Google translates "your" as "su" (quite appropriate for a formal "usted" situation) but then translates "su" back as "his".
This raises the very interesting question of how, or whether, computer translators can be taught to recognize which meaning of a homonym is intended when it is not obvious, and to handle function words like prepositions that do not correspond well between languages.
Kanou said,
August 5, 2008 @ 9:35 am
Just another example, but Google's Japanese version doesn't translate "scowling" at all, choosing to leave it in English, like so: 「ホリー、お客様のscowlingはもううんざりなんです。幸せな顔をするかのいずれかの残りの道歩いています。」
However, it correctly translates scowl alone as 「顔をしかめる」.
It also for some reason changes "your" to "our honored customer's", but leaves it out entirely when making the round trip back to English. "Holly, we are tired of scowling. A happy face on one or walk the rest of the way." 不思議。。。
Ellen K. said,
August 5, 2008 @ 10:17 am
Babelfish (now part of Yahoo) does pretty badly. It doesn't even get "I'm" translated at all. The "I'm" just sits there in the middle of the Spanish or Chinese (traditional). Which looks particularly odd with the Chinese.
Rick S said,
August 5, 2008 @ 1:36 pm
I frequently amuse myself by Google-translating stories from Puerto Rico's daily, El Nuevo Día, into English and then submitting a revised translation back to Google in hopes that I'm helping train the system. So I see this sort of oddity all the time. Most of the aberrations aren't as bizarre as scowling = career, and you can often get a sense of where they came from. For example, one of today's ledes is "Olvidados en las cavernas". GT translates it as "Forgotten in the caves", which could make perfect sense if the story were about cave-dwelling fauna perhaps. But the story is about tourists getting lost, another meaning of olvidados. Such mistakes aren't surprising, and will probably remain until statistical MT is enhanced with semantic understanding in some form.
Still, GT just gets some things flat out wrong, with no easily conceivable explanation. The sub-lede on the same story is "El cierre de la atracción turística impacta la economía de la zona" ("The closing of the tourist attraction impacts the local economy"). Somehow, GT inverted the NP complements, yielding "The closure of the economy impacts tourist attraction in the area"!
On the subject of MT vs. bilingual dictionary, I'm confident it will get better as the algorithms are improved. For instance, scowling = career could probably be avoided by using a dictionary lookup during training to generate another weighting factor in the statistical analysis, driving a hint mechanism to make the translator favor idiomatic translations more strongly. But for this to work well, the training corpora should be vetted for off-the-wall and contextually sensitive translations, and right now that's probably too labor intensive.
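To make the dictionary-as-weighting-factor idea concrete, here is a rough Python sketch. Everything in it is an assumption for illustration: the scores are made-up numbers, not real Google Translate output, the function name and the bonus value are hypothetical, and it re-ranks candidates at translation time rather than during training, which is a simplification of what is proposed above.

```python
def rerank(candidates, dictionary_entries, bonus=2.0):
    """Re-rank candidate translations, boosting any that a bilingual
    dictionary also lists for the source word.

    candidates: dict mapping target phrase -> statistical score (higher is better)
    dictionary_entries: set of target phrases the dictionary gives for the word
    bonus: hypothetical multiplier applied to dictionary-backed candidates
    """
    rescored = {
        phrase: score * (bonus if phrase in dictionary_entries else 1.0)
        for phrase, score in candidates.items()
    }
    # Highest rescored candidate first.
    return sorted(rescored.items(), key=lambda kv: kv[1], reverse=True)

# Illustrative, invented scores for "scowling"; only the phrases come from the post.
candidates = {"任职期间": 0.6, "怒容满面": 0.4}
cedict_scowling = {"怒容满面"}  # the CEDict rendering quoted in the post

best = rerank(candidates, cedict_scowling)[0][0]
print(best)
```

With an empty dictionary set the statistical favorite still wins, so the dictionary acts only as a hint and not as a hard override.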
Alan Shaw said,
September 26, 2008 @ 7:48 pm
Two other problems that are masked by the back-translation: 面对 is the verb "to face," and 方式 is "way" as in "manner, fashion."