I am a cat?
[This is a guest post by Nathan Hopson]
Every once in a very long time, machine translation does something sublime. Usually ridiculous, but just occasionally sublime.
Here's what happened to me the other day.
First, let me begin with a mea culpa: I posted a cat video to the internet. Yes, I finally gave in and committed that gravest of sins. Here's the video:
I couldn't resist when my mother's cat, Mochi (it's a family tradition to give the cats Japanese names), dipped his paw in the dregs of my coffee and began licking it. And then did it again. And again. You get the picture. He'd done this before, and I'd even managed a photo or two, but never a video.
So I gave in to temptation and uploaded a video to Instagram, which you see above.
Having never posted a kitty vid before, I was nonplussed (nonpussed!), if not altogether surprised, to find that it quickly became one of my most popular posts ever, though, granted, that's really not saying much.
Things took an interesting linguistic turn when my mother told me that the machine translation of my Japanese caption was, "I am at the cat cafe."
For Instagram users, my username is gojoppari.
Users of the Instagram app should be offered the option to translate my post by tapping "See Translation" at the bottom of the screen; this option is not available in the web version.
What I had actually written was:
吾輩は珈琲猫である
Wagahai wa kōhī neko de aru
"I am a coffee cat"
a play on Natsume Sōseki's famous novel,
吾輩は猫である
Wagahai wa neko de aru
"I am a cat"
Sōseki's satirical novel first appeared in installments in the literary rag Hototogisu in 1905–1906. It is a poignant critique of Japan's Meiji-period modernization and Westernization. The narrator is a house cat who refers to himself rather grandiloquently as 吾輩 (wagahai), a personal pronoun more suited to the nobility than to the humble Felis domesticus. English being English, the nuance of the personal pronoun is entirely lost in translation.
How exactly the machine translation derived "cat cafe" from 珈琲猫 kōhī neko is somewhat beyond me. Perhaps something in the training corpus, or a well-meaning but misguided user-contributed suggestion at some point in the past, led to this particular felicity. After all, especially in Japan, where kitty cafes are big business, someone is far more likely to write "I am at the cat cafe" than "I am a cat." Either way, there is a certain poetry of context to this marvelous, and probable, metamorphosis.
On the other hand, the copula である (de aru) should under no circumstances have been misconstrued as any kind of preposition. It is simply an equative: I = coffee cat. So my blush of delight was clouded by confusion and disappointment.
To satisfy my curiosity, I ran my original caption through a number of machine translators on the internet.
The results were puzzling.
Bing, Yahoo, and Babelfish were disappointingly accurate: "I am a coffee cat."
Yandex was simply baffling: "Shingo turned to the coffee cat." Who is Shingo? Why is he turning? Google, often the best of the bunch, was also simply disappointing. Big G perplexingly gives: "Spence is a coffee cat." No, he's not.
Not one approaches the transformative magic of "I am at the cat cafe," and I am left wondering why. And feeling like I'd like to go to a kitty cafe….
Victor Mair said,
August 19, 2016 @ 7:58 am
"English being English"
We are a cat.
leoboiko said,
August 19, 2016 @ 8:08 am
One of those things does not fit the other.
Cecilia Segawa Seigle said,
August 19, 2016 @ 8:46 am
If you wrote:
吾輩は珈琲猫である
Wagahai wa kōhī neko de aru,
naturally the translation came out as
"I am a coffee cat"
"I am at the cat cafe" would be
吾輩は猫珈琲屋にいる。
Don't you think?
Chris Button said,
August 19, 2016 @ 8:50 am
It looks like Google's "Spence" comes from what seems to be an alternative English title for Soseki's book:
https://www.amazon.com/Cat-Spence-English-version-Am/dp/4805310979
Thorin said,
August 19, 2016 @ 9:17 am
My Instagram translated it into Brazilian Portuguese as "Estou no cat cafe", meaning, "I am at the cat cafe."
Michael Rank said,
August 19, 2016 @ 10:28 am
Is “coffee” often written as the rather precious looking 珈琲 rather than コーヒー ?
Jay Rubin said,
August 19, 2016 @ 10:35 am
Since "cafe" means "coffee," some level of confusion is understandable, though I would have expected "I am a cafe cat" before "I am at the cat cafe."
Laura Morland said,
August 19, 2016 @ 10:48 am
So, thanks to @Chris Button's contribution, we now know that your evaluation of Google's translation deserves a re-write!
Although the intrusion of "Spence" may be annoying, "Big G" should get bonus points for picking up the Soseki reference.
Chris Button said,
August 19, 2016 @ 11:07 am
For some reason, this got me thinking and I have a solution.
吾輩 = "I" (but since the book has an English translation suggesting the cat's name is "Spence", Google Translate gave this translation instead, possibly generated by a user contribution)
珈琲 = "coffee". However, the word "café" can also be used to refer to the drink. As a result, a poor MT system gives "café" as the translation, but then treats it as a location rather than the drink.
である = grammatical copula. My suspicion is that had the original Japanese used the more colloquial "desu", the translation might have been more accurate. However, since we have "de aru", the inadequate MT system has conflated it with "ni iru". Of course "de aru" and "ni iru" are very different in Japanese, but in English both "de" and "ni" can be translated as "at", while "aru" and "iru" can be translated as "am". For example:
– In the sentence "Wagahai wa kōhī neko de aru" ("I am a coffee cat"), if "de" is given its technical status as the grammatical particle "at", used for a location where an action takes place (which would not be possible with "desu", a contraction of the longer "de arimasu" version of "de aru"), then "aru" is left to mean "am" in English.
– In Cecilia's comment above, "Wagahai wa kōhīya ni iru" (I am at the coffee shop), "ni" is the grammatical particle "at", leaving "iru" to mean "am" in English.
So the thoroughly inadequate MT system thinks "de aru" and "ni iru" both mean "am at", which, technically, albeit under very different grammatical conditions in English, they do. It then translates "de aru" as it would "ni iru".
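Chris Button's hypothesis can be made concrete with a toy sketch. The lookup tables below are invented for illustration (they are not taken from Instagram's, Google's, or any other real system), and the input is assumed to be already segmented into a particle and a verb. The idea: if the copula である is missing as a single unit, a word-by-word fallback renders it exactly as it would render にいる.

```python
# Toy lookup tables invented for illustration; they are not taken from
# Instagram's, Google's, or any other real MT system.
WORDS = {"で": "at", "に": "at", "ある": "am", "いる": "am"}
PHRASES = {("で", "ある"): "am"}  # the copula "de aru" as a single unit


def translate_predicate(particle, verb, phrases):
    """Translate a particle + verb chunk such as で+ある or に+いる.

    The whole chunk is looked up first. If it is missing from the phrase
    table, fall back to word-by-word lookup and put the English verb
    first, since Japanese puts the particle before the verb.
    """
    if (particle, verb) in phrases:
        return phrases[(particle, verb)]
    return f"{WORDS[verb]} {WORDS[particle]}"


print(translate_predicate("で", "ある", PHRASES))  # am
print(translate_predicate("で", "ある", {}))       # am at
print(translate_predicate("に", "いる", PHRASES))  # am at
```

With the copula entry removed (the empty table in the second call), で+ある and に+いる come out identically as "am at", which is the conflation described above.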
Chris C. said,
August 19, 2016 @ 4:07 pm
"Coffee Cat" is the name of the coffeehouse where I stop every morning on the way to work. Maybe Google's translator found that usage and somehow applied it to the result?
David Morris said,
August 19, 2016 @ 4:18 pm
Google Translate's Japanese to Korean also invokes Spence: 스펜스 커피 고양이 이다, seupenseu keopi goyang-i ida, lit. Spence coffee cat be.
E. T. said,
August 19, 2016 @ 5:20 pm
I would guess that 1) "cat café" is a far more likely combination of words than "coffee cat", based on the corpus the translation engine is trained on, and 2) that the mathematical formula that the translation engine has used to produce this sentence weighs the likelihood of an n-gram appearing in the target language rather heavily (compared to the more successful engines we have seen).
Therefore, as you alluded to, it may be thinking that "I am a coffee cat" is too unlikely a thing for anyone to want to say, and thus corrects this to what it assumes you meant to say.
I'm sure someone with more extensive knowledge of NLP will pop in to correct or expound upon this.
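E.T.'s guess corresponds to the standard log-linear (noisy-channel) setup of statistical MT, where each candidate English sentence is scored as log P(source | candidate) + λ · log P(candidate), the second term coming from a language model of English. The sketch below uses made-up log-probabilities and a hypothetical weight λ (`lm_weight`); nothing here reflects the actual scores or weights of Instagram's system. It only shows how a heavy language-model weight can override a more faithful translation.

```python
# Invented log-probabilities for illustration only; no real MT system's
# scores are shown here.
#   tm: translation-model score, log P(Japanese source | English candidate)
#   lm: language-model score,    log P(English candidate)
CANDIDATES = {
    "I am a coffee cat": (-2.0, -14.0),    # faithful, but an unusual sentence
    "I am at the cat cafe": (-6.0, -9.0),  # less faithful, but a common phrase
}


def score(candidate, lm_weight):
    """Log-linear score: translation fit plus weighted language-model fit."""
    tm, lm = CANDIDATES[candidate]
    return tm + lm_weight * lm


def best_translation(lm_weight):
    """Pick the highest-scoring candidate for a given language-model weight."""
    return max(CANDIDATES, key=lambda c: score(c, lm_weight))


for weight in (0.5, 2.0):
    print(weight, best_translation(weight))
# 0.5 I am a coffee cat
# 2.0 I am at the cat cafe
```

With these made-up numbers the two candidates tie at a weight of 0.8; any heavier weighting of the language model flips the output to the more idiomatic but wrong sentence.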
Chris said,
August 20, 2016 @ 3:42 am
I would be inclined to translate the title of that novel into English as "Le chat, c'est moi"… but that would probably be quite inappropriate to its text (which I confess I have not read).
Chris Button said,
August 20, 2016 @ 7:55 am
@ E.T.
You may well be correct, but I'm still curious to see whether changing "de aru" to "desu" would generate the same translation.
Brendan said,
August 20, 2016 @ 2:40 pm
I think Chris Button's reading is correct — the sentence looks to me like the best guess of a statistical machine translation system that is familiar with "de" and "aru" but not with the more formal "de aru." This makes sense given that Facebook's in-house machine-translation system is apparently trained on social media postings and other less-formal text, rather than on more formal input as with other MT systems.
Facebook used to use Bing for its translation services, so I was surprised and pretty impressed to learn that the new results come from its own system. The MT on Facebook and Instagram is very far from perfect, but for Chinese->English it often stacks up pretty well against Google's offerings. I made my own (very) little literary joke on Instagram the other week when I posted a photograph of three 北冰洋 ("Arctic Ocean") soda bottles from Beijing, with the caption
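「多年以後,面對著槍決執行隊的何毖上校將會想起父親帶他去見識北冰洋的那個遙遠的下午。」
(Literally: "Many years later, as he faced the firing squad, Colonel 何毖 would remember that distant afternoon when his father took him to see the Arctic Ocean.")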
Instagram mindlessly auto-translates this as:
which is hardly any worse than Google's version:
The joke, such as it is, sort of hinges on the Chinese name for the soda, but if we mistranslate it as "Icee" — the name of a different and considerably more disgusting summer drink — then it can kind of be made to work in English:
The machine-translated versions are not exactly Gregory Rabassa, and Google bizarrely reads 毖 as "Mi," but both Facebook and Google actually produce something close enough to give non-Sinologues a shot at catching the reference. That's pretty wild.
/df said,
August 21, 2016 @ 7:11 am
Leaving aside the connotations of "wagahai", shouldn't a Turing test-winning MT (the sort that might have offered @Chris's "Le chat, c'est moi") have recognised the word play in Nathan's Japanese caption and generated "coffee cat" as a play on "copycat"? So maybe "We are a coffee cat".
leoboiko said,
August 21, 2016 @ 8:44 pm
@Michael Rank: My subjective impression is that 珈琲 isn't rare, though コーヒー is still the normal orthography. Among old-fashioned ateji (kanji phonograms), 珈琲 is probably one of the most popular/enduring. Google Books currently estimates 52k 珈琲 vs. 145k コーヒー.
Matt said,
August 22, 2016 @ 4:23 am
> "Shingo turned to the coffee cat." Who is Shingo? Why is he turning?
I'm not sure, but there's a reasonable chance he spells the "-go" in his name with 吾 (the "wa(ga)" of "wagahai"). Maybe the rest of his name is just out of frame to the left, because he's still in the middle of turning to see the coffee cat more clearly.
> It looks like Google's "Spence" comes from what seems to be an alternative English title for Soseki's book:
This is only an alternative English title in the sense that somebody's bot passed 吾輩は猫である through an online translator and threw the result up on Amazon in the hopes that someone would accidentally put it in their cart, though. It seems that 我輩 has somehow become associated with "Spence" somewhere in Google's furthest reaches, for reasons that escape me entirely.
Syphus said,
August 22, 2016 @ 11:30 am
It looks like someone has been playing around with Google Translate, so I can't check. However, 吾輩はコーヒー猫である works just fine, so I am pretty certain this is simply about using 珈琲. While Google Books may show a number of uses of it, in general Internet use I don't think I've seen anyone use it outside of cafes and places that sell coffee products.
leoboiko said,
August 22, 2016 @ 3:53 pm
@Syphus: General Internet? Does a Twitter search count?
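> 珈琲の香りの中でぼんやり考える (@rsm60038) ["Thinking idly amid the aroma of coffee"]
> 珈琲をゴクゴク呑むよう (@lotusteajikkyou) ["Gulping down coffee"]
> お洗濯完了✨ マッタリ~珈琲time~☕☀ (@sayasaya3420) ["Laundry done ✨ Relaxing ~coffee time~ ☕☀"]
> 勉強する前に珈琲を飲み30分仮眠 (@superurawaza) ["Drink coffee before studying, then nap for 30 minutes"]
> カレーもデザートも珈琲も美味しかった?明日もよろしくね (@luschka_nico) ["Were the curry, the dessert, and the coffee all good? See you again tomorrow"]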
These are all front-page hits. Also, a lot of people seem to write "latte art" as 珈琲画.
Chris Button said,
August 23, 2016 @ 7:44 am
@ Matt
Good point re. "Spence". Looking at the book image, it doesn't actually say "Spence" anywhere on it!
This whole "Spence" and "Shingo" thing is very mysterious. You might be on to something with 吾 representing the "go" of "Shingo" though. But as for "Shin"….