The elegance of Google Translate
When I was in graduate school, some of my best friends were mathematicians. I was always intrigued by their approach to problem solving. They told me that merely solving problems was not satisfying to them. Rather, their goal was to solve problems elegantly.
This morning, I was reminded of the modus operandi of mathematicians when I asked Google Translate (GT) to render a short passage of German into English.
Here's the original text:
"Dass der deutsche Fußball im Namen einer Diktatur handelt, macht mir Angst. Der Einfluss der Chinesen auf die Wirtschaft ist stark gewachsen, auch auf unser ganzes Leben. Die meisten Europäer registrieren das gar nicht."
Here's the result from GT:
That the German football acts in the name of a dictatorship scares me. The Chinese influence on the economy has grown a lot, including our lives. Most Europeans do not register that.
GT slightly cleaned up by VHM:
That German football acts in the name of a dictatorship scares me. Chinese influence on the economy has grown greatly, also on our entire lives. Most Europeans do not realize that.
I was struck by the economy of expression of the GT translation. Not only is it basically accurate, it is fundamentally felicitous and idiomatic. Indeed, I almost feel as though the GT version of the second sentence is in some respects superior to my own "improved" rendering.
The best authors are often those who write with tight terseness. In a way, you could they are bordering on writing prose in a poetic mode. Overall, as has often been pointed out on Language Log, English has a genius for concision.
"Economy of expression" (6/14/16) — includes a couple of other citations
It was GT's ability to capture this intrinsic spirit of conciseness that gave me such a thrill when I saw what it did in English with the German original.
Here at Language Log, we carry out a lot of playful experiments with GT and other types of AI, for which see the posts to the "Elephant semifics" topic. I also want to pay tribute to GT for its elegance and practicality, its sheer usefulness.
See:
"The wonders of Google Translate" (9/22/17)
"Don't blame Google Translate" (2/4/18)
"Google Translate is even better now" (9/27/16)
"Google is scary good" (7/31/17)
[h.t. Jichang Lulu]
Philip Taylor said,
March 10, 2018 @ 12:46 pm
An alternative version from an English friend who speaks German fluently:
Laura Morland said,
March 10, 2018 @ 12:50 pm
I almost agree with you about the 2nd sentence being "superior," although I would concur with your evident decision that "a lot" is too colloquial.
And why, I wonder, did GT not see fit to translate "ganzes"?
milu said,
March 10, 2018 @ 12:56 pm
i don't follow GT's tech news closely, but i recently (a month or two ago?) noticed a really impressive uptick in Greek > English translation quality. i don't do enough EN > GR (the reverse pair) to figure out whether the underlying engine has been improved or only the English "grammaticality" module, if there's such a thing (well surely there is).
Lukas said,
March 10, 2018 @ 1:51 pm
Deepl.com, a deep-learning based translator, turns it into:
"The fact that German football is acting in the name of a dictatorship frightens me. The Chinese influence on the economy has grown considerably, including our whole lives. Most Europeans do not even register this."
I think "frighten" is closer to the intention of the German original, and the second sentence lands somewhere between Google and the cleaned-up version.
Abbey Road said,
March 10, 2018 @ 3:27 pm
I particularly admire the “tight terseness”, not to mention the concisión, of the second sentence of paragraph 7: “In a way, you could they are bordering on writing prose…”
Inge Noeninger said,
March 10, 2018 @ 3:59 pm
"… GT's ability to capture this intrinsic spirit of conciseness that gave me such a thrill when I saw what it did in English with the German original."
Let's not forget that these are machines, GT and DeepL are. So the terminology to describe what they deliver makes a difference to me. They are not capable of anything, have no ability whatsoever but are instead fed big data non-stop. Their intricate networks are built on databases. They deliver an output and that's it. They don't think and they are never in a foul mood :) DeepL is based on Linguee and the latter has collected data with their crawlers for years or decades. So, it's all output, some good, some lousy.
Victor Mair said,
March 10, 2018 @ 4:09 pm
@Inge Noeninger
"They are not capable of anything, have no ability whatsoever…."
I beg to differ. I think they are highly capable and have a tremendous amount of ability.
Remember Hal?
Inge Noeninger said,
March 10, 2018 @ 4:17 pm
What I am trying to say is that being capable and having an ability is something I ascribe to humans, not machines. The fact that they are very powerful and impressive, like IBM's Watson for example, is not disputed. I am just not going so far as to attach purely human qualities to machines. They will become even more powerful now that they are driven by artificial intelligence, no doubt about that. And they will disrupt work like we have never seen before. And yet, they have no human traits. They are machines. That's all I am saying.
Ben Hemmens said,
March 10, 2018 @ 4:43 pm
I think the middle sentence works because it has a structure that's a little bit un-German (pulling the "ist stark gewachsen" into the middle instead of leaving it at the end), and happens to suit both our end-weighted way of saying things and the general preference of English for not stuffing too much stuff in between the main S and V and O.
Philip Taylor said,
March 10, 2018 @ 5:34 pm
Inge : Is there any incontrovertible evidence that everything that you utter, and everything that I utter, is not also "just output" ?
Inge Noeninger said,
March 10, 2018 @ 5:50 pm
My point is about anthropomorphizing machines. https://www.merriam-webster.com/dictionary/anthropomorphize
Guy said,
March 10, 2018 @ 6:11 pm
If I said a knife has the ability to cut meat, am I anthropomorphizing it? Even setting aside philosophical questions, I just don’t see as a linguistic matter how saying a machine has an ability anthropomorphizes it.
Philip Taylor said,
March 10, 2018 @ 6:44 pm
Guy: Given the second definition of ability in the OED, I would agree with you :
(Note: stress added to the salient point of the quotation).
Tom davidson said,
March 10, 2018 @ 8:07 pm
GT still has a way to go when translating TCM and other scientific research papers from Chinese to English….
Colin said,
March 10, 2018 @ 9:37 pm
A popular line among mathematicians and physicists (versions have been attributed to various people) is "if you can't explain it simply, you don't understand it well enough". That's not an aesthetic judgement, but rather a heuristic for how successful you will be at applying or adapting what you have done to a new context (let alone for someone else to read your work and do the same). A clunky solution to a problem is unsatisfying because there's never just one problem, but rather a whole field of enquiry that you hope to progress by solving the given problem.
ktschwarz said,
March 10, 2018 @ 11:44 pm
I agree with Ben Hemmens: this is a relatively easy translation because the word order was already close to English. GT did move some words, but not very far: e.g., the verb "handelt" was moved from the end of the clause to the middle, a distance of only four words. I think it should have moved some more: "That German football acts…" sounds stilted. What happens if I make it worse by piling on more words?
Google responds:
OK, now I'm impressed: GT moved "scary" all the way to the front and correctly introduced the dummy "it"—although it also changed "scares me" to "it's scary" for no reason.
Here's another impressive one: German
to English:
GT changed the whole structure of the sentence! Instead of two parallel main verbs—"threw" and "caught"—now it has "liked" as a main verb, and two parallel subordinates. And it comes out quite natural.
Victor Mair said,
March 10, 2018 @ 11:47 pm
@ktschwarz
All the more elegant! Impressive demonstration!
Aristotle Pagaltzis said,
March 11, 2018 @ 12:11 am
The translation of Philip Taylor’s friend captures the “influence of the Chinese” distinction that is clearly present in the original but lost in the other translations. But otherwise I find it worse than the other translations in that it’s simply too wordy.
I find DeepL’s translation best. It’s impressive that it picks “frightens” for the first sentence and also puts the “even” in the last sentence that’s missing from the other translations. I also find its choice of “considerably” more accurate than Mair’s “greatly”, though both choices miss the suggestion that there has been a robust change in rate of growth.
The second sentence is especially tricky because it’s structured like speech, and translations sound almost entirely too literary given the source.
As someone who speaks German more fluently than English, I would offer something like this:
“The fact that German football acts in the name of a dictatorship frightens me. The influence the Chinese have on the economy has grown heavily, on our whole lives too. Most Europeans aren’t even registering that.”
Coby Lubliner said,
March 11, 2018 @ 1:51 am
Have none of the translation machines caught that, for North Americans and Australians, "football" should be replaced with "soccer"?
Hungarian said,
March 11, 2018 @ 3:38 am
I agree with Tom Davidson. It's truly impressive what Google Translate can do between languages like German and English, and I use it often to try to get the gist of what is being said in languages I need help with, and for inspiration when I'm doing my own translations. But I primarily translate between Hungarian and English, and there it is having a lot of problems. I use it to jog my memory about what my options are for a translation, but then I often go my own way.
Just yesterday I had hoped Google Translate could translate a recipe well enough for a non-Hungarian-speaking friend that I could be lazy, but it mangled it entirely.
It knows that marhalábszár is beef leg, but if you put "60 dkg marhalábszár" in, suddenly it spits out "60 dkg beetle bark" when it should say "600g beef leg". It translates "1 ek fűszerpaprika (csapott, édesnemes)" as "1 tsp seasoned pepper (battered, sweetheart)" instead of "1 Tbsp paprika (levelled, non spicy)".
It translates "A zsírt felhevítjük, megfonnyasztjuk benne a hagymát, majd félrehúzzuk, a fűszerpaprikákkal összekeverjük, felöntjük 1,5 dl vízzel, majd a vizet elfőzve, zsírjára pirítjuk." as "Grate the grease, squeeze the onion, then squeeze it, mix it with the pepper paprika, pour it with 1.5 dl of water, then fry the water and fry it to fat."
But it means "melt the lard, sauté the onions in it, take it off the stove and mix it with the paprika. Pour 150 ml of water into it, and then fry it until all the water has boiled off." There wasn't a single sentence in the entire recipe that could be understood without help, although the recipe didn't use any unusual vocabulary or constructions or ingredients; it was all very standard.
Another truly bizarre answer was "1 teáskanál őrölt fűszerkömény" which it thought meant "1 teaspoon ground spinach" but it should be caraway, not spinach. Spinach is spenót. I can't imagine how it got the two confused. If you ask it "fűszerkömény" by itself then it knows it's caraway.
Well at least my friend and I had a good laugh.
Robert Coren said,
March 11, 2018 @ 10:23 am
A slight digression on "elegance": Not only mathematicians and programmers, but chess players use this term too, all in the sense of achieving the maximum result with minimal means/effort. I claim that it can apply to music as well, with Mozart as the nonpareil master of elegance. Back in the dim and distant past when I worked as a programmer, I posted on a corkboard above my desk a transcription of the return to the recapitulation of the finale of the "Jupiter" symphony, as an example of what we were all striving for.
(Apologies to those for whom that last sentence is entirely opaque.)
Dr. Decay said,
March 11, 2018 @ 12:15 pm
@Robert Coren. More on elegance. Einstein, in his preface to "Über die spezielle und die allgemeine Relativitätstheorie" wrote: "Ich hielt mich gewissenhaft an die Vorschrift des genialen Theoretikers L. Boltzmann, 'man solle die Eleganz Sache der Schneider und Schuster sein lassen.'" Google's translation of Boltzmann's dictum is "The elegance should be left to the tailors and shoemakers." Not so elegant, but pretty good if you drop the articles.
ktschwarz said,
March 11, 2018 @ 7:59 pm
As an American, I have no objection to "German football", especially in a direct quote at the end of a story about it. Does GT distinguish American/British English at all?
Another issue: "in the name of a dictatorship" should be "on behalf of a dictatorship". In English, "in the name" would have to mean overtly using the name, but the whole point of the paragraph is that the pressure was covert and not recognized. To get that right, the translator would need more than one sentence of context, and significant real-world knowledge.
@Hungarian: Great examples. The burning question to me is: Does it perform worse on Hungarian than German because the Hungarian-English corpus is smaller, or because Hungarian-English is inherently harder? Given the same corpus for all three languages, would the translations be equal quality or not?
Kiwanda said,
March 11, 2018 @ 8:53 pm
"…English has a genius for concision….intrinsic spirit of conciseness…"
I prefer "concision" to "conciseness", for no clear reason; maybe it's that concision should itself be expressed with concision.
My vague impression is that "maliciousness" is taking over from "malice", "voraciousness" for "voracity", "ferociousness" for "ferocity". I wonder if it's really so.
PRW said,
March 12, 2018 @ 1:02 am
It renders the incipit of Catullus 16 thus: 'Pedicabo ego vos et irrumabo' -> 'Bugger you and stuff'. This is at least vaguely on-point, more so than the last time I checked a few years ago, but I'd upgrade Google Translate for Latin from 'laughable' to 'lame' and no more.
Ursa Major said,
March 12, 2018 @ 8:55 am
@PRW: Google Translate's attempt to go the other way with Philip Larkin's famous line is "Vos et irrumabo tuus mum quod dad." The second half of the sentence is clearly wrong; I checked random other languages, and there it correctly used native words, although sometimes too formal ones (e.g. ta mère et ton père). I don't know enough (any) Latin to evaluate the first half, but I would guess that irrumabo comes directly from feeding the database with translations of Catullus; reading the Wikipedia page on Catullus 16 suggests that it is incorrect.
Rodger C said,
March 12, 2018 @ 11:28 am
That should of course be "Bugger you and stuff you." ;)
BZ said,
March 12, 2018 @ 12:26 pm
Wouldn't "the German football" as opposed to "German football" have to be interpreted as referring to a ball and not the game? If so, that's a pretty serious mistake for the software to make. Of course that's something a reader can probably interpret correctly in context. On the other hand I don't know what to do with "including our lives" without explanation.
Ben Hemmens said,
March 12, 2018 @ 6:30 pm
Aristotle, the "The fact that … " version is the most obvious ploy I’d go to for a sentence beginning with a ‘Dass’ like that. But the ‘That … ’ version really packs a punch. I think it repays the strain.
Aristotle Pagaltzis said,
March 12, 2018 @ 7:42 pm
Ben Hemmens:
I went with “The fact that” for… irrational reasons. I felt the difference between “im Namen von” and “in the name of” that ktschwarz pointed out, but it did not catch my conscious attention to the point of reconsidering… instead I just picked up the longer version from the other comments as feeling somehow less wrong. Now that I’ve duly facepalmed upon reading ktschwarz’s comment, I’ll agree that the shorter version is preferable.
So the first sentence should be: “That German football is acting on behalf of a dictatorship frightens me.”
Hungarian said,
March 13, 2018 @ 3:58 am
I think it is both. Google Translate has been getting better, but it still has trouble parsing Hungarian grammar or producing grammatically correct translations between the two languages, and I suspect that has to do with Hungarian not being an Indo-European language and not following whatever unspoken rules of language the programmers were envisioning when they trained the software.
But in the recipe I was looking at the other day, the most startling problems were vocabulary ones: I couldn't begin to understand where it was getting the words it suggested. I got distracted trying to find out what beetle bark even is; Wikipedia didn't know, though I learned about bark beetles instead. I don't understand how it knows that caraway is not spinach in isolation but mixes them up when you give it a sentence or paragraph.
There were also minor problems like not understanding the weights and measures used: mixing up the teaspoon and tablespoon, not knowing that 60 dekagrams should be translated into 600 grams because it won't be understood by an English speaker, thinking 60 dkg could be translated as 60 oz, which makes a kind of sense (oz, dkg, g are all words that could plausibly follow that number, but they aren't interchangeable!).
I ask it "madártej" by itself and it knows I mean "Floating Islands", the French dessert, but if I give it an entire recipe for madártej then suddenly it's translating it literally as "bird milk."
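For reference, the unit conversions at issue in the recipe are simple powers of ten. A minimal sketch follows; the table and function names are illustrative only, and nothing here reflects how GT handles units:

```python
# Conversion factors for the metric abbreviations that appear in the recipe.
# 1 dkg (dekagram) = 10 g; 1 dl (decilitre) = 100 ml.
GRAMS_PER_UNIT = {"g": 1, "dkg": 10, "kg": 1000}
MILLILITRES_PER_UNIT = {"ml": 1, "cl": 10, "dl": 100, "l": 1000}


def to_grams(amount, unit):
    """Convert a mass given in g, dkg, or kg to grams."""
    return amount * GRAMS_PER_UNIT[unit]


def to_millilitres(amount, unit):
    """Convert a volume given in ml, cl, dl, or l to millilitres."""
    return amount * MILLILITRES_PER_UNIT[unit]


print(to_grams(60, "dkg"))        # 600
print(to_millilitres(1.5, "dl"))  # 150.0
```

Ounces are deliberately absent from the table: "60 oz" would be roughly 1.7 kg, so swapping the unit label without converting the number changes the quantity.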
ktschwarz said,
March 14, 2018 @ 2:52 am
"I couldn't begin to understand where it was getting the words it suggested"
I have a guess! Google Translate breaks down its input into "wordpieces", such as:
It's these wordpieces that are encoded into the internal state of the neural network, then decoded into wordpieces of the target language. I think the crazy words are coming from the decoding step. Your Hungarian input has compound words such as "marhalábszár" = "marha" (beef) + "lábszár" (shin). Instead of going from "marha" to "beef", it only got as far as "_bee" but completed it as "beetle" instead. That's due entirely to English spelling.
This also explains what happened to "fűszerkömény" = "fűszer" (spice) + "kömény" (caraway). It got from "fűszer" to "_spi" but completed it as "spinach".
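The guess above can be illustrated with a toy segmenter. This is only a sketch: the vocabulary is invented, greedy longest-match is one common way to apply a subword vocabulary and is assumed here rather than confirmed for GT, and the leading underscore marks the start of a word, as in the pieces quoted above.

```python
# Hypothetical toy vocabulary; it is NOT Google Translate's real vocabulary.
TOY_VOCAB = {"_bee", "f", "tle", "_spi", "ce", "nach"}


def segment(word, vocab):
    """Split a word into pieces by greedy longest-match, left to right."""
    text = "_" + word  # mark the beginning of the word
    pieces = []
    start = 0
    while start < len(text):
        end = len(text)
        # Shrink the candidate until it is a known piece.
        while end > start and text[start:end] not in vocab:
            end -= 1
        if end == start:
            # Nothing matched: fall back to a single character.
            end = start + 1
        pieces.append(text[start:end])
        start = end
    return pieces


for w in ("beef", "beetle", "spice", "spinach"):
    print(w, segment(w, TOY_VOCAB))
# beef ['_bee', 'f']
# beetle ['_bee', 'tle']
# spice ['_spi', 'ce']
# spinach ['_spi', 'nach']
```

With this toy vocabulary, "beef" and "beetle" share the first piece "_bee", and "spice" and "spinach" share "_spi", so a decoder that has produced only the first piece could in principle continue to either word.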
"it knows that caraway is not spinach in isolation but mixes them up when you give it a sentence or paragraph."
Possibly it's using the old, non-deep-learning, statistical translation for single words. I also suspect that's the "alternate translation" when you click on the output (desktop interface only).
Graeme said,
March 14, 2018 @ 6:42 pm
Is there something about the grammatical rigour of German that makes it potentially machine-compliant? As long, that is, as you take the syntax into account. Earlier versions of Google Translate just got lost in the literal word meanings, assuming they were working on an SVO language. Once you start to program datives before the verb, then magic can happen.
Jimbino said,
March 14, 2018 @ 7:45 pm
I tried GT on "Die Männer die vor dem Schokoladenladenladen Laden laden laden Ladenmädchen zum tanzen ein," which it translated almost right. It only missed the plural "Laden" => "drawers."
Alex said,
March 14, 2018 @ 11:42 pm
A team of Microsoft researchers announced on Wednesday they've created the first machine translation system that's capable of translating news articles from Chinese to English with the same accuracy as a person.
https://www.yahoo.com/news/microsoft-announces-breakthrough-chinese-english-144614336.html
Alex said,
March 14, 2018 @ 11:45 pm
Considering Microsoft isn't banned here, perhaps they have more daily data than Google to work with.
Bruce Humes said,
March 16, 2018 @ 5:27 pm
I've noticed that people tend to have strong opinions, one way or the other, about GoogleTranslate. Especially professional translators, who seem to be more than a bit worried about losing their day jobs as a result of its existence and rapid improvement due to "self-learning."
Personally, I use GoogleTranslate frequently for Chinese-English translation and find it useful, even for literary translation; I can't use it as a first draft, but it is useful as a reference. Only a fool would assume that the output is likely to be highly accurate or grammatical, however.
But there is something very basic about GoogleTranslate that one should bear in mind: Each translation it generates draws upon the existing pool of text generated by native speakers + translated text.
One could argue that the example of Hungarian above doesn't tell us much about the "ability" of GoogleTranslate overall. After all, compared to global languages like Chinese, English or Spanish, the amount of existing Hungarian text currently on the Internet must be minuscule. That means many terms and wordings may not even be up on the web yet in Hungarian.
In this respect, it might be interesting to consider the impact of the huge quantity of Chinese-English translated text on the Internet worldwide today. Some of the commercial translation — generated for marketing purposes, for instance, like press releases — that one finds on the Internet contains more than a tad of Chinglish. So I wonder: How good is GoogleTranslate at distinguishing English from Chinglish? Isn't it likely that GoogleTranslate will perpetuate this situation by frequently generating Chinglish as a result?
Ultimately, GoogleTranslate might "learn" to handle Hungarian-English translation better than Chinese-English, if the existing pool of translated text it consults was input by highly bilingual Hungarian-English translators.
Hungarian said,
March 17, 2018 @ 12:53 am
ktschwarz, that makes a ton of sense. Thank you for solving the mystery for me! I think that deep learning model, where it makes guesses based on the first three letters of the word, is incompatible with agglutinative languages. After all, if a Hungarian word starts with "meg"* it could be almost any verb. I noticed that earlier versions of Google Translate just treated Hungarian word endings as interchangeable, which resulted in sentences with seemingly randomised pronouns and tenses. It also had a lot of trouble recognising negation. It's getting better, slowly.
* meg is a verbal prefix that can add a perfective aspect to the verb or indicate the beginning of an action or a few other things.
marhalábszár can be broken down even further than that: láb = leg, szár = stem.
The problem with going from fűszerkömény to "spi…" and picking spinach is that a lot of spices have fűszer (spice) at the start of them to distinguish them from the entire plant or the fruit – fűszerpaprika (the red powder) vs paprika (pepper/capsicum fruit). In English maybe you can guess what the word is going to be by the beginning but a Hungarian word might turn into 5 or 6 English words and you have to pay attention to all of the morphemes if you want to find out which English words to use.
German has a lot of compound words too, but it's not agglutinative and it has a bigger corpus; maybe that's the difference.
Anyway, thanks for solving the mystery! That's really cool.
Jeff DeMarco said,
March 17, 2018 @ 11:26 am
I don't think FB uses GT (though not sure) but I was impressed when the Italian "vafa…" was translated as "effin…."