Your gigantic crocodile!
One more piece of Google Translate poetry, contributed by Mackenzie Morris:
I've been collecting screenshots of the best of these, because I'm assuming that soon, someone at Google will notice the hilarity and take the trivial steps required to stop it.
It's easy to distinguish keyboard banging or nonsensical repetitions from sincere input, unless it's now a matter of religious faith at DeepMind that with proper training, TensorFlow speaking ex reticulo is infallible…
Update 5/8/2017: Someone at Google Translate noticed, and (probably) introduced some of the simple protections that I suggested, so that many of these tricks are now much more boring…
Gwen Katz said,
April 28, 2017 @ 10:01 pm
I wonder if that second one was caused by an ad template that had some nonsense symbols filling the space.
Y said,
April 28, 2017 @ 10:32 pm
Its precedents:
ççcleo!
çççcleo!
ççççcleo!
çççççcleo!
Fuckle!
Cheat up!
Chew up!
Cheer up!
[(myl) And a longer sequence, duplicates removed:
]
Thomas Rees said,
April 28, 2017 @ 11:21 pm
72 łs (lower case el with stroke, detected as Polish, unsurprisingly) give “Thefts LL”. This has to be meaningful.
Michael Watts said,
April 28, 2017 @ 11:22 pm
It's easy to distinguish keyboard banging or nonsensical repetitions from sincere input
Under which category would Jabberwocky and "Lorem Ipsum" fall? Both are readily recognizable (to a human) as being untranslatable nonsense which nonetheless belongs to an identifiable language.
[(myl) The most obvious test (e.g. for Greekness or Turkishness or Englishness or etc.) would be character ngram probabilities — which both of your examples would pass.
Another test would be word probability — unigram probability, to start with — for those orthographies where "words" are signaled in some way. "Lorem Ipsum" would fail that test (as English or Greek or Turkish or anything but Latin), unless its words had been uncritically accepted into the relevant list. Jabberwocky might pass or fail, depending on the threshold used and the source of the wordlist.]
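The two tests described in the reply above can be sketched roughly as follows. This is a toy illustration only: the training text `TOY_ENGLISH`, the class `CharNgramModel`, and the function `known_word_fraction` are made-up names, the model is an add-one-smoothed trigram frequency table rather than anything Google Translate is known to use, and a usable version would need a large corpus and wordlist for each language.

```python
import math
import re
from collections import Counter

# Assumption: a tiny stand-in for a real English training corpus.
TOY_ENGLISH = (
    "the quick brown fox jumps over the lazy dog and the cat "
    "sat on the mat while the rain fell"
)
# Assumption: the wordlist is simply the words of the toy corpus.
WORDLIST = set(TOY_ENGLISH.split())


def char_ngrams(text, n=3):
    """Return the overlapping character n-grams of text, padded with spaces."""
    padded = " " * (n - 1) + text.lower() + " "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class CharNgramModel:
    """Add-one-smoothed character n-gram frequencies for one language."""

    def __init__(self, corpus, n=3):
        self.n = n
        self.counts = Counter(char_ngrams(corpus, n))
        self.total = sum(self.counts.values())
        # One extra bucket stands in for every unseen n-gram.
        self.vocab = len(self.counts) + 1

    def avg_logprob(self, text):
        """Mean log-probability per n-gram; higher means more language-like."""
        grams = char_ngrams(text, self.n)
        logprob = sum(
            math.log((self.counts[g] + 1) / (self.total + self.vocab))
            for g in grams
        )
        return logprob / len(grams)


def known_word_fraction(text, wordlist):
    """Fraction of alphabetic tokens that appear in the wordlist."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    if not tokens:
        return 0.0
    return sum(t in wordlist for t in tokens) / len(tokens)


if __name__ == "__main__":
    model = CharNgramModel(TOY_ENGLISH)
    samples = [
        "twas brillig and the slithy toves",
        "lorem ipsum dolor sit amet",
        "çççççççç",
    ]
    for sample in samples:
        print(sample, model.avg_logprob(sample), known_word_fraction(sample, WORDLIST))
```

On this toy data, the Jabberwocky line scores above a run of ç on the character test because it shares trigrams such as "the" with the training text, while the "Lorem Ipsum" line has no tokens in the English wordlist. Where to set the thresholds remains a judgment call, as the reply says.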
Ross Presser said,
April 29, 2017 @ 12:00 am
ççç!!
çççç!!!
çççç!!!!
çççççççç!!!!!!!!
Do it!
Cheat !!!
Embezzle !!!!
Cheerful !!!!!!!!
Gregory Kusnick said,
April 29, 2017 @ 12:49 am
Jabberwocky may be nonsense, but untranslatable? Apparently not.
tangent said,
April 29, 2017 @ 1:33 am
Why would someone be interested in stopping it? Is it causing a problem?
What are the trivial steps to stop it anyway? Just matching on the lexical pattern of character repetition, sure, that's easy, but it doesn't stop the whole 'problem'. The larger issue is that the model hasn't been trained on much nonsense and it doesn't know how to shrug. I guess that could be of value in some applications. But this is more fun.
[(myl) I don't know what Alphabet's attitude is towards innocent merriment at their expense — it would be nice to think that they're tolerant of such things, even if epithets like "Goofle Translate" go viral.
But there's a more serious potential reputational damage. AI-oid solutions are increasingly being promoted as solutions to practical problems, from self-driving cars to organizational management. Many of these applications involve situations where the consequence of bizarre reactions to unexpected inputs might be genuinely catastrophic. And some people's reaction to this sort of thing is like Jonathan Smith's: "So what Keith M Ellis said. Not getting in the self-driving car either…"
See e.g. Will Knight, "Intelligent Machines: The Dark Secret at the Heart of AI. No one really knows how the most advanced algorithms do what they do. That could be a problem.", MIT Technology Review, 4/11/2017.]
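The "lexical pattern of character repetition" check mentioned in the comment above might look something like this. It is a minimal sketch, not anything Google is known to have deployed: the function name `is_degenerate_repetition`, the repeating unit of up to 4 characters, the minimum of 6 repeats, and the 80% coverage threshold are all arbitrary illustrative choices.

```python
import re

# A short unit (1 to 4 characters) followed by at least 5 more copies of itself.
# Assumption: these limits are illustrative, not tuned on real traffic.
_REPEAT = re.compile(r"(.{1,4}?)\1{5,}", re.DOTALL)


def is_degenerate_repetition(text, coverage=0.8):
    """Return True if most of the input is one short unit repeated many times."""
    s = text.strip()
    if len(s) < 6:
        # Too short to contain 6 repeats of anything.
        return False
    # Count how many characters fall inside repeated runs.
    covered = sum(m.end() - m.start() for m in _REPEAT.finditer(s))
    return covered / len(s) >= coverage


if __name__ == "__main__":
    for sample in ["ç" * 44, "ㅁ" * 10, "12" * 60, "Cheer up!"]:
        print(repr(sample[:12]), is_degenerate_repetition(sample))
```

A filter like this only covers the repetition cases. As the comment points out, it does nothing about other nonsense input, and it would also flag some sincere input such as a long "hahahaha".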
Michael Watts said,
April 29, 2017 @ 1:57 am
I was aware of the foreign-language Jabberwockies.
"Translating" the nonexistent English word "brillig" into the equally nonexistent German word "Brillig" is not translating in any meaningful sense. Rendering "slithy" as "schlichte" is a little more interesting, but it's an exercise in phonology, not translation — a translation would usually care more about equivalence of meaning. You can't translate what doesn't have meaning in the first place.
AntC said,
April 29, 2017 @ 2:30 am
@Michael Watts Under which category would Jabberwocky and "Lorem Ipsum" fall?
Google translate detects Jabberwocky (first two lines) as English quite easily. As well it should: plenty of English words there; and even though some of them might not be in your dictionary, they fit English spelling.
You can't translate what doesn't have meaning in the first place.
*gasp* What crashing boorishness! Every word there has meaning, as subsequently explained by Humpty Dumpty. Several of the words Carroll used thinking he was inventing them turned out to exist already (obsc/arch). And plenty of the words he invented have subsequently gained exactly the meaning he gave them.
Sir, you have tin ears. I picture you whiffling through the tulgey wood and burbling as you come.
By the way, the translation into French has exactly caught the aesthetic of Symbolisme. (Google translate's efforts come nowhere near it.)
Gregory Kusnick said,
April 29, 2017 @ 2:43 am
Michael: Isn't the point that none of it has meaning to Google Translate? It's just mindlessly manipulating symbols, without caring about meaning. And yet it does an acceptable job of translation most of the time.
It's a clear case of what Daniel Dennett calls "competence without comprehension", analogous to the genetically programmed behavior of insects, and similarly prone to failure when confronted with stimuli outside its domain of training.
Michael Watts said,
April 29, 2017 @ 2:57 am
Yes, Jabberwocky is easily identifiable as English, and Lorem Ipsum is easily identifiable as Latin. I said as much in my original comment.
The question was whether they would be categorized as "keyboard banging or nonsensical repetitions" (which they aren't, though they are nonsense) or as "sincere input" (which they also aren't). If you attempt to understand them, you will immediately notice that they consist in large part of words that could belong to the language in question, but don't.
As such, they are outside the universe of a translation engine. There is no symbol in any language that can map to or from the English word "frabjous" because "frabjous" is not an English word. Therefore all translation attempts, no matter the method they use, must fail.
Think of it from another angle. Der Jammerwoch purports to translate "brillig" as "Brillig". Why should I believe that that's correct? The German word for "carnation" is "Nelke"; it seems odd to expect that the word for "brillig" should resemble the English so closely. How do you know the German for "brillig" isn't "Schapptel"?
David Morris said,
April 29, 2017 @ 3:14 am
I tried with repeated single Korean letters. The most interesting results were:
ㅁ > ㅁ.
ㅁㅁ > Mike.
ㅁㅁㅁ > Klitschko.
ㅁㅁㅁㅁ > Whistler.
ㅁㅁㅁㅁㅁ > Klitschko.
ㅁㅁㅁㅁㅁㅁ > Whistler.
ㅁㅁㅁㅁㅁㅁㅁ > ㅁ Klitschko Klitschko.
ㅁㅁㅁㅁㅁㅁㅁㅁ > ㅁ Klitschko Klitschko.
ㅁㅁㅁㅁㅁㅁㅁㅁㅁ > ㅁ Klitschko Klitschko.
ㅁㅁㅁㅁㅁㅁㅁㅁㅁㅁ > ㅁ Klitschko Klitschko Klitschko.
Whistler could be anyone who whistles, the painter or the Canadian town. Klitschko seems to refer to the Ukrainian boxers Vitali and Wladimir Klitschko, but I can’t guess what the connection between them and the Korean letter ㅁ is. (The spell-checker on Pages for Mac doesn’t red-underline ‘Klitschko’, the one in the comment box here does (and also Wladimir).)
ㅊ > The.
ㅊㅊ > Cock.
ㅊㅊㅊ > Prohibited.
ㅊㅊㅊㅊ > Standard.
ㅊㅊㅊㅊㅊ > Prefix.
ㅊㅊㅊㅊㅊㅊ > Acknowledgments.
ㅊㅊㅊㅊㅊㅊㅊ > Ancestry.
ㅊㅊㅊㅊㅊㅊㅊㅊ > Prefix, prefix.
ㅊㅊㅊㅊㅊㅊㅊㅊㅊ > Pros and Cons.
ㅊㅊㅊㅊㅊㅊㅊㅊㅊㅊ > Pros and Cons.
My other notable finding was that different Korean keyboard layouts yield different results. The 2-set Korean layout (which I habitually use) maps ㅁ to the English a-key, and yields the results above. The 3-set Korean layout maps ᆷ to the English z-key and results in strings of ᆷs as the 'translation'.
unekdoud said,
April 29, 2017 @ 3:27 am
Google Translate also has an issue translating repeated digits: "12" repeated 60 times, translated from Turkish, is "128". Translations of digits from Japanese contain even weirder glitches.
If we all agree that numerals, especially 120-digit numbers, have the same meaning across most languages, you'd expect the algorithm to detect numbers and not touch them. But perhaps this is something you can't expect a neural network to figure out on its own without encountering the same long string of digits in several languages.
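One way to "detect numbers and not touch them" is to handle them outside the model entirely. The sketch below assumes a hypothetical `translate` callable standing in for whatever translation backend is in use; the function name `translate_preserving_numbers` and the regular expression for what counts as a number are illustrative assumptions as well.

```python
import re
from typing import Callable

# Digit runs, optionally joined by "." or "," (e.g. 1,000.50).
# Assumption: this is a deliberately simple notion of "number".
_NUMBER = re.compile(r"(\d+(?:[.,]\d+)*)")


def translate_preserving_numbers(text: str, translate: Callable[[str], str]) -> str:
    """Translate the non-numeric parts of text and copy numbers through as-is."""
    # With one capturing group, re.split alternates: text, number, text, ...
    parts = _NUMBER.split(text)
    out = []
    for i, part in enumerate(parts):
        if i % 2 == 1:
            out.append(part)             # a number: never shown to the model
        elif part.strip():
            out.append(translate(part))  # ordinary text: translate it
        else:
            out.append(part)             # empty or whitespace: keep unchanged
    return "".join(out)


if __name__ == "__main__":
    fake_translate = str.upper  # stand-in for a real translation call
    print(translate_preserving_numbers("12" * 60, fake_translate))
    print(translate_preserving_numbers("saat 12 oldu", fake_translate))
```

Translating the fragments separately throws away context, so a real system would probably swap numbers for placeholders and restore them afterwards. The point here is only that the digits never pass through the network.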
Christopher Henrich said,
April 29, 2017 @ 2:02 pm
These productions remind me of what happens when someone listens to a music video, with singing in an unfamiliar language, and tries to dope out the words. About ten years ago there was a brief fad for doing this and putting one's best guesses in subtitles on a copy of the video. Eventually it was suppressed as a copyright infringement, but a few of the results remain. Google "My loony bun is fine Benny Lava" for an example.
Let's not blame Google Translate; it's doing the best it can.
Harald Korneliussen said,
April 29, 2017 @ 2:30 pm
This doesn't use Tensorflow though, most likely. As far as I know Neural Machine Translation is not yet implemented for Turkish. It will be interesting to see if you can find similar examples for the NMT languages, I would think not.
A paper was published recently examining machine translation on problems selected for our human understanding of their challenges. NMT did better on all of them, except translating idiomatic expressions. There the old phrase-based machine translation was best.
I can believe it. I once gave Bing Translate a punny German news headline about tanning beds: "Wer rastet, der röstet", and it came up with "Rolling Stone gathers the toasts". It's not right, but it's kind of…inspired. It's as if it noticed something was going on in that headline, and did its best to mix in an idiomatic expression/pun.
So the occasional strokes of brilliance you see from PBMT in translating puns or idiomatic expressions (which is probably related to what makes it readily confusable by nonsensical input), we will likely miss for a while!
Jerry Friedman said,
April 29, 2017 @ 11:31 pm
Michael Watts: Think of it from another angle. Der Jammerwoch purports to translate "brillig" as "Brillig". Why should I believe that that's correct?
Because the German version is as much fun to say as the English version?
Also, just as "brillig" is a blend of "broil" and "grill", "Brillig" could be a blend of "braten" and "grillen", or so I imagine, not knowing any German.
The German word for "carnation" is "Nelke"; it seems odd to expect that the word for "brillig" should resemble the English so closely. How do you know the German for "brillig" isn't "Schapptel"?
Maybe that would work as well, for all I know.
A lot more has been said than I know in regard to translating poetry. Certainly translating "Jabberwocky" is a very different enterprise from translating a contract.
Smut Clyde said,
April 30, 2017 @ 12:07 am
Finnish to English:
https://3.bp.blogspot.com/-0L2t62JrGPo/WQUXNWFkzUI/AAAAAAAAVag/_JovwzeNGsgyuf_TYhBDyFRaD4kWmcj8wCLcB/s1600/cthulhu.png
Also — Hindi to English —
Do you know
How to do it
How to do it
Let's do some work
How To Do Some Functions
Do your workload also
Do a Coup to Do It
Do you have a knock on your knees too.
Door-to-Door Instructions
Do you also have a knock on the other hand.
The knife of the coaster too.
The key to the lamps, also do it.
Do It Yourself
Therapeutics
Therapeutics, therapeutics and also the rest. Also, let's also keep a loop.
The lasso, the other parts of the set up.
The brochure of the house
Do another in the franchise also lets you save more.
How to save a hold for your house
Let's also keep a loop of brochures, also a part of your work when you let go of
https://1.bp.blogspot.com/-VsxkW_WAck0/WQMEt5twL-I/AAAAAAAAVZM/9cFVTHizWdA9ckkUHZr-cOFcWIBC_i9jwCLcB/s1600/ko.png
I stole your idea here.
Smut Clyde said,
April 30, 2017 @ 12:12 am
Has anyone found any of this deep-dream poetry when translating into a language other than English? I'm getting the impression that the neural networks for translating into (say) Greek or Estonian are not sufficiently over-trained, so they do not produce these glitches.
[(myl) Repetitions of X, recognized as "Hawaiian", are translated into French
and Spanish
and German
etc. etc. similarly to English
]
Robert Coren said,
April 30, 2017 @ 9:46 am
I didn't look at any of the Jabberwocky translations at the link posted above, but I remember some of the French and German versions reproduced in Martin Gardner's The Annotated Alice, and I noted that, while the German tended to use existing German words for many of Carroll's invented words (die schlichte Toven / wirrten und wimmelten im Waben), the French went for nonsense-French that was reasonably analogous to the nonsense-English.
tangent said,
May 1, 2017 @ 3:24 am
@myl On the PR side, the burden of proof and persuasion to release an autonomous car is so high, nobody's going to look at it and say "but, what about Goofle Translate." If the car works, they'll get in.
[(myl) Probably true, but the PR question (and to some extent the public policy question) is how do you tell whether "the car works"? In many ways, Google Translate "works"; I use it myself all the time. But it's news to me (and to most people) that it can go off the rails so spectacularly given certain sorts of inputs. And it will be a matter of concern that self-driving vehicle algorithms might similarly sometimes be triggered to behave in unexpectedly systematic but situation-inappropriate ways.]
On the technical side, you know the technologies have almost zero relation to each other, and not everybody knows that. But if people get it clear that machine translation is not currently "self-driving", that's good. And if this gives all opaque machine learning systems a reputation of needing careful engineering to prevent weird edge cases, that's good. People will overgeneralize on that, sure, but there's no permanent harm done from this overgeneralization. Doubt can be addressed by evidence. It's better than fear.
richardelguru said,
May 1, 2017 @ 7:28 am
çççççççççççççççççççççççççççççççççççççççççççç was recognized as Turkish and gave me 'Coulter'!
I wonder if the students at Berkeley have been hacking???
[(myl) And successive additions of ç are translated as
Coulter
Foaming
Tongue-in-cheek
Tongue-in-the-wall crutches
Tongue-in-the-wall crusher
]
richardelguru said,
May 1, 2017 @ 7:31 am
Oh! and adding another ç produces 'Foaming'!!
Lane said,
May 1, 2017 @ 3:47 pm
For what it's worth, my understanding, confirmed by someone at Deepmind, is that Deepmind, while owned by Google, has nothing to do with Translate. Unless I'm missing something subtle in Mark's ex reticulo joke…
[(myl) I'm assuming that DeepMind is the Rome of the Neural Catholic Church. And whatever the episcopate, the current orthodoxy is to view Feature Engineering as heresy…]
PB said,
May 3, 2017 @ 7:28 pm
@Michael Watts: the equally nonexistent German word "Brillig": The word doesn't exist in German, but (as a native Swiss-German speaker) I'd say it is a *possible* or thinkable word in German. The German word for glasses (spectacles) is "Brille", so a creative speaker, wishing to express that something has in some way qualities of glasses or is "like glasses", could call that "brillig". It would be a very whimsical, weird way of speaking, though.