The standard deduckling


Today's xkcd:


Mouseover title: "I ended up getting my tax return prepared at a local place by a really friendly pretrained neural net named Greg."

But too-limited training isn't the only problem — see e.g. "Shelties On Alki Story Forest", 11/26/2019, or "o ai aaa oa ueui", 2/27/2018.

As an indication that the last example hasn't changed much in the past two years, this morning, Google Translate recognizes

aeiaeeuaiuioeae ie i oo eiaoeii ouoeiooiieuuaaeaaaeaeau ooieoiaueuoo e o eiaiu oeoe i oo i oeeuiuu ooo eouaaeaaeau ueieaeooueu ouui eeeaooaoeeee

as Hawaiian, and translates it as

This can be used as a function of the values of the data that are in the ooo and the values of the values of the data and the values of the data.

The following random sequence

u uiiou iueoiaoooea u eoioaeeoiuu iooiio oee oiaouio auueuai iiaueuoa oeeioou oiueeeioou oauauaeaa ooeoua eao aaooouooueouo

is translated as

Here are the languages that are listed in the current directory where you can write an inline language or, in English, in the main.

Another random string of vowels and spaces

iieoaiiiaueooouueuei oeaeuaaeuu ieeu oioui oeu uuuaoaoea i eo o eiee a iie u eiiuaaoaoae ioaeuauou oi aoeouaeeu ee ou

comes out as

Built-in-storage-shelf-shelf-wall is located on the shelf and can be used as a source of data and

Trying again,

aoaeuouiiooiaai ieu uoi aa uoauuuua i iio uoaeeeeaeiiioeeu uaaoaoe iiio oouoiieaaii eo oei e ooeeu a auaoueeoe aia oeaoooeae ao ee aaieouia ae e iuo

yields

This is the main feature in the library of important data that can be used in both the source and operating environment.

It's interesting that GT's idea of default Hawaiian text is so Data Science-y these days, though other topics and genres sometimes creep in:

euiueoeiiiui ioiaiieaouoouaie ai ei oe  oaieue ao i ai  ooie  iiouo oii aua ui iueu oo oue e ea oee

It is important to have a simple life while working here for the majority of the lives because of the hardness of your breath

If you want to try this yourself, you can run this 2-line R script (or the equivalent in your favorite language) to produce new examples (changing N as the fancy takes you):

N = 150; Letters = c("a","e","i","o","u"," ")
cat(sprintf("%s\n",paste0(sample(Letters,N,replace=TRUE),collapse="")))
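For readers who would rather not use R, here is a rough Python equivalent of the script above. It is only a sketch: the function name random_vowel_string and the optional rng argument are made up for this illustration and are not part of the original script. As in the R version, each character is drawn uniformly and with replacement from the five vowels plus the space.

```python
import random

# The same six symbols as Letters in the R script: five vowels and a space.
VOWELS_AND_SPACE = "aeiou "


def random_vowel_string(n=150, rng=None):
    """Return n characters drawn uniformly, with replacement, from VOWELS_AND_SPACE.

    rng is a hypothetical convenience: pass a seeded random.Random instance
    to get a repeatable string, or leave it as None to use the module-level
    generator.
    """
    rng = rng or random
    return "".join(rng.choice(VOWELS_AND_SPACE) for _ in range(n))


if __name__ == "__main__":
    # Change 150 as the fancy takes you, then paste the result into Google Translate.
    print(random_vowel_string(150))
```

Passing a seeded generator, e.g. random_vowel_string(150, random.Random(42)), gives the same string on every run, which helps if you want to check whether the translation of a particular string changes over time.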

 



5 Comments

  1. Thaomas said,

    February 8, 2020 @ 11:43 am

    I get why it flunks this stuff, but why can't it translate Latin worth a damn? Can't be lack of training texts.

    [(myl) Actually I suspect it's exactly the lack of adequate training texts. Extant classical Latin amounts to about 5 million words; available church Latin adds a few tens of millions of additional words. Serious MT training sets (parallel or comparable) are much larger than that.]

  2. Garrett Wollman said,

    February 8, 2020 @ 2:37 pm

    Randall's effort may well have been inspired by Janelle Shane's recent efforts at NN-generated classic 1970s Jell-o salad recipes: https://twitter.com/JanelleCShane/status/1225826027475128321?s=09 — some of which are just surreal but others are nightmare fuel.

  3. ktschwarz said,

    February 8, 2020 @ 2:48 pm

    Many of the elephant semifics from previous posts have been tamed. Repetitions of Japanese or Thai syllables now produce mostly just boring repetitions of syllables or one English word, and long strings of Vietnamese vowels now produce mostly just boring strings stripped of diacritics. (There are a few exceptions, have fun finding them.) Tragically, "The sphere of the sphere is the sphere of the sphere" has been reduced to "Go back, go back, go back, talk more"; and the beautiful Frisian poetry "From the abyss of the abyss, the abnormality of the abyss…" has decayed into almost nothing, alas!

    I presume that Google has put resources into Japanese and Hawaiian proportional to their commercial applications. For all we know, maybe too-limited training really is the only difference; Google isn't saying.

    Thaomas: GT doesn't have Neural Machine Translation for Latin, so you can expect its English output to be less fluent than for most languages.

  4. Julian said,

    February 8, 2020 @ 6:33 pm

    I tried to make fun of Google Translate by asking it to translate from Hawaiian a string like the above. As the string grew it offered a translation; I added to the string; it changed its mind. The final string was:
    u iioouu ieouuii ui iouuu euuiieeuui iueii eeeeiou iiiuo

    This gave the following outputs. My reactions are in square brackets.

    I like you very much. [Why thank you.]
    I have a few questions. [Isn't this a bit quick? Maybe it's a cultural thing.]
    I have a good copy. [Is this a come-on?]
    It is important to know the domain that can be used. [I sense that you have a fairly serious personality, and are probably something in mathematics.]
    It is important to know that I have to write a message. [Getting a bit meta here.]
    It is important to understand that this is a very useful tool. [Okay! I'm sorry I tried to make fun of you! I'll stop now.]

    On another try, the details of which I sadly did not record for posterity, it got hung up for quite a long time on 'The data cannot be changed.'

    Maybe there really is a little man in the machine, and he's making fun of us.

  5. unekdoud said,

    February 8, 2020 @ 8:27 pm

    "It is important to have a simple life while working here for the majority of the lives because of the hardness of your breath"

    Now I feel attacked by GT.
