I'm learning… something?


Google Translate renders "Tanulok Magyarul" as "I'm learning English":


This is presumably related to the reasons that Google Translate used to think that Australia is Ireland, and that Darwin is Freud.

Nice to see that there's still some headroom…

Bing Translator gets it right:

[h/t Greg Alger]



24 Comments

  1. Guy said,

    April 7, 2016 @ 12:45 pm

    In translations, the name of the language will sometimes best be translated to the name of the source language and sometimes to the name of the target language (especially in some forms of fiction, where "translation" often includes other forms of localization). This seems especially so in the case of literature classes, which are often called "English" classes in English, and likewise in other languages, even though I would argue that that's not really a good label. My understanding is that Google Translate works based on algorithms that look at human-conducted translations, so it's not too surprising that it would give this result.

  2. FM said,

    April 7, 2016 @ 1:43 pm

    Weird capitalization often trips up these statistical algorithms for some reason. Magyarul would not normally be capitalized in Hungarian, therefore…

  3. Ellen K. said,

    April 7, 2016 @ 1:52 pm

    After reading FM's comment and then experimenting, it seems if you both make magyarul lower case and also add a period at the end, you get "I learn Hungarian." (Gets the language right, and messes up the grammar.) Doing only one of those two things gets the "I'm learning English" translation though.

  4. BZ said,

    April 7, 2016 @ 2:15 pm

    @Guy,
    It seems to me that translating English class to whatever the target language is in fiction only makes sense if you are not just translating, but adapting the work, i.e. changing the setting to the target country. In such a case it is likely that a lot of other major changes are done, and Google shouldn't even recognize this as a "translation".

    Even if I'm wrong, you're not *learning* English if it's your native language, you're *studying* it. I don't know if the same distinction is made in Portuguese, but either way this collocation should be problematic.

    On the other hand, when referring to a *foreign* language (especially making fun of it) in fiction, and translating to *that* language, you might want to substitute something else. For example, in the French dub of Futurama, all references to French no longer existing, and all negative mentions of France, are changed to German and Germany.

    Though clearly that's not the case here because you're talking about the *same* language as the one being translated.

  5. Lazar said,

    April 7, 2016 @ 2:17 pm

    @Guy: I remember that in the English dub of Ghost in the Shell, one character mentions that an immigrant "couldn't speak English". Wouldn't it be more relevant whether he could speak Japanese, since all the Japanese signage present in the film indicates that they're in some future version of Japan?

  6. Adrian Bailey said,

    April 7, 2016 @ 2:17 pm

    Tanulok magyarul az iskolában. and Tanulok magyarul a munkámhoz. give similar errors.

  7. rpsms said,

    April 7, 2016 @ 3:37 pm

    @Lazar: it may blow your mind to know that GITS was "set" in Hong Kong. They don't actually name the location, but the streets and the design of everything were based on Hong Kong locations.

    So, it isn't that cut and dried.

  8. Guy said,

    April 7, 2016 @ 5:50 pm

    @BZ

    I didn't mean to suggest that this was how it should be translated by Google Translate (obviously it shouldn't be); I was only suggesting the first explanation I could think of for how it might have happened.

  9. Fred said,

    April 8, 2016 @ 4:55 am

    That reminds me: in my old Dutch passport the sentence 'Page reserved for the issuing authorities' (i.e. for the Dutch authorities), which is translated into all EU languages on the last pages, was translated into Hungarian as 'Magyar hatóságok bejegyzései', i.e. 'Entries by the Hungarian authorities'. You wouldn't expect such a sloppy mistake in a passport…

  10. January First-of-May said,

    April 8, 2016 @ 5:14 am

    On the Darwin/Freud thread, which is now closed:

    The Google synonyms thing really hurts me in my (very amateur) genealogy research, where the relevant first or last names are often similar to (misspelled) synonyms of common words, and I can't put the combination in quotation marks because I can't be sure that the first and last name would be in that particular order (or even generally next to each other).

    More generally, I also often wind up with awful results when my search term happens to be abbreviated in a way that is also a more common abbreviation meaning a completely different thing (in which case Google gives me lots of useless results with the abbreviation, even if I wrote out the less common term fully).

  11. jaap said,

    April 8, 2016 @ 10:28 am

    @January First-of-May:
    You can put a single word in quotes if you want the exact spelling to be used by Google. For example, searching for progamming searches only for the correctly spelled programming, while searching for "progamming" searches only for the misspelled word.

    Note however that the reported number of results will be completely bogus, which is something that has been discussed here before.

  12. Roland said,

    April 8, 2016 @ 10:58 am

    "Adrian Bailey said,
    April 7, 2016 @ 2:17 pm

    Tanulok magyarul az iskolában. and Tanulok magyarul a munkámhoz. give similar errors."

    Tanulok magyarul az iskolában. = learning/studying > Hungarian language
    Magyarul tanulok az iskolában. = Hungarian language > studying/learning

  13. Nick said,

    April 8, 2016 @ 11:16 am

    once i clicked on the translate button on facebook for some status that was in spanish, if i remember correctly, and fb "translated" 'metros' to 'yards' … but … meters aren't yards

  14. JS said,

    April 8, 2016 @ 11:32 am

    @jaap

    It seems to me that Google Search has gradually begun to trust its algorithms more than its users, very bad news for those of us interested in finding examples of language of a very particular and often non-standard kind.

    For example, searching for "programming in basic" in quotes and with "verbatim" selected in search options _still_ gives me a bunch of results in which this specific phrase does not occur in the page, including, apparently, the first three. This never used to happen.

    For lack of a better term, this sucks.

  15. Yvon Henel said,

    April 8, 2016 @ 1:16 pm

    I've just tried Google Translate from Hungarian
    “Tanulok magyarul az iskolában.”
    to French and obtained
    “J'apprends l'anglais à l'école.”
    So it seems English is the only language worth learning ;)

  16. Brett said,

    April 8, 2016 @ 4:59 pm

    @JS: It has been effectively impossible to do verbatim Google searches for some years now. I believe that the fact was brought to my attention by the comments on a Language Log post, actually.

  17. JS said,

    April 8, 2016 @ 6:13 pm

    Thank you; in retrospect I suppose I've been in denial.

  18. Martin van den Berg said,

    April 8, 2016 @ 7:04 pm

    This problem seems to be rather persistent. I noticed something similar in 2010 with "Magyarország" being translated as "India" (as witnessed on my Facebook photo timeline). Probably the parallel corpora with Hungarian are rather sparse. Still, 6 years is a long time.

  19. Rebecca said,

    April 8, 2016 @ 8:08 pm

    I had first noticed this last fall, shortly after the Paris bombing. There was a widely-shared video of a French father explaining things to his young son, captured by a French journalist. I was reading about it on the journal's Facebook page, and the father responded to commenters. The translation had this same kind of puzzling error:

    "Angel Le Bonsoir je suis le papa avec le petit garcon, merci à vous tous pour les super commentaires que vous avez poster pour nous
    C'est quand je vois tout ces soutiens que je me dit une chose: je suis fiere d'être francais et fiere de mes compatriotes!!! Ma famille et moi vous embrasse
    Hello I'm the dad with the little boy, thank you all for the great comments that you post for us
    This is when I see all these support that I tells me one thing: I am proud to be English and proud of my countrymen!!! Me and my family kiss you"

    https://www.facebook.com/PetitJournalYannBarthes/videos/vb.122898471086693/1013093998733798/?type=2&theater

  20. JS said,

    April 8, 2016 @ 8:32 pm

    In "This sentence is in English," "English" ought to become the name of the target language in translation…
    Maybe Google encounters similar situations in the form of series of texts in multiple languages all translated from a single source and labeled by language name, causing it to conclude that the names of all languages are often to be translated into English as "English"?

  21. Adrian Bailey said,

    April 8, 2016 @ 8:41 pm

    Köszönöm Rolandnak.

  22. Michael Watts said,

    April 9, 2016 @ 4:10 am

    JS:

    Instruction manuals frequently have the same text (you know, the instructions) repeated in several different languages. The manuals I've looked at generally use a two-letter language code rather than the full name of the language, but it is an example of parallel texts in which the name of the language each text is written in gets "translated" as a self-reference rather than a fixed reference.

  23. A. said,

    April 9, 2016 @ 3:51 pm

    @Brett: You can do verbatim Google searches by putting intext: before the word.

  24. L said,

    April 13, 2016 @ 3:35 pm

    "once i clicked on the translate button on facebook for some status that was in spanish, if i remember correctly, and fb "translated" 'metros' to 'yards' … but … meters aren't yards"

    Aren't they? Sort of? Sometimes?

    Certainly it would be wrong to translate "3,6 metros" to "3.6 yards." But would it be wrong to translate "A pocos metros" to "a few yards"? "Cerca de 100 metros" to "About 100 yards"? I'm honestly not sure.
