Language Log

Hungarian trenching

September 29, 2018 @ 6:27 am · Filed by Mark Liberman under Lost in translation

From Adrian Bailey:

Although Google Translate isn't too bad now for the big 8 languages, the results for other languages can still be quite bizarre and/or disappointing. I used to do some Hungarian-English translation 15-20 years ago, and the machine translation available then hardly seems much worse…

Engedjetek meg nekem a tegezést. Angolként bajom van a magázással.

Google's translation: Let me do the trenching. I'm an English guy with shit.

Actual meaning: Let me tegez you (i.e., use the informal forms for "you"). As an Englishman, I have trouble with the formal forms.

Microsoft Bing's translation:

Let me do the Quat. In English, I have trouble with exaltation.

Baidu's translation:

Let me be my tegezést. As an Englishman, there is a problem with magázással.

In this case, Baidu seems to be most aware of what it doesn't know — though perhaps that's because it has less training material to feed its dreams?

17 Comments

Adam F said,

September 29, 2018 @ 6:42 am

This reminds me of a famous Hungarian phrasebook….

[(myl) https://www.youtube.com/watch?v=G6D1YI-41ao]
DCBob said,

September 29, 2018 @ 7:19 am

Used Google recently to translate some Georgian and Armenian songs to English and it was genuinely terrible.
Vance Koven said,

September 29, 2018 @ 8:30 am

I'm in with Adrian Bailey. In trying to communicate with my wife's Hungarian relatives I have had perforce to rely on Google Translate. When I see what it does to their end of the conversation, I can only shake my head and wonder at what they're seeing from me.
Ambarish Sridharanarayanan said,

September 29, 2018 @ 9:13 am

So what are the Big 8 languages?
Fionnbharr Ó Duinnín said,

September 29, 2018 @ 10:21 am

First, there is a problem with the Hungarian above. The object of the sentence is direct, as such it needs to use the definite conjugation (Hungarian has two conjugations depending on the definite/indefinite nature of the object), which would be: "Engedjétek meg nekem a tegezést." The diacritic makes all the difference.

That aside, the speaker is already using the informal form, which is pretty rude. The second person plural form is being used instead of the third person plural, which doubles as the formal register. So, this should be: "Engedjék meg nekem a tegezést."

All that aside, this is such a ridiculously formal way of putting it, that this must have been taken from one of those pre-WWII phrasebooks. Moreover, it uses pretty complex grammatical phrasing. If s/he really did have problems with switching formal register, you wouldn't know it from this–apart from the fact s/he uses the informal second person conjugation. If s/he really did have a problem with Hungarian forms, s/he would use simpler forms. Perhaps:

Ugye nem baj, ha tegeződünk?
Jobb lesz, ha tegeződünk.
Könnyebb, ha tegeződünk.
Or, most simply: Tegeződhetünk?

Still, this brings to mind this "I thou thee, thou traitor" (https://www.bartleby.com/185/41.html).

Translation-wise, this is pretty good compared to Google's current competence with Irish (Gaeilge). The Irish have a word for such Google-poop: "praistriúchán". Search on Twitter for #praistriúchán and you'll see lots of scatological specimens.
Gregory Kusnick said,

September 29, 2018 @ 11:45 am

Still, if you need a latrine dug, you might as well take the English guy up on his offer.
wally said,

September 30, 2018 @ 1:00 am

I've been going on the assumption that one of the problems is a mismatch between the grammar of languages like Hungarian and Turkish and what I take to be the implicit assumptions in the way Google Translate works. I don't know Hungarian as well as I do Turkish, but I think the issue is the same. The problem is that words in Turkish are often formed with a series of suffixes. So you can have a string of five or six or more simple words in English and have the equivalent Turkish be one word. The English words are all common, easy words, so Google Translate has no problem with them. But Translate may never have seen that root word with those particular suffixes in Turkish, so it doesn't match anything it has seen before and it doesn't know what to do.

I tested this idea once. We were running a Turkish document through Translate and it was failing on certain words that surprised me. So I began peeling off the suffixes until I got down to a word that Translate could handle. These were very common words and suffixes. But the context was a bit unusual, so I can believe that Translate was not familiar with the root and suffixes being used in this exact way.
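
The suffix-peeling test described above might be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `peel_suffixes`, the toy vocabulary, and the suffix list are all hypothetical, and the Turkish example word "evlerinizden" ("from your houses": ev + ler + iniz + den) is not taken from the document being translated. It says nothing about how Google Translate works internally.

```python
from typing import List, Set, Tuple


def peel_suffixes(word: str, known: Set[str], suffixes: List[str]) -> Tuple[str, List[str]]:
    """Strip listed suffixes from the end of `word` until the remainder is in `known`.

    Returns the remainder and the suffixes removed, in their original order.
    If no listed suffix matches, the loop stops and the remainder is returned as-is.
    """
    peeled: List[str] = []
    current = word
    while current not in known:
        for suffix in suffixes:
            # Only strip when something is left over to serve as a stem.
            if current.endswith(suffix) and len(current) > len(suffix):
                current = current[: -len(suffix)]
                peeled.insert(0, suffix)
                break
        else:
            break  # nothing more can be peeled off
    return current, peeled


# Hypothetical stand-in for "words the translator can handle".
KNOWN_WORDS = {"ev", "evler"}
# Hypothetical suffix inventory: ablative, 2nd person plural possessive, plural.
SUFFIXES = ["den", "iniz", "ler"]

stem, removed = peel_suffixes("evlerinizden", KNOWN_WORDS, SUFFIXES)
print(stem, removed)
```

With this toy vocabulary the peeling stops at "evler", the first remainder the word list recognizes; with a vocabulary containing only "ev" it would continue down to the bare root.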

I won't pretend to be able to judge if this might be a factor in the Hungarian sentences above.
Adrian said,

September 30, 2018 @ 11:55 am

Thanks to Fionnbharr. However…

– "Engedjétek meg nekem a tegezést." The diacritic makes all the difference.

Not really, since Google's take on that is "Let me do the painting."

– That aside, the speaker is already using the informal form, which is pretty rude. The second person plural form is being used instead of the third person plural, which doubles as the formal register. So, this should be: "Engedjék meg nekem a tegezést."

And Google's take on that is "Allow me to plunder me."

Apparently "tegeződünk" means "tired", and the AI gives up completely when faced with "Tegeződhetünk".
Fionnbharr Ó Duinnín said,

September 30, 2018 @ 1:57 pm

I just tried out a few verations on verb forms in Hungarian to see how Google Translate copes and it isn't all bad. The screenshot is tweeted here (as there is no direct way of adding images in the contents): https://twitter.com/ujitas/status/1046471081798848512

Most surprising for me was that it vaguely copes with some of the differences in the definite/indefinite conjugations, but normally only as an alternative option, not the most likely one. It doesn't really understand the perfective nature of the prefix 'meg' with this verb.

However, the best of all, was to see that it understood the 'phrasal verb' nature of some of the prefixes e.g. 'fel-'.

The verb I seeded it with was "csinálni", to do or to make (in Hungarian this collocates with 'making a photograph'), with the variations being:
meg-: has a perfective meaning (but not always)
-ok: first person singular, indefinite conjugation (I’m ignoring vowel harmony variations, -ek…)
-om: first person singular, definite conjugation (I’m ignoring vowel harmony variations, -em…)
nem: expression of the negative ‘not’
kell: must (present tense)
lehet: is possible, can, could
fel-: up, though acts like a phrasal verb when added to verbs, with unpredictable (hilarious) results
-hat: can (I’m ignoring vowel harmony variations, -het…)
-e: indirect question form (i.e might it be that…)
-tat: causative, have something done, get something done
-jak: subjunctive first person singular form, with a word ending in ‘t’ this mutates with the final ‘t’ to -ssak. Can have a similar meaning to the modal ‘should’.

You have to see the screenshot in the twitter link above to see Google's output. My suggestions would be:
csinálni: to do
megcsinálni: to complete, to do something entirely
csinálok: I do (something), I am doing (something)
nem csinálok: I don’t do (something), I am not doing (something)
csináltam: I did (something), I was doing (something) / I did it, I was doing it
csinálom: I do it, I am doing it
nem csinálom: I do not do it / I am not doing it
megcsinálok: I complete (something), I am completing (something)
megcsinálom: I complete it, I am completing it
meg lehet csinálni: it can be completed
lehet megcsinálni: it can be done
meg kell csinálnom: it must be completed
kell megcsinálnom: it must be done
nem csinálom meg: I am not completing it, I do not complete it
megcsináltam: I completed (something), I was completing (something) / I completed it, I was completing it
nem csináltam meg: I didn’t do (something), I wasn’t doing (something) / I didn’t do it, I wasn’t doing it
felcsinálni: to fix to a higher place / to knock someone up, to put someone up the duff
felcsinálok: I fix something higher, I am fixing something higher / I knocked someone up, I am knocking someone up
felcsinálom: I fix it higher, I am fixing it higher / I knock her up, I am knocking her up
csinálhat: can do
csinálhatom: I can do it
megcsinálhatom: I can complete it
megcsinálhatom-e: can I complete it
csináltat: have something done
csináltathat: can have something done
megcsináltathat: can have something completed
megcsináltathatod-e: can you have someone complete it
megcsináltathassak: I should have someone complete it
megcsináltathassak-e: should I have someone complete it
ne csináltathassak meg: I should not have someone complete (something)
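
To give a rough sense of how quickly such forms multiply, here is a small sketch that combines a few of the morphemes from the lists above by plain concatenation. The slot names and the function `generate_forms` are assumptions made for illustration; the ending "-od" is taken from the form "megcsináltathatod-e" rather than from the morpheme list; and the sketch ignores vowel harmony, the subjunctive mutation, and separable-prefix word order, so not every string it produces is necessarily idiomatic Hungarian.

```python
from itertools import product

# Morphemes taken from the lists above; "" means the slot is left empty.
STEM = "csinál"
PREFIXES = ["", "meg", "fel"]      # verbal prefixes
CAUSATIVE = ["", "tat"]            # "have something done"
POTENTIAL = ["", "hat"]            # "can"
ENDINGS = ["", "ok", "om", "od"]   # bare form, 1sg indefinite, 1sg definite, 2sg definite


def generate_forms():
    """Concatenate one choice per slot in the order prefix + stem + causative + potential + ending."""
    return [
        prefix + STEM + causative + potential + ending
        for prefix, causative, potential, ending in product(
            PREFIXES, CAUSATIVE, POTENTIAL, ENDINGS
        )
    ]


forms = generate_forms()
print(len(forms))  # 3 * 2 * 2 * 4 = 48 distinct strings from one stem
print("megcsinálhatom" in forms, "felcsinálok" in forms)
```

Even this tiny inventory yields 48 surface strings from a single stem, which is one way to picture why a system trained on whole words may rarely have seen any particular combination.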
Fionnbharr Ó Duinnín said,

September 30, 2018 @ 2:35 pm

Apart from my glaring spelling mistake ("szarvashiba") at the beginning–my variation on "veriations"–the second and third-to-last entries were the indefinite forms, so should have read:
megcsináltathassak: I should have someone complete (something)
megcsináltathassak-e: should I have someone complete (something)
ktschwarz said,

September 30, 2018 @ 6:32 pm

I finally found the studies that address my question: Given the same training corpus for every language, can you measure which language pairs are easier to translate? It's been done using every possible pair from the 11 languages of the European Parliament corpus, and indeed, Spanish to French was the easiest, and Finnish to Dutch was the hardest. A main factor is that Finnish has much more complex word inflections than the others. This is also true of Hungarian and Turkish, so wally's assumption is on target. Thanks very much to Kirti Vashee at eMpTy Pages for writing a blog post that explained this to me at beginner level!

The obvious next question: Can our current algorithms work as well between languages that are both highly inflected — e.g., Finnish-Turkish — as they do between less inflected languages like Spanish-French? (Note, you can't test this with Google Translate or its competitors, because they always go through English as a hub.) I hope someone can educate me on this question.
DDOwen said,

October 1, 2018 @ 4:58 am

There was a brief period when Google translated 'grooming someone's horse' in English into 'fostering an inappropriate relationship with someone's horse' in Welsh, which I can only ascribe to the large number of bilingual legal documents it must have been using as input. (A few months back it had improved to translating it as 'marrying someone's horse' which is slightly better, but still wrong.)

Usually it's not that bad, though there's a subgenre of Cymrophone humour about bilingual signs that have been translated via Google without oversight by a fluent Welsh speaker…
derek said,

October 1, 2018 @ 9:23 am

The classic one being the road sign that told drivers "Sorry I'm out of the office, but I'll translate your sentence into Welsh when I get back"
Robert Coren said,

October 1, 2018 @ 10:39 am

@Adrian: "the speaker is already using the informal form, which is pretty rude."

For no good reason that I can now think of or remember, I took exception to a high-school friend's addressing me with the French informal forms, to which he responded teasingly, "Je ne te tutoyerai plus."
DWalker07 said,

October 1, 2018 @ 3:00 pm

"Actual meaning: Let me tegez you"

Huh? I don't understand that in English. What is a tegez?
ktschwarz said,

October 2, 2018 @ 1:14 am

Vance Koven, DDOwen, and derek bring up an important point: The quality of machine translation is not symmetric. It's often better at translating to English than from English. If it has trouble recognizing all those Hungarian inflections, it has even more trouble generating them. To quote from the paper on the EuroParl corpus experiment (Koehn, 2005):

Intuitively, translating from an information-rich into an information-poor language is easier than the other way around. Researchers have made similar observations about the better performance of Arabic–English SMT systems vs. Chinese–English SMT systems, that are trained on similar amount of training data and tested on news wire: Translating from Arabic with its rich morphology is easier than translating from Chinese, which is even more frugal than English, often lacking determiners and plural or tense markers.

Note that translating into English is among the easiest. However, since the research community is primarily occupied with translation into English, interesting problems associated with translating into morphologically rich languages have largely been neglected.

Also, @wally, sounds like you independently invented what computational linguists call "stemming".
chris said,

October 3, 2018 @ 5:22 pm

It does seem like a bit of an unfair test to pick a sentence that *has* no English counterpart, because it's referring to features of the Hungarian language itself. Even a human has trouble expressing that meaning in a way that's immediately comprehensible to an English monoglot.

Not that an English speaker is incapable of thinking that thought at all, but at least, they aren't accustomed to it or their language would already have a commonly used way to discuss the subject.

P.S. It might be significantly easier to translate the example into, say, French, which *does* have a parallel concept. Assuming you're doing it directly, which as ktschwarz points out, several currently existing automated systems wouldn't.
