"Good morning" considered dangerous


Yotam Berger, "Israel Arrests Palestinian Because Facebook Translated 'Good Morning' to 'Attack Them'", Haaretz 10/22/2017:

The Israel Police mistakenly arrested a Palestinian worker last week because they relied on automatic translation software to translate a post he wrote on his Facebook page. The Palestinian was arrested after writing “good morning,” which was misinterpreted; no Arabic-speaking police officer read the post before the man’s arrest. […]

The automatic translation service offered by Facebook uses its own proprietary algorithms. It translated “good morning” as “attack them” in Hebrew and “hurt them” in English.

The description of the mistake is puzzling:

Arabic speakers explained that English transliteration used by Facebook is not an actual word in Arabic but could look like the verb “to hurt” – even though any Arabic speaker could clearly see the transliteration did not match the translation.

Why is an "English transliteration" involved in the process at all? I suspect that this is a human misunderstanding of the computer misunderstanding, but perhaps someone involved with Facebook's MT team can explain.

18 Comments

  1. Simon said,

    October 24, 2017 @ 4:31 am

    MT usually uses English as a bridge language for language pairs where simply not enough training data exists to make it work. Meaning the system does not have enough Arabic-to-Hebrew data to come up with good translations and therefore goes Arabic to English, English to Hebrew. The downside of course being that context might get literally lost in translation or worse – as happened here, everything turns to complete gibberish or is mistranslated.

    [(myl) Right, but that's not consistent with what the article says:

    "Arabic speakers explained that English transliteration used by Facebook is not an actual word in Arabic but could look like the verb 'to hurt'".

    In no way would it make sense to say that an English bridge-language translation "is not an actual word in Arabic"…]
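The bridge-language effect described in comment 1 can be sketched with toy lookup tables. Everything here is an assumption made for illustration: the German/English/French triple, the dictionary entries, and the function names are hypothetical, and real MT systems are statistical or neural rather than dictionary lookups. The sketch only shows how a distinction present in the source language can vanish when the pivot language uses one word for both senses.

```python
# Toy illustration of pivot ("bridge") translation losing a sense distinction.
# All tables are hypothetical and hand-written; this is not how any real
# MT system (Facebook's included) is implemented.

# German distinguishes "Ufer" (river bank) from "Bank" (financial bank).
# English, the pivot language, collapses both into "bank".
DE_TO_EN = {"Ufer": "bank", "Bank": "bank"}

# The pivot-to-target table sees only "bank" and keeps a single reading.
EN_TO_FR = {"bank": "banque"}

# A direct table, if enough German-French data existed, could keep both.
DE_TO_FR = {"Ufer": "rive", "Bank": "banque"}


def translate_direct(word: str) -> str:
    """Translate German to French without going through English."""
    return DE_TO_FR[word]


def translate_via_pivot(word: str) -> str:
    """Translate German to French by way of English."""
    return EN_TO_FR[DE_TO_EN[word]]


if __name__ == "__main__":
    for word in ("Ufer", "Bank"):
        print(word, translate_direct(word), translate_via_pivot(word))
```

In this toy setup the two routes agree for "Bank" but disagree for "Ufer", which is the kind of context loss the comment describes.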

  2. Charles in Toronto said,

    October 24, 2017 @ 6:45 am

    Yeah that English sentence quoting Arabic speakers does not make sense at all. I suspect copy-editor mangling.

    Can someone shed light on the actual words/letters used and why a MT misinterpretation may have actually happened?

  3. Derry said,

    October 24, 2017 @ 7:33 am

    The whole story seems a bit garbled.

    If you transliterate the English "Good morning" into Arabic, could the result be read as a misspelling or typo of "hurt"? I'm thinking of all the "did you mean…" messages I get when using translate.

  4. torbjörn said,

    October 24, 2017 @ 7:49 am

    At least Google Translate has definite trouble with non-English characters: translating Swedish to English uncovers the following:

    kåpa -> housing [reasonable, but an odd first choice]

    kapa -> cut [correct]

    kapa kåpa -> hood hood ["hood" is actually a more reasonable translation of "kåpa", but clearly something else went badly wrong here]

    So although it knows the difference between the above words when they are entered individually, it seems to forget it when faced with the (borderline nonsensical) third case. Why it would do this is a mystery to me, but perhaps something similar could happen to Arabic words…

  5. Adrian said,

    October 24, 2017 @ 8:16 am

    I think the only way this could have happened is if the man mistyped صباح الخير. "Hurt" is جرح, so it is a possibility.

  6. Joshua K. said,

    October 24, 2017 @ 8:19 am

    I would have expected that some members of the Israeli police would have sufficient knowledge of Arabic to recognize the words for "good morning." That seems like the kind of phrase that someone might learn in the first week of a language course.

  7. Alexander said,

    October 24, 2017 @ 8:31 am

    Does "English transliteration" just mean romanization?

  8. Shachar said,

    October 24, 2017 @ 9:41 am

    The text was يصبحهم (and not صباح الخير), which to the best of my knowledge, means something like "(may he) bless them with a good morning". One suggested explanation was that the automatic translation replaced one letter, actually translating يذبحهم, that means something like "(he) will slaughter them". The weakness of this explanation in my opinion is that the translation produced was a more general word, "hurt", which I assume wouldn't be the case if this was the (unfortunately popular) verb "to slaughter" was the one actually being translated.
    Another option is that the word used, or a variant of it, appeared in contexts of "hurt", confusing the MT.
    In any case, there's no question the police should have double-checked with an Arabic speaker before acting upon this translation.

  9. Bob said,

    October 24, 2017 @ 10:10 am

    According to a report in the Israeli Hebrew newspaper Haaretz, he wrote يصبحكم ‎/jsˤabbiħkum/‎ 'good morning', literally 'may he [God] give you a morning', and it was interpreted as يذبحكم ‎/jðabbiħkum /‎ 'may he slaughter you'. I suspect /jsˤabbiħkum/‎ 'good morning' is more common in a colloquial register than in formal Arabic, and so might not have appeared in the texts that the translation software had examined.

  10. Lameen said,

    October 24, 2017 @ 10:21 am

    The image attached to this post has "يصبحهم", literally "may He make them spend the morning", which seems to be short for the common colloquial expression "الله يصبحهم بالخير", "may God make them spend the morning well". It looks as though, instead of going for the meanings of صَبَّحَ that would be familiar to any Arabic speaker – "make spend the morning" or "say good morning" – Facebook opted for a sense of the word that I had never heard of before, and that certainly has no currency in colloquial use: "to attack by morning" (given as 5.4 in https://www.almaany.com/ar/dict/ar-ar/%D8%B5%D8%A8%D8%AD/ ). The existence of that sense in dictionaries makes the output slightly more understandable, but it's still very weird that the algorithm went for that sense. It's like translating the English sentence "Can you draw them?" as referring to mass execution by disembowelment – the sort of dictionary lookup mistake that any plausible probability weightings of different sense would make incredibly unlikely.

  11. Yuvlomov said,

    October 24, 2017 @ 11:09 am

    The possible explanations mentioned above (except the ar-ar dictionary lookup) were discussed, mostly in Hebrew, here:
    https://twitter.com/oren_data/status/922018312489459713

    Facebook was alerted in English here:
    https://twitter.com/yuvalmarton/status/922165313285660673
    But so far no response was posted in either thread.

    Other recent threads show the brittleness of current character-based MT engines (including Google and Bing), which all seem to hallucinate translations by replacing / omitting / adding one letter. Typical reasons for doing so are typos, rare inflections especially in dialect adaptation, and other out-of-vocabulary cases. This can easily explain “slaughter them” (a one-letter substitution). But “hurt” would require both a letter-order swap and a letter substitution, which seems a bit far-fetched.

    It is not clear from the article that English transliteration was used as part of the translation. It could well be a separate service.
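The letter-level comparisons in this thread can be checked with a plain Levenshtein distance, in which each substitution, insertion, or deletion costs 1. This is only a sketch for measuring how far apart two spellings are; it says nothing about how Facebook's system works, which the article does not describe. The function name is illustrative, and the two Arabic strings are the posted word and the "slaughter" candidate quoted in comment 8. Note that plain Levenshtein has no swap operation, so a swap of adjacent letters counts as two edits.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character edits turning a into b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca with cb (or keep it)
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    posted = "يصبحهم"     # the word in the post, per comment 8
    slaughter = "يذبحهم"  # the one-letter-off candidate, per comment 8
    print(levenshtein(posted, slaughter))
```

Strings are compared code point by code point, so inputs that carry diacritics or different Unicode normalizations would need to be normalized first.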

  12. Milan said,

    October 24, 2017 @ 2:02 pm

    @Joshua K
    I suspect that some kind of automated monitoring is used by the Israeli Police, which works – for whatever reason – with automated English translation rather than with Arabic original texts. Perhaps the relevant wordlists are only available in English. Of course, they at least ought to have checked that translation with a human translator before making an arrest.

  13. Bob said,

    October 24, 2017 @ 2:56 pm

    In my earlier comment I incorrectly copied the Arabic word with the object suffix meaning 'you' when what appears in the photo is actually 'them', as Lameen pointed out. With regard to what Shachar wrote, "The weakness of this explanation in my opinion is that the translation produced was a more general word, "hurt", which I assume wouldn't be the case if this was the (unfortunately popular) verb "to slaughter" was the one actually being translated": while the Hebrew verb לפגוע /lifgóa/ that the image shows means 'to strike, to harm, to touch; to insult', even 'to hurt (emotionally)', a historically but not transparently related noun פיגוע /pigúa/ (note the identical 3 letters in the two words) routinely refers to 'terrorist attack'. Could the Arabic word for slaughter (which of course the poor guy did not write) have been matched by the translation software with a Hebrew word for 'terrorist attack' and then converted to a verb because the Arabic word was a verb?

    Ironically, under the translation in the image is a link inviting the reader to "rate this translation"!

  14. Adrian said,

    October 24, 2017 @ 3:50 pm

    Google translates يصبحهم as "Become them" so, at least according to Google users, it is not a regular Arabic phrase for "Good morning".

  15. Yuvlomov said,

    October 24, 2017 @ 3:53 pm

    @adrian: I believe it’s dialectal (presumably Levantine Arabic)

  16. mg said,

    October 24, 2017 @ 11:26 pm

    FB translation is particularly bad, getting even simple things wrong when both Google and Bing do just fine with them. And there is no easy way to report the problem.

    For a particularly galling example, FB insisted on translating the standard Hebrew holiday greeting חג שמח as "Merry Christmas", not only during Hanukkah (bad enough) but at least as long as Passover. There was no effective way to complain about it (though we all kvetched to each other). Eventually FB seemed to get the message but be scared of the anti-PC crowd – it now refuses to translate חג שמח at all!

  17. Shachar said,

    October 25, 2017 @ 2:32 am

    Here, it was suggested that the word confusing the translation is يصبهم, that, as far as I understand, correctly translates to 'hurt' in some contexts (e.g. in medical contexts in the sense of 'it hurts'). To me, that's a much better explanation of the mistake than يذبحهم.
    https://twitter.com/BrianRoark13/status/922932182980554752

  18. Nicholas Ostler said,

    October 25, 2017 @ 4:17 am

    The lessons in Arabic are fascinating… but in practice, this looks like one painful reason to avoid unnecessary greetings in Facebook!

    Careless social media have costs, at least when they get into the wrong hands!
