"A high and dark man she had never seen before"


Earlier this year, we had some fun with a quirk of web-trained statistical MT that sometimes causes odd mistranslations of country names. This happens because information in parallel web pages is often localized rather than translated; some of the posts are "Made in USA == Made in Austria|France|Italy", 3/23/2008; "Austria == Ireland?", 3/24/2008;  "Why Austria is Ireland", 3/24/2008; "The (probable) truth about Austria and Ireland", 3/24/2008.
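To make the mechanism concrete, here is a toy sketch of how localized "parallel" text can mislead a system that learns word correspondences from co-occurrence. It is not Google's actual method. The sentence pairs are invented for illustration, and the Dice-style association score is an assumed stand-in for a real alignment model. The names `PAIRS` and `best_translation` are hypothetical.

```python
from collections import Counter

# Invented toy "parallel" corpus. In the first two pairs the English side
# was localized for an Irish audience rather than translated, so "Norge"
# lines up with "Ireland".
PAIRS = [
    ("kontor i Norge", "office in Ireland"),
    ("butikk i Norge", "store in Ireland"),
    ("reise til Norge", "trip to Norway"),
    ("kontor i Oslo", "office in Oslo"),
]

def best_translation(pairs, source_word):
    """Pick the target word most strongly associated with source_word,
    using a Dice coefficient over sentence-level co-occurrence counts."""
    cooc = Counter()        # target words seen alongside source_word
    tgt_totals = Counter()  # number of pairs each target word appears in
    src_count = 0           # number of pairs containing source_word
    for src, tgt in pairs:
        tgt_words = set(tgt.split())
        tgt_totals.update(tgt_words)
        if source_word in src.split():
            src_count += 1
            cooc.update(tgt_words)
    if not cooc:
        return None  # source_word never appeared in the corpus

    def dice(word):
        return 2 * cooc[word] / (src_count + tgt_totals[word])

    return max(cooc, key=dice)

print(best_translation(PAIRS, "Norge"))  # -> Ireland
```

With only one genuinely translated pair against two localized ones, "Ireland" gets the highest score for "Norge" (0.8, against 0.5 for "Norway"), which is the kind of error described in the posts above.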

Most if not all of the examples we discussed then have been fixed, but a new case has turned up in Google Translate's mapping from Norwegian to English. The source is an interesting story in a Norwegian newspaper (Siril Herseth, "Obama «reddet» Mary – betalte reisen til Norge", 10/4/2008), which describes how, twenty years ago, Barack Obama acted as a good Samaritan in helping a stranger who was short of money in the Miami airport.

The article's title, put through Google Translate's Norwegian to English system, comes out as Obama "rescued" Mary – paid trip to Ireland.

The sub-hed, "Mary var nygift og klar for å flytte til Norge, men ble stoppet på flyplassen fordi hun ikke hadde nok penger til turen. Da dukket Barack Obama opp og betalte for henne", comes out as "Mary was nygift and ready to move to Canada, but was stopped at the airport because she did not have enough money for the trip.. When Barack Obama came up and paid for her."

So "Norge", which is Norwegian for "Norway", is first translated as "Ireland" and then as "Canada".

The story, as far as I can understand it, is this. In November of 1988, an American, Mary Menth, had just married a Norwegian, Dag Andersen. She was traveling to Norway, alone,  to join him in their new life. When she got to the head of the line at Miami Airport, the clerk told her that she owed an extra $103 for overweight luggage.

Google Translate explains:

Mary had no money. Her new husband was gone in advance to Ireland, and Mary had no one else she could call.

That would be Norway that her new husband had gone ahead to, of course. (The story doesn't explain why she had no credit cards, which were fairly common even in 1988, but let's take this as given.) Continuing with the translation:

I was completely desperate and tried to think through which of the assets I could do without me. But bags were filled with the best I owned, "says Mary.

Although she explained the situation to the man behind the counter, he showed no signs of mercy.

– I started to cry, tears waterfall and I did not know my humble advice. Then I heard a mild and friendly voice behind me say, "I'll pay for her."

Mary turned up and there was a high and dark man she had never seen before.

Barack Obama gave her his address (actually his mother's address in Kansas) so that she could pay him back after she got to Norway.

This was in November of 1988, so according to Obama's Wikipedia biography, he had finished his stint as a community organizer in Chicago, and hadn't yet enrolled in law school. This was around the time of his trip to Europe and Africa, so perhaps that's why he was at the airport.

Anyhow, I'm impressed by the ability to understand the essentials of a story like this via Google Translate, even if you do have to be a bit careful about believing the references to countries, and some of the passages are less than idiomatic:

– He had a mild and friendly voice that was still determined. The first thing I thought was: "Who is this man?

Although the incident happened 20 years ago, still remembers Mary authority this man radiated.

– He was pretty and moteriktig in clothes with brown skinnsko, open cotton shirt and kakibukser, "says Mary.

[Hat tip: Jonathan Weinberg. More on the content, from a Norwegian blogger, is <a href="http://leishacamden.blogspot.com/2008/10/not-that-it-matters.html">here</a>.  No U.S. media outlets seem to have picked the story up yet.]



5 Comments

  1. Johan Richter said,

    October 6, 2008 @ 1:18 pm

    Your understanding of the story is correct. The translation is quite impressive, even though some individual words that you would think the program knew are left untranslated. (E.g. "vinket" should be "waved", which I would have guessed the program knew, and "reddende" means "saving". And amusingly enough, in several places it translates "lapp", meaning "note", into "kronor", which is the Norwegian currency. Even in a compound word like "papirlappen" = "the paper note", it separately translates it into "paper" and "kronor".)

    The story also contains another puzzling geographical substitution: "Åsgårdstrand" gets translated into "Witbank", which according to Wikipedia is the former name of a town in South Africa! "Strand" can in certain circumstances be translated as (river)bank, in Swedish and I guess in Norwegian as well, but that hardly explains it.

  2. Rubrick said,

    October 6, 2008 @ 3:23 pm

    I'm just waiting for the McCain campaign to begin circulating stories about Obama being high at the airport.

  3. Tim Silverman said,

    October 6, 2008 @ 3:42 pm

    Strand can be translated as "river bank" in English, too!

  4. Ivan said,

    October 7, 2008 @ 2:15 am

    Heh… occasionally, I still catch myself saying "high" instead of "tall" in English (in my native Croatian, visok can mean both things). The fact that the word for the property of being tall is "height" sure doesn't help. :-) Unsurprisingly, Google Translate makes the same high/tall mistake when translating from Croatian and Russian.

    The quality of the above translation is really good, but it doesn't surprise me that much, since as far as I know, Norwegian is a Germanic language whose grammar has historically evolved in a direction not too terribly different from English, so even simple word-for-word substitutions should make sense most of the time. I'm more impressed with the quality of Google's automatic translations from Russian, which are not as good, but still quite impressive considering its relatively free word order and the information carried by noun cases. (Its results with Croatian are poor, but I'd guess that it's due to a much smaller corpus used to train the software.)

  5. Sili said,

    October 13, 2008 @ 10:03 am

    It sure doesn't like compound nouns, though.

    To be fair, more and more Danes seem incapable of using them too, though. Makes searching for stuff in Danish very annoying since Google can't put in the spaces on its own, so one has to make one's own misspellings.
