Language Log

Two Dots Too Many

April 22, 2008 @ 12:41 am · Filed by Bill Poser under Writing systems

The Turkish newspaper Hürriyet reports a tragic consequence of the failure to localize cell phones.

Ramazan Çalçoban sent his estranged wife Emine the text message:

Zaten sen sıkışınca konuyu değiştiriyorsun.

"Anyhow, whenever you can't answer an argument, you change the subject."

Unfortunately, what she thought he wrote was:

Zaten sen sikişince konuyu değiştiriyorsun.

"Anyhow, whenever they are fucking you, you change the subject."

She showed the message to her father, who angrily called Ramazan and accused him of calling his daughter a prostitute. Ramazan went to his wife's home to apologize, only to be attacked by his wife, her father, and two sisters. He was stabbed in the chest but succeeded in grabbing a knife, stabbing his wife, and getting away. Emine died of her wounds; Ramazan killed himself in jail.

How exactly did this tragedy come about? Turkish has four high vowels: front unrounded /i/, written <i>, front rounded /y/, written <ü>, back unrounded /ɨ/, written <ı>, and back rounded /u/, written <u>. The verb form that Ramazan wrote was sıkışınca, which is a gerund of sıkışmak, literally "to get wedged, to get in a tight spot", but here with the sense of "to be unable to answer an argument". What his wife thought he wrote was sikişince, the corresponding form of sikişmek "to fuck". The verb stems sıkış "to get wedged" and sikiş "to fuck" differ only in the backness of their vowels, which is reflected graphically in the presence or absence of a dot. The problem was that Emine's cell phone was not localized properly for Turkish and did not have the letter <ı>; when it displayed Ramazan's message, it replaced the <ı>s with <i>s.

To see exactly how the misinterpretation arose, it may help to understand the structure of the Turkish sentences involved. zaten means "anyhow" and sen "you". konuyu is the definite accusative of konu "subject", which tells us that it is the object of the following verb. değiştiriyorsun is the second person singular subject form of değiştirmek "to change" in the present tense, meaning "you change". As you can see, the verb comes at the end of its clause, following the object. Only one verb, the second one, is marked for its subject: the suffix sun tells us that the subject of "change" is "you". The verb of the first clause is unmarked for subject.

If you have compared carefully what Ramazan sent and what Emine received, you may have noticed that Emine and her father should have had a clue as to what Ramazan meant in spite of the replacement of <ı> by <i>. When you replace the <ı>s of sıkışınca with <i>s as the cellphone did, the result is not sikişince "on fucking" but sikişinca. This is incorrect because it fails to conform to the rules of vowel harmony. In Turkish, most suffixal high vowels must agree with the preceding vowel in frontness and rounding, and most suffixal non-high vowels must agree with the preceding vowel in frontness. You can see this in the two forms of the infinitive suffix: mek after front vowels and mak after back vowels.
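The clue described above can be illustrated with a minimal Python sketch. It assumes lowercase input, models the phone's behaviour as a plain substitution of <i> for <ı> (the actual handset behaviour is not documented here), and treats any word that mixes front and back vowels as suspect. The function names are hypothetical. The check is deliberately crude: loanwords and invariant suffixes such as the yor of değiştiriyorsun legitimately mix vowel classes, so it is useful only for words like the one at issue.

```python
# Turkish vowels grouped by backness, in standard orthography.
FRONT_VOWELS = set("eiöü")
BACK_VOWELS = set("aıou")


def display_without_dotless_i(text: str) -> str:
    """Model a phone that lacks <ı> and shows <i> instead (an assumption)."""
    return text.replace("ı", "i")


def backness_classes(word: str) -> set:
    """Return the backness classes ('front', 'back') found among the vowels."""
    classes = set()
    for ch in word:
        if ch in FRONT_VOWELS:
            classes.add("front")
        elif ch in BACK_VOWELS:
            classes.add("back")
    return classes


def is_backness_harmonic(word: str) -> bool:
    """True if the word's vowels are all front or all back."""
    return len(backness_classes(word)) <= 1


sent = "sıkışınca"
received = display_without_dotless_i(sent)
print(received, is_backness_harmonic(received))
print("sikişince", is_backness_harmonic("sikişince"))
```

The received form mixes front <i> with back <a>, so it fails the check, whereas both the word Ramazan sent and the word Emine assumed pass it.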

The gerund suffix used here has four forms: ince, used after front unrounded vowels, as in gelince "on coming", ünce, used after front rounded vowels, as in görünce "on seeing", ınca, used after back unrounded vowels, as in alınca "on taking", and unca, used after back rounded vowels, as in yorunca "on tiring". The final <a> of the word sikişinca that Emine received should have served as a clue that the preceding vowels were intended to be back, not front.
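The four-way choice of suffix can be sketched in Python as a lookup on the last vowel of the stem. This is only an illustration under stated assumptions: the function name is hypothetical, input is lowercase, and stems are assumed to end in a consonant, since vowel-final stems need a buffer consonant that this sketch does not handle.

```python
# Form of the gerund suffix selected by the last vowel of the stem.
SUFFIX_BY_LAST_VOWEL = {
    "e": "ince", "i": "ince",   # front unrounded
    "ö": "ünce", "ü": "ünce",   # front rounded
    "a": "ınca", "ı": "ınca",   # back unrounded
    "o": "unca", "u": "unca",   # back rounded
}


def gerund(stem: str) -> str:
    """Attach the harmonizing gerund suffix to a consonant-final stem."""
    for ch in reversed(stem):
        if ch in SUFFIX_BY_LAST_VOWEL:
            return stem + SUFFIX_BY_LAST_VOWEL[ch]
    raise ValueError(f"no vowel found in stem {stem!r}")


for stem in ["gel", "gör", "al", "yor", "sıkış", "sikiş"]:
    print(stem, "->", gerund(stem))
```

Applied to the two stems from the text message, the lookup yields sıkışınca and sikişince, never the mixed form sikişinca that the phone displayed.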

There are several lessons to take away from this tragedy. One is that localization is a good thing. Another is that it is best not to kill people who make you angry until you have carefully investigated the situation, if then. But as a phonologist and student of harmony systems, I have to see this as a compelling argument for paying attention to vowel harmony.

Hat tip to Mike Speriosu.

7 Comments

John Cowan said,

April 22, 2008 @ 2:10 am

A more modest application of vowel harmony is going to my favorite Turkish restaurant, reading the menu (which is ASCII-only) and figuring out which tokens of i, o, and u are properly ı, ö, and ü respectively. Unfortunately, I know no way to reconstruct ç, ğ, or ş correctly without actually knowing Turkish, so I can't order in Turkish without sounding like a Greek.

Aidan Kehoe said,

April 22, 2008 @ 7:18 am

It's entirely possible to send 〈ı〉 by means of text message, but it halves the number of characters per message, cf. http://en.wikipedia.org/wiki/Short_message_service#GSM . The technical meat of it is that the 7-bit GSM alphabet was designed for Western European, and anyone whose needs aren't served by it must use UTF-16. There's been a media ruckus in Spain recently about this, at least ten years after the technology was deployed(!); see http://www.lavanguardia.es/lv24h/20080415/53455491730.html . Though from my experience the missing letters (áíóú) are hardly used by the average Spaniard in email or chat anyway.

Sniffnoy said,

April 22, 2008 @ 2:31 pm

Halves the number of characters per message? Haven't they ever heard of UTF-8? I mean, it's not going to be compatible with the GSM alphabet, but neither is UTF-16 or anything else, so that seems a bit silly.

Aidan Kehoe said,

April 22, 2008 @ 3:40 pm

Sniffnoy, one of the assumptions implicit in many of the advantages of UTF-8 is that each letter is encoded using one octet. That is not the case with the GSM encoding. Normally, after the first seven bits have been received, you start the second character (though there are a few variable-length characters).

Also, the final draft of the relevant GSM standard dates from 1995, at which point very few people outside of Bell Labs were implementing UTF-8, despite the 1993 Usenix presentation. The design of the standard was not particularly astonishing in that context.

Sniffnoy said,

April 22, 2008 @ 7:23 pm

Oh, huh, when I saw "7-bit" I figured that just meant in the same way ASCII is 7-bit.

outeast said,

April 23, 2008 @ 4:28 am

That's very interesting to me, Aidan: it helps explain an issue that has long puzzled me, which is that when I type SMS messages in Czech (using predictive text with diacritics) the character allowance is always very short. So thank you!

Feriha said,

April 23, 2008 @ 3:35 pm

Hello,
Well, actually I have nothing to do with languages. I am from Turkey, and was searching in the Internet about our newest tragic and weird incident.
Then I clicked here because I thought this is a new foreign entry about fanatically, pathologically sexually obsessed people who can kill each other at every inconceivable and ridiculous incident.
Then I read this serious article about vowels etc.
:)))
I really enjoyed it. I wasn't expecting it, though.
Thanks.
