
According to Andrew Higgins ("Kazakhstan Cheers New Alphabet, Except for All Those Apostrophes", NYT 1/15/2018), the pending turn to a Latin alphabet for Kazakh has run into a pothole: the 77-year-old dictator Nursultan A. Nazarbayev, who apparently has not yet been informed about Unicode or the possibility of varied computer keyboard layouts.

Mr. Higgins also seems to be in the dark about such arcana — he refers to characters (or maybe diacritics) as "markers", for some reason, and apparently thinks that the Latin alphabet is nothing but good old US ASCII, with none of those furrin umlauts and accents and cedillas and such:

Because Kazakh features many sounds that are not easily rendered into either the Cyrillic or Latin alphabets without additional markers, a decision needed to be made whether to follow Turkish, which uses the Latin script but includes cedillas, tildes, breves, dots and other markers to clarify pronunciation, or invent alternative phonetic pointers.

In August, the linguists proposed using an alphabet that largely followed the Turkish model.

The president’s office, however, declared this a nonstarter because Turkish-style markers do not feature on a standard keyboard.

The scholars on the language commission, led by Erden Kazhybek, the head of the Institute of Linguistics in Almaty, then suggested using digraphs, or several letters to indicate a single sound, like “ch” in English.

This approach initially got a warmer reception from the president’s entourage but was then banished when Mr. Nazarbayev suddenly issued a decree on Oct. 27 ordering that apostrophes be used instead of Turkish-style markers.

The modified Latin alphabet put forward by Mr. Nazarbayev uses apostrophes to elongate or modify the sounds of certain letters.

The article gives this explanation for the president's choice:

The only reason publicly cited by Mr. Nazarbayev to explain why he did not want Turkish-style phonetic markers is that “there should not be any hooks or superfluous dots that cannot be put straight into a computer,” he said in September.

Because dog forbid that folks might have to use something like a Turkish keyboard layout:

Or maybe this is the reason:

But others saw another possible motivation: Mr. Nazarbayev may be eager to avoid any suggestion that Kazakhstan is turning its back on Russia and embracing pan-Turkic unity, a bugbear for Russian officials in both czarist and Soviet times.

Or maybe it's this:

Also likely playing a role in the president’s active involvement has been his advancing age and the question of how he will be remembered when he eventually steps down or dies.

“The president is thinking about his legacy and wants to go down in history as the man who created a new alphabet,” said Mr. Satpayev, who supports the switch to Latin script but not the president’s version. “The problem is that our president is not a philologist.”

Interestingly, the Oct. 27 decree is in Russian. The proposed new alphabet is given by this table:

The NYT article cites "a storm of mockery and protest on social media", including this music video:

All in all, a fitting continuation of the shenanigans described by Thomas Pynchon in the section of Gravity's Rainbow dealing with the planning for the New Turkic Alphabet in the Soviet Union of the 1920s ("How alphabetic is the nature of molecules", 9/27/2004). For the politico-linguistic history behind the fiction, see "Birlashdirilmish Yangi Turk Alifbesi", 9/27/2004.

[h/t Francois Lang]



  1. Chris C. said,

    January 16, 2018 @ 5:06 pm

    Surely, if Nazarbayev wanted to avoid looking too Turkish for the benefit of Russia's tender sensibilities, he could have adopted some of the diacritics used in western Slavic countries that write in latinica.

  2. Tim Leonard said,

    January 16, 2018 @ 6:17 pm

    By "markers," I think Mr. Higgins means diacritical marks.

    [(myl) Perhaps so — but there's a term for those as well, namely "diacritics". And FWIW all of the letters with diacritics used in Turkish have single-character Unicode code points, as far as I know.]
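
For readers who want to check the claim about single code points, here is a minimal sketch using Python's standard `unicodedata` module. It builds each Turkish letter from a base letter plus a combining mark and asks whether NFC normalization collapses the pair into one precomposed code point. The names `SEQUENCES` and `compose` are invented for this illustration; dotless ı (U+0131) is left out because it carries no diacritic to compose.

```python
import unicodedata

# Combining marks used by the Turkish letters with diacritics.
CEDILLA, BREVE, DOT_ABOVE, DIAERESIS = "\u0327", "\u0306", "\u0307", "\u0308"

# Base letter + combining mark, upper and lower case.
SEQUENCES = [
    "C" + CEDILLA, "G" + BREVE, "I" + DOT_ABOVE,
    "O" + DIAERESIS, "S" + CEDILLA, "U" + DIAERESIS,
    "c" + CEDILLA, "g" + BREVE,
    "o" + DIAERESIS, "s" + CEDILLA, "u" + DIAERESIS,
]


def compose(seq: str) -> str:
    """Return the NFC (canonically composed) form of a string."""
    return unicodedata.normalize("NFC", seq)


for seq in SEQUENCES:
    composed = compose(seq)
    # One code point after composition means a precomposed character exists.
    print(len(composed), f"U+{ord(composed[0]):04X}", unicodedata.name(composed[0]))
```

Every line of output should begin with 1, which is just another way of saying that no Turkish letter needs a combining sequence.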

  3. Julien Baley said,

    January 16, 2018 @ 6:34 pm

    Isn't the idea behind going Latin precisely to turn their back on Russia? That would invalidate one of the hypotheses regarding the motivation behind the apostrophe.

    [(myl) I think the idea is to avoid being "pan-Turkic", which Russians have seen as politically threatening at least since Stalin's time. In "Birlashdirilmish Yangi Turk Alifbesi" I quoted Mark Dickens ("Soviet Language Policy in Central Asia") discussing versions of this anti-Pan-Turkic strategy going back to the New Turkish Alphabet of the 1920s:

    Further modifications to the Latin script served to create artificial differences between related Turkic languages as the same phoneme was represented by different letters in different languages […]. There is no good linguistic reason for having done this.]


  4. Sergey said,

    January 16, 2018 @ 8:09 pm

    Apostrophes definitely are a much better solution. Being able to write in plain 26 Latin characters is a huge benefit. Cyrillic creates a lot of trouble with the transliterations to Latin, so it's great that they've thought of not creating the same kind of trouble the second time over. The ways to write in plain Latin would have to be invented anyway: think of the airline tickets for example, or of any foreign correspondents who won't have all these funny marks on their keyboards. So coming up with a definitive way to do that and making it the default and only way makes a lot of sense. And even in handwriting the apostrophes are easier to write than those weird marks.

  5. David L said,

    January 16, 2018 @ 8:41 pm

    In defense of the NYT and/or Mr Higgins, I think the average newspaper reader is unlikely to know exactly what 'diacritic' means, even if they have heard of the word. They may well know of things like acute and grave accents and tildes and umlauts and the like, and by giving such examples and using the word 'markers,' the story makes it pretty clear that we are talking about any and all of the funny little wiggles and squiggles that foreign people like to adorn their letters with.

  6. Jonathan Smith said,

    January 16, 2018 @ 9:16 pm

    Agree that apostrophe as diacritic seems fine as long as your orthography isn't doing a whole lot else with that punctuation mark… comes down to additional keystrokes vs. real or perceived advantages.

    [(myl) Would you also recommend it for French, German, Czech, etc.?]

  7. Jay said,

    January 16, 2018 @ 11:34 pm

    The new Roman alphabet makes Kazakh look like a conlang for a third-rate fantasy novel.

  8. Ran Ari-Gur said,

    January 17, 2018 @ 12:09 am

    I agree with Sergey and Jonathan Smith. As a counterpoint to Dr. Liberman's examples of languages that happily use diacritics, I'd give the example of Hebrew, which uses an apostrophe (or technically a 'geresh') in much this way, albeit only for non-native phonemes found in foreign names and loanwords; for example, the Zionist leader Ze'ev Jabotinsky is known in Hebrew as זאב ז׳בוטינסקי, with the <ז׳> representing /ʒ/.

  9. Ran Ari-Gur said,

    January 17, 2018 @ 2:02 am

    (Well, I don't completely agree with Sergey. Diacritics work well enough in languages that use them, as do digraphs like <ch> or <cz>. But apostrophes do have some advantages; I think it's all in what you're used to. If the proposal works out, Kazakhs in thirty years will probably find it strange that anyone ever laughed at the apostrophes.)

  10. Lazar said,

    January 17, 2018 @ 2:43 am

    Eh, I disagree. Taking examples from the linked video, at least, I think spellings like "Ay'yl s'ary'as'ylyg'y" and "I'ndi'ra ko's'ege s'yqty" seem more visually cluttered and harder to parse at first glance than "Auyl śaruaśylyğy" and "İndira kóśege śyqty". There's a wealth of technical precedent for official languages dealing with diacritics (most relevantly in Turkey), and the advantages of using the plain Latin alphabet are diminished by unworkability of apostrophized words in e.g. hashtags or URLs. Plenty of orthographies include apostrophes, but I've never seen one where they'd be as frequently used or as essential as in Nazarbayev's proposal.

  11. cliff arroyo said,

    January 17, 2018 @ 4:02 am

    The switch from Cyrillic to Latin for Kazakh seems (like increasing the use of Chinese characters in South Korea) to be an issue that pops up every few years and then disappears without much happening. Is there any real indication that they're more serious this time than the last few? What are the benefits of a change supposed to be? Both Latin and Cyrillic seem to work better for Turkic languages than the Arabic alphabet because they show vowels more clearly, but I don't see any big difference between Latin and Cyrillic in terms of representing Turkic languages.

    A few months ago I saw a presentation of the supposed new alphabet that used sh, ch (and maybe zh) and ae, oe and ue. More letters, and of doubtful utility, but not as butt ugly as the president's supposed solution, which looks like a nightmare given how Unicode and search engines (don't) process apostrophes.

    Kazakh has/had a perfectly acceptable Latin alphabet for many years, similar to but distinct from the modern Turkish alphabet. You used to be able to use it at the Kazakh press agency, choosing between it and Cyrillic. It seemed to be a transcription of the Cyrillic, but without the hard sign, which is not needed in Kazakh (presumably only used in loans from Russian).

    As for pan-Turkism, some years ago I had some contact with students from Turkey, among other interesting things they often repeated that Turkish is one of the most spoken languages in the world… later I found out that the more or less official policy in Turkey is that there is a single Turkish language and all the supposed separate languages are in fact merely dialects of greater Turkish….

  12. cliff arroyo said,

    January 17, 2018 @ 4:07 am

    "Cyrillic creates a lot of trouble with the transliterations to Latin"

    Only because Russian speakers want there to be a lot of trouble and because when it comes to the Latin alphabet they are very English-centric. A bunch of Slavic languages use the Latin alphabet and get by just fine. You need some diacritics or some digraphs to transliterate Russian into Latin, and you need to think in somewhat non-English terms, but it's very doable.

  13. Not a naive speaker said,

    January 17, 2018 @ 6:44 am

    Digraphs suck.

    Just compare Czech and Polish orthography.

  14. J.W. Brewer said,

    January 17, 2018 @ 8:31 am

    Mildly surprised no one has yet mentioned what I would think is probably the most historically-prominent apostrophe-heavy romanization system, viz. Wade-Giles for Mandarin. There's a fair amount of relevant history there re the tendency of users to intermittently omit the apostrophes in practice, resulting in what I believe Prof. Mair or some of his hanyuphile cohorts have called "Bastardized Wade-Giles," which features the very ambiguities that the apostrophes were intended to disambiguate. Unless one has grounds to think that Kazakh users (and foreigners who need to stick Kazakh words, including but not limited to proper names, into their own texts) will be more disciplined and scrupulous than the users of Wade-Giles historically tended to be outside the rarified precincts of copy-edited scholarly publishing, that seems a rather relevant and important historical precedent.

  15. Ellen K. said,

    January 17, 2018 @ 9:47 am

    Worth noting that on today's computers it's not too hard to switch between two different physical keyboards.

  16. Coby Lubliner said,

    January 17, 2018 @ 9:51 am

    I wonder why Kazakhs won't use either Turkmen or Uzbek as models. Both have Latin alphabets that, though unlike each other (the former uses mainly diacritics, the latter digraphs), both avoid the unfortunate Turkish duality of dotted and dotless I (which could have been avoided by using I instead of İ, Y instead of I, J instead of Y, and some kind of diacriticized Z instead of J for the /ʒ/ sound which occurs only in French loanwords).

  17. Victor Mair said,

    January 17, 2018 @ 9:55 am

    From Mehmet Olmez:

    If Kazakhstan accepts a new alphabet (based on Latin characters), I hope they can use Umlaut for different 'e' letters / phonemes, and I hope they use Umlaut for Ü and Ö too. Ş/Š and Ç/Č are another topic. I always had difficulties / problems during my Turcological study (and I still have same difficulties) about Tatar, Uzbek (Özbek ?) and Kazakh alphabets. On the other hand, Kirgiz and Azerbaijani alphabets (specially Azerbaijani) were excellent alphabets. Yakut, Khakas, and Tuva alphabets are also not bad. But, as you know, for Я, Е, Э, Ё, Ю, in most Turkic languages are used different way from others. In Azerbaijani, there was ja, je, jу and jo instead of Я, Е, Ё, Ю.

    Why are there two different letters in Turko-Cyrillic alphabets for same or similar /or closer/ phonemes? Ӧ and Ө, Ҡ and Қ, Ң and Ҥ, Ҹ and Җ are so different phonemes?

    In Cyrillic Turkmen өй is 'house' but dative form is өе 'to the home / to the house', and where is the word base for a foreigner to search it in the dictionary?

    What does it mean to have so many 'e' in Kazakh alphabet? Are there three different e phonemes? There are Ә, Е and Э. Of course, I know there are reasons and differences. But, I hope this time there will be just two letters for the e. I hope they don't decide to be so 'urgent' about ö, ü, š and č letters.

    As a foreigner, I can not judge the new Uzbek alphabet, but as a Turkish speaker and as a Turcologist, I don't like it!

    Please try to write in Tatar or Kazakh with Cyrillic sïgïr ! And how can you re-write it with the new Kazakh alphabet?

    Because they depend on the Cyrillic alphabet, some words' pronunciation is changed in Kazakh too (that is just a guess!).

    I chose some words from Radloff's dictionary (with 'real' Kazakh form) and also some words from new Kazakh dictionaries, then asked a native Kazakh speaker to pronounce those words. The results were different.

    I guess, our masters, P. Golden, A. Rona-Tas, P. Zieme, M. Erdal know more details about this topic and everyone knows about what happened after 1940-1945 to Turkic alphabets in the former Soviet Union.

    I hope Erden (our Kazakh colleague) has an influence on the government's decision from a positive side about the alphabet topic.

    For about the last 20 years, Marcel has been making a considerable number of worthy observations about Kazakh.

  18. Victor Mair said,

    January 17, 2018 @ 10:07 am

    From Peter Golden:

    I read the article with great interest this morning. Of course, it has been discussed by Kazakh scholars on Facebook and elsewhere. The adoption of an alphabet (or an “official” literary language) is invariably a political act. One has only to follow the changing alphabets of the Turkic languages in the Soviet era. Whether the different graphs for essentially the same phoneme were the result of political calculation (fear of Pan-Turkism in its various forms) or premised on a desire to render most accurately the sounds in question remains a debated topic. I am inclined to see politics here. I am often reminded of the famous dictum of Max Weinreich, the founder of modern Yiddish studies, who, when asked “what is the difference between a dialect and a language?” said “a language is a dialect with an army and a navy.” [VHM: This is a topic that we have discussed many times on Language Log; see especially here.]

    In principle, one would like to see an alphabet that is easy to learn, one that reflects the modern pronunciation (unlike English, French, the various [still extant] Celtic languages et al. that produce a catalogue of orthographic “complications") and one that will open doors to the wider world. The apostrophe-laden alphabet that Nazarbayev appears to favor looks like a nightmare. If the desire is to switch to a Latin-based alphabet, as appears to be the case, then the Modern Turkish alphabet (which has an interesting history – see, for example, Geoffrey Lewis’s The Turkish Language Reform. A Catastrophic Success, Oxford, 1999, esp. chap. 3-4), as I see it, is the best to follow. One using hačeks (č, š) is also possible (we are told that Atatürk did not like it, hence the preference for ç and ş, the latter from Romanian [which had switched from Cyrillic in 1860-62]), but given the success of the Modern Turkish alphabet, it can be easily adapted to Kazakh (Qazaq) with only a few additions or changes.

    Full disclosure: having done part of my graduate work at the Dil ve Tarih-Coğrafya Fakültesi some 50-odd years ago, I am undoubtedly influenced by my mentors there (Hasan Eren, Saadet Çağatay, Zeynep Korkmaz) as well as by my main mentor Tibor Halasi-Kun, who had longstanding ties to the DTCF. In my own work, I use hačeks in rendering various Turkic languages, but that is because I use those same symbols in transcribing other languages (written in Cyrillic, Georgian, etc.). It is a matter of convenience and a desire not to burden the reader with too many transcription systems.

    In short, I agree with Mehmet. The apostrophes will make the new Kazakh alphabet as different from other Turkic alphabets as its Cyrillic precursor – perhaps even more so. Moreover, it is far from aesthetically pleasing (although beauty is in the eye of the beholder).

  19. Victor Mair said,

    January 17, 2018 @ 10:11 am

    From Mark Swofford:

    A long while back Tom Bishop (Wenlin) was kind enough to run a search through the ABC Comprehensive Chinese-English Dictionary for me. The search revealed that apostrophes are needed in only about 2 percent of Mandarin words as written in Hanyu Pinyin. I'm very much a stickler for insisting that they not be omitted.


  20. Victor Mair said,

    January 17, 2018 @ 10:18 am

    From Peter Golden:

    I was reminded that a “Common Turkic Alphabet” (Ortak Türkçe Alfabesi- https://en.wikipedia.org/wiki/Common_Turkic_Alphabet, see also detailed variant of this entry in Turkish: https://tr.wikipedia.org/wiki/Ortak_T%C3%BCrk%C3%A7e_alfabesi) was proposed some years ago at a conference in Baku. It is already being used for Azerbaijani Turkic (which has made some changes), Gagauz, Crimean Tatar and Tatar. Uzbek agreed and then decided to use another version of a Latin script. Türkmen also uses it, with some of its own innovations. The Common Turkic Alphabet is probably the best Latin-based system that can be used.

  21. RP said,

    January 17, 2018 @ 11:25 am

    The new alphabet doesn't seem to use the letters C, W or X. Perhaps they should replace C' with C, replace S' with X, and replace either U' or Y' with W. That could reduce the number of apostrophes considerably.

  22. Bob Ladd said,

    January 17, 2018 @ 11:40 am

    @ J W Brewer: Good point about omitting apostrophes in Wade-Giles, but I was about to say the same thing about diacritics. People leave them off. This is true regardless of their function. Good examples are pinyin (where tone diacritics are routinely omitted), Romanian (where the five letters with diacritics – representing four distinct phonemes – are often replaced by their unadorned counterparts in word-processed material), and (so I'm told) Vietnamese, where some of the diacritics are for tone and some for vowel quality. Relatedly, in some languages it's not unusual to omit diacritics in material written in ALL CAPS (tolerated in French, normal in Greek), and here again, the function of the diacritics is different (vowel quality etc. in French, stress in Greek).

    So maybe digraphs are not such a bad idea after all?

  23. Ed H. said,

    January 17, 2018 @ 12:23 pm

    I notice that the new alphabet does not use the letter “x.” This opens up the possibility of using the Esperanto x-metodo, where the otherwise unused letter follows any letter which would normally have a diacritic.

    Hieraux mi sxangxis cxion = hieraŭ mi ŝanĝis ĉion

    …. ok maybe not
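
For the curious, the x-metodo round trip is mechanical enough to sketch in a few lines of Python. This is only an illustration under simple assumptions: the helper names `from_x_system` and `to_x_system` are made up here, and the code ignores the rare cases where a real "x" follows one of the six letters in a foreign name.

```python
import re

# Esperanto letters written with a trailing x in the x-system.
X_SYSTEM = {
    "c": "\u0109",  # c with circumflex
    "g": "\u011d",  # g with circumflex
    "h": "\u0125",  # h with circumflex
    "j": "\u0135",  # j with circumflex
    "s": "\u015d",  # s with circumflex
    "u": "\u016d",  # u with breve
}
ACCENTED_TO_BASE = {accented: base for base, accented in X_SYSTEM.items()}


def from_x_system(text: str) -> str:
    """Turn digraphs such as 'cx' into the accented letters, keeping case."""
    def repl(match: re.Match) -> str:
        base = match.group(1)
        accented = X_SYSTEM[base.lower()]
        return accented.upper() if base.isupper() else accented
    return re.sub(r"([cghjsuCGHJSU])[xX]", repl, text)


def to_x_system(text: str) -> str:
    """Turn accented letters back into base letter + 'x', keeping case."""
    out = []
    for ch in text:
        base = ACCENTED_TO_BASE.get(ch.lower())
        if base is None:
            out.append(ch)
        else:
            out.append((base.upper() if ch.isupper() else base) + "x")
    return "".join(out)


print(from_x_system("Hieraux mi sxangxis cxion"))
```

The appeal of the scheme is that the mapping is reversible with no lookup beyond six letters, which is the property an apostrophe-based orthography would also need.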

  24. J.W. Brewer said,

    January 17, 2018 @ 12:38 pm

    I agree with Bob Ladd that digraphs are preferable from the POV of avoiding ad-hoc/informal omissions that predictably lead to confusion or at least ambiguity. Or cool extra glyphs that are different enough from "regular-letter-plus-a-diacritic" that people are stuck with them. But although I would happily join a restore-the-real-English-alphabet-with-thorn-and-yogh-and-etc grassroots movement, I don't quite have the energy to start one and no one else seems to be getting that particular bandwagon going. I think some of the versions of Cyrillic have glyphs that "regular" Cyrillic doesn't — I have no idea what sort of real-world confusion or shortcuts that leads to when the particular person typing and/or his hardware/software default to the wrong version of Cyrillic for the specific language at hand.

  25. Victor Mair said,

    January 17, 2018 @ 1:56 pm

    Nearly all of my German friends — at least those living in English-speaking countries or interacting with English-speaking people — regularly omit the umlaut and replace it with an "e", even from their own name (e.g., Schüssler –> Schuessler; and nobody I know writes "Schüßler" any longer). Something similar happens with French acquaintances living in English-speaking countries or interacting with English-speaking people (e.g., Françoise –> Francoise).

  26. Robert Davis said,

    January 17, 2018 @ 2:26 pm

    Agreeing with Mr. Mair's comment, my Spanish students would complain that their Mexican exchange brothers and sisters did not use accents when they wrote them (¿Qué? versus …que…). So can we not use them too?

  27. Bob Ladd said,

    January 17, 2018 @ 2:28 pm

    @Victor Mair: Yes, but the German usage is actually an interesting exception to the generalisation that people leave out diacritics. If Germans leave out the umlaut (in the old days, because of ASCII coding limitations), they always put in an E. Your friend Schüßler may well write Schuessler, but never Schussler. The French case you cite (Francoise for Françoise), on the other hand, is a classic omit-the-diacritic case.

  28. J.W. Brewer said,

    January 17, 2018 @ 2:41 pm

    The German case is somewhat reminiscent of the situation in Late Middle (or maybe Super-Early Modern) English, where the weird idiosyncratic local letters could be substituted for with back-up digraphs using letters from the regular "ASCII" set known to literate-in-Latin scribes throughout Western Europe. For example, in a context where a ȝ would be confusing or (as movable type arrived on the scene) unavailable for technical reasons, just use "gh" instead. And eventually the old idiosyncrasies fell into complete disuse and the digraph workarounds became the standard.

  29. J.W. Brewer said,

    January 17, 2018 @ 2:51 pm

    Note also that the umlaut-indicating -e has not been preserved 100% of the time in AmEng for proper names. Both "Mueller" and "Muller" are common American spellings of the surname that was ancestrally Müller (often of course just anglicized as "Miller"). Muller is the minority variant but a respectably-sized one as a matter of ratio to the majority variant, with about 27,000 instances in the most recent census data compared to about 64,000 for Mueller.

  30. Breffni said,

    January 17, 2018 @ 3:00 pm

    Bob Ladd's generalisation about German is right in my experience, but I encountered a counterexample just yesterday: a German in Ireland who spelled her name Muller on the grounds that Irish computers and administrators wouldn't be able to handle the ü. But note she didn't opt for Mueller. She probably reckons that when, for instance, she opens a bank account, Muller will be accepted as close enough to the Müller on her passport, whereas Mueller is likely to be treated as different from Müller by IT systems and their users.

  31. Lazar said,

    January 17, 2018 @ 3:45 pm

    @J.W. Brewer: Yep, German immigration is the reason why "Miller" is notably more common in the US than in Britain.

  32. Lazar said,

    January 17, 2018 @ 3:49 pm

    @Bob Ladd: "cz" might make for a fun, German-style substitute for French "ç" if you were so inclined.

  33. Michael Vnuk said,

    January 17, 2018 @ 5:25 pm

    The Turkish keyboard layout shown has a letter I in the second row (4th from left) and also in the third row (2nd from left). That seemed odd, and a quick search brings up images of keyboards showing the lower one dotted, ie with a single dot above the letter.

  34. David Marjanović said,

    January 17, 2018 @ 5:42 pm

    "Cyrillic creates a lot of trouble with the transliterations to Latin"

    That's a question of political will. Serbia uses Cyrillic and Latin* as 1 : 1 transliterations of each other.

    * Really. If you stand on a street in Belgrade or Niš and can read only one of the two, you're illiterate. It seems to me that people sometimes write without noticing which alphabet they're using.

    I think some of the versions of Cyrillic have glyphs that "regular" Cyrillic doesn't —

    There is no "regular" Cyrillic. Hardly any two languages written in Cyrillic outside the Russian Federation use the same letter inventory.

    The Latin approach to representing new languages generally involves diacritics or digraphs. The Cyrillic approach generally involves creating new letters. After all, Cyrillic started as plain old Greek with a bunch of extra letters for sounds Greek lacked.

  35. Levantine said,

    January 17, 2018 @ 8:27 pm

    In my experience, the Turkish keyboard layout shown here has been largely eclipsed by a variant QWERTY layout: https://upload.wikimedia.org/wikipedia/commons/thumb/6/63/KB_Turkey.svg/2000px-KB_Turkey.svg.png

    Coby Lubliner, J is found also in Persian loanwords. Why do you consider the use of the dotted and undotted I to be unfortunate? They form as natural a pair as U/Ü and O/Ö.

  36. tangent said,

    January 18, 2018 @ 1:33 am

    If the apostrophes could be written without so much horizontal space on the left (as an apostrophe normally has) it wouldn't seem so hard to parse visually.

    Maybe the apostrophes will migrate over top of their letters and become diacritics. Then Unicode can add glyphs for those.

  37. cliff arroyo said,

    January 18, 2018 @ 2:11 am

    @Levantine "Why do you consider the use of the dotted and undotted I to be unfortunate?"

    It's my understanding that the pair gave earlier computing generations fits because the standard Latin i/I lower-upper case correspondence requires making things case sensitive that usually aren't…

    I'm not sure if they're a problem at present.
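
The casing issue is easy to see in practice. The sketch below uses only Python's built-in string methods. Python's default `upper()` and `lower()` are not locale-aware, so they apply the i/I pairing to every string, and the two helper functions (hypothetical names, written for this illustration) show one way to get the Turkish pairing i/İ and ı/I instead.

```python
DOTTED_CAPITAL_I = "\u0130"   # LATIN CAPITAL LETTER I WITH DOT ABOVE
DOTLESS_SMALL_I = "\u0131"    # LATIN SMALL LETTER DOTLESS I


def turkish_upper(text: str) -> str:
    """Uppercase with the Turkish pairing: i -> dotted capital, dotless -> I."""
    # Handle the two i's explicitly, then let the default mapping do the rest.
    return text.replace("i", DOTTED_CAPITAL_I).replace(DOTLESS_SMALL_I, "I").upper()


def turkish_lower(text: str) -> str:
    """Lowercase with the Turkish pairing: dotted capital -> i, I -> dotless."""
    return text.replace(DOTTED_CAPITAL_I, "i").replace("I", DOTLESS_SMALL_I).lower()


word = DOTLESS_SMALL_I + "s" + DOTLESS_SMALL_I   # a string with two dotless i's

# Default casing: the round trip silently turns dotless i into dotted i.
print(word.upper().lower() == word)                  # False

# Turkish-aware casing: the round trip is lossless.
print(turkish_lower(turkish_upper(word)) == word)    # True

# The default lowercase of the dotted capital is two code points:
# a plain i followed by COMBINING DOT ABOVE.
print(len(DOTTED_CAPITAL_I.lower()))                 # 2
```

In other words, case-insensitive comparison stops being a property of the characters alone and starts depending on which language the text is in, which is presumably what caused the fits.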

  38. raempftl said,

    January 18, 2018 @ 3:34 am

    Germans regard the ae, oe and ue as correct alternative spellings of ä, ö and ü. AFAIK, it's one way in which those umlauts used to be spelt before the diacritics came about. (In some names this spelling is still preserved: Goethe, Goebbels)

    There used to be no upper-case ß. So when words were put in upper-case, the correct alternative spelling of ß used to be ss.

    So writing Schuessler instead of Schüßler still feels correct. Whereas Schusler would feel utterly incorrect. (And would be pronounced incorrectly by any German reading it.)
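
The difference between the German convention and plain diacritic-stripping can be sketched in Python as follows. The function names `digraph_fallback` and `strip_marks` are invented for this example, and the table covers only the German letters discussed in this thread.

```python
import unicodedata

# German fallback spellings: umlaut -> vowel + e, eszett -> ss.
GERMAN_FALLBACK = str.maketrans({
    "\u00e4": "ae", "\u00f6": "oe", "\u00fc": "ue",   # a, o, u with umlaut
    "\u00c4": "Ae", "\u00d6": "Oe", "\u00dc": "Ue",   # capital forms
    "\u00df": "ss",                                   # eszett
})


def digraph_fallback(text: str) -> str:
    """Replace umlauted vowels and eszett with their digraph spellings."""
    return text.translate(GERMAN_FALLBACK)


def strip_marks(text: str) -> str:
    """Naively drop combining marks after decomposing (the 'shorn' approach)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


name = "M\u00fcller"
print(digraph_fallback(name))   # Mueller
print(strip_marks(name))        # Muller
```

The two outputs are exactly the pair of American spellings mentioned above: one keeps the information that there was an umlaut, the other throws it away.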

  39. Andreas Johansson said,

    January 18, 2018 @ 3:44 am

    Dotted v. dotless i is different from O v. Ö because neither Turkish letter is straightforwardly identifiable with i as used in other Latin script orthographies.

    Re the Kazakh proposal, it seems a little bizarre to use C' but not plain C. Also what's up with K and Q corresponding to the same Cyrillic letter? Does the Cyrillic orthography ignore a phonemic distinction here? Conversely, H corresponds to two Cyrillic letters (I presume the one looking like a Latin lower case "h" is a Kazakh specialty); is the Cyrillic spelling historical here?

  40. poftim said,

    January 18, 2018 @ 6:21 am

    A few things:

    I wonder how Kazakh Scrabble would work under this proposal. Separate tiles for each of the apostrophized letters, or tiles just for the plain letters with the apostrophe given its own tile(s)?

    I think a lot of us have an aversion to apostrophes as diacritics because we're not used to them *belonging* to the letter that precedes them.

    People have mentioned diacritics in Romanian, where the situation is a mess! It's not uncommon to see a word like Timișoara (which is where I live) spelt with the (correct) comma diacritic, an (incorrect) cedilla, and just a plain S, all on the same sign. Sometimes you'll even see the letters with diacritics in a different font (usually Times New Roman) from the rest of the text. A lot of websites are written either partially or totally without diacritics. If anything, the situation is improving slightly because a lot of predictive text facilities automatically add the diacritics with maybe 90% accuracy.

    The Cyrillic letter corresponding to Latin Q has a longer tail than the one that corresponds to Latin K.

  41. Andreas Johansson said,

    January 18, 2018 @ 6:38 am

    poftim wrote:
    The Cyrillic letter corresponding to Latin Q has a longer tail than the one that corresponds to Latin K.

    Ah, thanks. In the font used, at least, the distinction looks pretty fine to someone not used to looking for it.

    In a partial precedent, the Uzbek orthography uses an apostrophe-like sign that Unicode is pleased to call MODIFIER LETTER TURNED COMMA after O and G, e.g. in oʻzbek "Uzbek". The same sign appears to be used more widely to modify letters in Karakalpak (spoken mostly in Uzbekistan and closely related to Kazakh).
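
One practical difference between that Uzbek sign and an ordinary apostrophe is how software classifies it. The following is a small sketch using Python's standard `unicodedata` and `re` modules; `first_word` is a hypothetical helper standing in for whatever tokenizer a search engine or hashtag parser might use, not a description of any real one.

```python
import re
import unicodedata

APOSTROPHE = "'"            # U+0027
TURNED_COMMA = "\u02bb"     # U+02BB, the sign used in Uzbek

for ch in (APOSTROPHE, TURNED_COMMA):
    # Print code point, official name, and general category.
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))


def first_word(text: str) -> str:
    """Return the leading run of 'word' characters, as a naive tokenizer would."""
    match = re.match(r"\w+", text)
    return match.group(0) if match else ""


# With the modifier letter the word stays in one piece.
print(first_word("o" + TURNED_COMMA + "zbek"))

# With a plain apostrophe the same tokenizer stops after the first letter.
print(first_word("o" + APOSTROPHE + "zbek"))
```

Unicode puts the turned comma in a letter category and the apostrophe in a punctuation category, so text-processing tools that split on punctuation would treat the two orthographies quite differently.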

  42. FJ said,

    January 18, 2018 @ 8:32 am

    David Marjanović: "It seems to me that people sometimes write without noticing which alphabet they're using"

    As a person from Serbia I can confirm that it's certainly true that we don't notice which one of the two we're reading unless we stop and think about it.

    It's also true what Bob Ladd said about diacritics. People in the region (Bosnia, Croatia, Serbia, Montenegro) are regularly too lazy to use them when it comes to social media/emails/SMS (though never when writing by hand). We call this 'ošišana latinica' (shorn Latin script). This of course sometimes leads to humor, e.g.

    Idem da se šišam. (I'm off to get my hair cut.)
    Idem da se sisam. (I'm off to suck myself.)

  43. Lazar said,

    January 18, 2018 @ 12:30 pm

    the correct alternative spelling of ß used to be ss.

    True, although I kinda prefer the alternative of using sz to keep it distinct from other ss.

  44. Jarek Weckwerth said,

    January 18, 2018 @ 1:01 pm

    My two cents: If you are developing a new alphabet in 2017, I think it's quite evident that digraphs should be preferred over diacritics. True, it means bowing to the inadvertent American English imperialism of ASCII, but — let's face it — things are the way they are. Unicode and keyboard layouts notwithstanding, ASCII still rules. The Kazakh president does have a very valid point; just that the apostrophes are a singularly bad solution.

    If there are unused letters, then Ed H.'s Esperanto solution of cx etc. is a brilliant idea.

    I write from the background of a language that has both some diacritics and digraphs. There is a national letter in my given name. There are plenty of digital situations where I don't use it to save myself trouble. What's the use of a diacritic if you're going to omit it? As you can see, my surname has some crazy spellings. These never get dropped even though they cause a lot of grief to my compatriots. Go figure.

    So, if you can enjoy the freedom of designing the alphabet today, don't do diacritics.

    (BTW the very first comment I wrote on LL — actually back in times when there were no comments on here, and Mark Liberman kindly posted a comment I sent in by email — was on this very topic. More than 10 years ago… Doesn't time fly.)

  45. Jarek Weckwerth said,

    January 18, 2018 @ 1:02 pm

    Oh it's 2018!

  46. J.W. Brewer said,

    January 18, 2018 @ 1:13 pm

    One area in which diacritics are not tolerated is the "machine-readable" part of passports which per international standards are limited to the basic ASCII A through Z (typically in ALLCAPS). By coincidence I just came across the official guidance (starting at page 30 of this link https://www.icao.int/publications/Documents/9303_p3_cons_en.pdf ) for "transliterating" Latin-alphabet letters with diacritics into Latin-alphabet letters without them. In the overwhelming majority of cases the recommended transliteration is "hey, just the same letter stripped of the diacritic," but there are a handful of exceptions where digraphs are suggested, including but not limited to the traditional mode referenced above of indicating an umlauted German vowel by sticking in a trailing -e.

    Apostrophes seem to go unmentioned, perhaps because of a failure to conceptualize them as a non-standard but extant Latin-alphabet "letter." I frankly don't know whether the present convention for someone surnamed e.g. O'Reilly is for the surname to come out on the machine-readable part of the passport as O'REILLY or OREILLY, but whoever's in charge of the future of Kazakhstani passport formatting might want to look into that.

  47. David Marjanović said,

    January 18, 2018 @ 2:51 pm

    Re the Kazakh proposal, it seems a little bizarre to use C' but not plain C.

    That's because C is reserved for Ц, which (pronounced [ts]) is limited to Russian words and reportedly not consistently pronounced as such even there.

    H corresponds to two Cyrillic letters (I presume the one looking like a Latin lower case "h" is a Kazakh specialty); is the Cyrillic spelling historical here?

    More or less. In Arabic and Persian loanwords, some kinds of Kazakh have both /h/ and /x/, neither of which apparently occurs in native vocabulary (/x/ of course occurs in Russian loans). Most people, from what I've read, merge them in pronunciation.

  48. cliff arroyo said,

    January 19, 2018 @ 2:32 am

    Just refound a site with the current Kazakh latin alphabet, you can go back and forth between it and cyrillic (the latin is qaz in the language menu next to the logo in the upper left hand side).
    As I said, similar to but also very distinct from the Turkish alphabet and quite feasible as a national script if they want to go that way.


  49. David Marjanović said,

    January 19, 2018 @ 5:40 am

    With that many apostrophes, using single quotes would become impossible.

  50. cliff arroyo said,

    January 19, 2018 @ 12:44 pm

    I just checked again and the site I linked to apparently also lists Arabic-alphabet Kazakh as an option (that's what it looks like; maybe somebody who knows more could verify).

  51. Andreas Johansson said,

    January 19, 2018 @ 12:53 pm

    @cliff Arroyo

    According to WP, Kazakh is still written in Arabic script in China (which has a sizeable Kazakh minority in Xinjiang).

  52. André Schappo said,

    January 19, 2018 @ 1:04 pm

    I have read, on several sites, that the new latin script Kazakh will make it difficult or impossible to use twitter hashtags. Actually, one can have latin script Kazakh twitter hashtags. Please see my tweet at https://twitter.com/andreschappo/status/954034152621387776 which contains 3 latin script Kazakh hashtags, all of which contain apostrophes.

    For an explanation of how to do it please see https://schappo.blogspot.co.uk/2018/01/computer-science-internationalization_18.html

  53. Lazar said,

    January 19, 2018 @ 2:24 pm

    @André Schappo: But doesn't that rather defeat the stated goal of using no special characters in the new script?

  54. cliff arroyo said,

    January 19, 2018 @ 5:05 pm

    " the new latin script Kazakh will make it difficult or impossible to use twitter hashtags. Actually, one can have latin script Kazakh twitter hashtags"

    my problem with the new script (rather than the current perfectly fine latin alphabet or the okay earlier proposed version with ae, oe, ue, ch, sh, ng etc) is that it's butt ugly and the more I look at it the uglier and more ungainly it seems…

    If the president wants to go down in history for the alphabet it should at least look nice…

  55. Peter Taylor said,

    January 19, 2018 @ 5:39 pm

    Digraphs also have their issues. For example, they're a nuisance when it comes to sorting. When I started learning Spanish there were three digraphs in the alphabet (ch, ll, rr), so chabacanada came after cuzqueño in the dictionary. Specifying that for computer-based collation is more complicated than simply specifying an order for codepoints.

    Now those digraphs are considered to each be two letters rather than two glyphs forming one letter, and the transition phase inevitably has its own problems.
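The collation point can be made concrete with a small Python sketch. It assumes the older dictionary order in which "ch" and "ll" each rank as a single letter (only those two are modelled here), and the names `ALPHABET`, `split_letters` and `traditional_key` are made up for illustration. It is a toy, not how any real collation library does it.

```python
# Toy model of the older Spanish dictionary order, in which "ch" and "ll"
# each count as one letter. Only these two digraphs are modelled.
ALPHABET = [
    "a", "b", "c", "ch", "d", "e", "f", "g", "h", "i", "j", "k", "l", "ll",
    "m", "n", "ñ", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z",
]
RANK = {letter: i for i, letter in enumerate(ALPHABET)}


def split_letters(word):
    """Split a word into letters, greedily treating known digraphs as one."""
    word = word.lower()
    letters = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if len(pair) == 2 and pair in RANK:
            letters.append(pair)
            i += 2
        else:
            letters.append(word[i])
            i += 1
    return letters


def traditional_key(word):
    """Sort key: one rank per letter; characters outside ALPHABET sort last."""
    return [RANK.get(letter, len(ALPHABET)) for letter in split_letters(word)]


words = ["cuzqueño", "chabacanada", "casa"]
by_codepoint = sorted(words)                      # plain codepoint order
by_alphabet = sorted(words, key=traditional_key)  # digraph-aware order
print(by_codepoint)
print(by_alphabet)
```

The extra work is the tokenizing step: a plain codepoint sort needs no key function at all. Note also that a greedy splitter like this one cannot tell a digraph from two separate letters that happen to be adjacent.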

  56. Lazar said,

    January 19, 2018 @ 8:52 pm

    Digraphs are also tricky whenever they can be confused with a sequence of two monographs. For example, if we were creating a Spanish orthography from scratch we might represent /ɲ/ as ny – but then there's a "real" ny in cónyuge. Likewise Arabic, where sh in common transliteration can be confused with an actual sequence of s and h.

  57. Andrew Usher said,

    January 19, 2018 @ 8:55 pm

    J.W. Brewer:

    Your link does say (page 19, 27 of the PDF) that apostrophes are deleted; so, indeed it should be OREILLY. But it probably wouldn't cause any more problem to this proposed Kazakh alphabet than the deletion of diacritics does to some Latin alphabets.
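As a rough illustration of the rule discussed here, the following Python sketch reduces a name to the letters A through Z. It assumes only what this thread says about the guidance: most diacritics are simply stripped, umlauted German vowels get a trailing E, and apostrophes are deleted. The `DIGRAPHS` table is a small illustrative subset, not the full ICAO table, and the function name `mrz_letters` is invented. Real machine-readable zones also use a filler character between name parts, which this sketch ignores.

```python
import unicodedata

# Illustrative digraph exceptions only, NOT the complete ICAO 9303 table.
DIGRAPHS = {"Ä": "AE", "Ö": "OE", "Ü": "UE"}


def mrz_letters(name):
    """Reduce a name to A-Z: digraph exceptions first, then strip the rest."""
    out = []
    # NFC so that a base letter plus combining mark is seen as one character;
    # str.upper() also turns "ß" into "SS".
    for ch in unicodedata.normalize("NFC", name).upper():
        if ch in DIGRAPHS:
            out.append(DIGRAPHS[ch])
            continue
        # NFD separates a base letter from its diacritic. Keeping only plain
        # A-Z drops diacritics, apostrophes, spaces and hyphens alike
        # (a simplification: a real MRZ uses a filler between name parts).
        for part in unicodedata.normalize("NFD", ch):
            if "A" <= part <= "Z":
                out.append(part)
    return "".join(out)


for example in ("O'Reilly", "Müller", "André"):
    print(example, "->", mrz_letters(example))
```

Under these assumptions an apostrophe-based spelling loses its apostrophes in the same step that strips an acute accent, which is the parallel drawn above.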

  58. Jarek Weckwerth said,

    January 20, 2018 @ 6:15 am

    @ Peter Taylor: they're a nuisance when it comes to sorting

    That's a peculiar problem of Spanish. English doesn't do this, and neither does e.g. German, French, Dutch or Polish.

  59. Jarek Weckwerth said,

    January 20, 2018 @ 6:17 am

    @ Lazar Digraphs are also tricky whenever they can be confused with a sequence of two monographs.

    If you're a linguist designing a new alphabet, your job would be to avoid or minimize this kind of thing.

  60. Jarek Weckwerth said,

    January 20, 2018 @ 6:21 am

    @ Andrew Usher But it probably wouldn't cause any more problem to this proposed Kazakh alphabet than the deletion of diacritics does to some Latin alphabets.

    Well, native speakers of diacritic-using languages can get by perfectly well without them, most of the time. But if you design a system that does not rely on diacritics, then they won't ever have to face the conundrum in the first place.

  61. André Schappo said,

    January 20, 2018 @ 10:35 am

    @Lazar You ask: "But doesn't that rather defeat the stated goal of using no special characters in the new script?"

    First some terminology. The formal name of the apostrophe I use for the non-breaking latin script Kazakh hashtags is MODIFIER LETTER APOSTROPHE (it is a convention to write these formal names in uppercase). Here is a heap of information about it ➜ https://codepoints.net/U+02BC

    I do not consider MODIFIER LETTER APOSTROPHE to be a special or non standard character.

    or to put it another way

    If MODIFIER LETTER APOSTROPHE is special/non standard then so is my name André which has that pesky acute accent over the e which most people do not bother to write.

    or to put it yet another way

    Which standard? I work to the Unicode standard, which currently has 136,000+ characters and encompasses emoji, symbols and most every human language script. MODIFIER LETTER APOSTROPHE is part of the Unicode standard. Most Computer Science departments in schools, colleges and universities only teach ASCII programming. I teach my students Unicode programming. Recently I gave an internationalization workshop to school students and, amongst other things, I opened their eyes to Unicode. At their schools they are only taught ASCII programming. One student, though, had used his initiative and learned about Unicode outside of the classroom.
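For readers who want to see the difference between the two apostrophes, here is a short Python sketch using only the standard library. The pattern `#\w+` is an assumed stand-in for "a hashtag is # plus word characters" and is not Twitter's actual rule; the tag text is a made-up string, not a checked Kazakh word.

```python
import re
import unicodedata

ASCII_APOSTROPHE = "\u0027"     # the key found on a standard keyboard
MODIFIER_APOSTROPHE = "\u02bc"  # MODIFIER LETTER APOSTROPHE

# Print code point, formal name, general category, and whether it is ASCII.
for ch in (ASCII_APOSTROPHE, MODIFIER_APOSTROPHE):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch),
          unicodedata.category(ch), ch.isascii())

# Assumed stand-in for a hashtag matcher: "#" followed by word characters.
HASHTAG = re.compile(r"#\w+")


def first_hashtag(text):
    """Return the first hashtag-like match in text, or None."""
    match = HASHTAG.search(text)
    return match.group(0) if match else None


# Made-up tag text, used only to show where the match stops.
ascii_tag = "#s" + ASCII_APOSTROPHE + "ai"
modifier_tag = "#s" + MODIFIER_APOSTROPHE + "ai"
print(first_hashtag(ascii_tag))
print(first_hashtag(modifier_tag))
```

U+02BC is classed as a letter, so a word-character match runs straight through it, while U+0027 is punctuation and ends the match. U+02BC is also outside ASCII, so it is not the character an unmodified keyboard produces.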

  62. poftim said,

    January 20, 2018 @ 10:42 am


    Welsh has this problem too. The letter 'ng' comes between 'g' and 'h' in the Welsh alphabet if I remember rightly, and I think sequences of n+g also exist. It's even worse in Hungarian where many letters consist of two glyphs, and combinations like zs+z and z+sz exist. Coming up with a sorting algorithm for Hungarian, which also has its fair share of diacritics (of which some are classified as separate letters and some aren't), must be a nightmare.

    It seems that if you're designing a new alphabet, you've got four main options which all have their pros and cons:

    1. Diacritics
    2. Digraphs
    3. Completely new letters
    4. Allowing some letters to do double/triple duty and relying on context to solve most of the ambiguities.

    In a world where ASCII didn't still rule and keyboard layouts weren't an issue, I think I'd go for 3.

  63. Lazar said,

    January 20, 2018 @ 4:09 pm

    @André Schappo: My goodness, what a condescending response. (I know what Unicode is, thanks.) I was referring to Nazarbayev's stated goal of not using non-ASCII characters in the new orthography: if MODIFIER LETTER APOSTROPHE became a necessary part of it, then the justification for using apostrophes in the first place would disappear.

  64. André Schappo said,

    January 21, 2018 @ 3:56 am

    @Lazar Sorry, it was not meant to be a condescending response. I was just giving information from a Computer Science point of view. Plus, I don't think I have ever been to this discussion forum before, so I do not know what is common knowledge on the forum.
