Language Log

Erdoğan

August 4, 2016 @ 6:52 am · Filed by Mark Liberman under Language and culture, Writing systems

Below is a guest post by Bob Ladd:

Recent events in Turkey have meant that President Erdoğan is in headlines around the world – except that in many parts of the world, the headlines are about President “Erdogan”. A few newspapers outside Turkey faithfully reproduce the yumuşak G (the letter G with a short mark or caron, which between vowels is mostly silent in Turkish), but mostly they just use an unadorned G. So is this a matter of technology or ethnocentricity? That is, do newspapers ignore the diacritic on the G because inserting the correct character would be a time-consuming and potentially error-prone process? Or do they ignore it because it’s a weird letter in a weird language and nobody really cares anyway? There’s a lot of evidence to suggest that both factors play a role.

I’m not denying that letters with diacritics can be a nuisance for word processing, and that lots of people are happy to ignore them. In Romanian, for example, the very regular standard orthography includes five letters with diacritics. When word processing abruptly arrived in Romania in the early 1990s after Ceauşescu was overthrown, and people unfamiliar with computers started using them to compose documents, they often omitted the diacritics simply because it was easier. (During the Ceauşescu era even typewriters were pretty scarce, and were supposed to be registered with the authorities so that the source of subversive pamphlets could be traced.) Some 25 years later, it’s still remarkably common to find notices and web sites in Romanian without any diacritics at all. So the technological excuse for all the headlines about “Erdogan” is almost certainly part of the story.

However, when I started noticing this, I realised that some of the newspapers who write about “Erdogan” nevertheless sometimes use diacritics in other contexts. For example, the Economist includes the accent on Hugo Chávez’s name every time it prints it (which is still fairly often), but never includes the caron on the G of Erdogan. And quite a few continental European newspapers reproduce the umlaut on the name of Fethullah Gülen, the man who is alleged (by the man the same newspapers call “Erdogan”) to have masterminded the recent coup attempt in Turkey. Here, too, we could have both technological and ethnocentric explanations. Characters like á and ü are part of the set of 256 8-bit ASCII codes that are widely supported in even fairly basic word-processing programmes; the Turkish yumuşak G is not. So perhaps newspapers like the Frankfurter Allgemeine or the Neue Zürcher Zeitung that write about “Gülen” and “Erdogan” are just trying to avoid printing errors. But again – maybe for a German-language newspaper, ü looks like a proper letter and ğ looks weird, and nobody really cares anyway.

It doesn’t take long to establish that ethnocentricity really is part of the explanation for all this. Close inspection of several recent issues of the Economist reveals that correct diacritics are consistently used with names in French, German, Spanish and Portuguese (and maybe Italian). Diacritics in all other languages that use any form of the Roman alphabet are consistently ignored. This is not just a question of letters outside the ASCII 8-bit canon, as in Erdoğan or Ceauşescu. In the Economist, the same diacritic gets treated differently depending on the language. So Wolfgang Schäuble gets his umlaut, but Fethullah Gülen doesn’t. Hugo Chávez gets his acute accent, but Viktor Orbán doesn’t. Even the Scandinavians are outside the charmed circle: in the Economist, Anders Breivik’s atrocity took place on Utoya, not Utøya, and (in the set of recent issues I checked), the Swedish names Björn and Malmström were reproduced without their umlaut. Ironically, the head of the German Green party, Cem Özdemir, counts as German for the Economist, so he gets his umlaut; if his cousins in Turkey made the news for some reason, they wouldn’t.

I don’t mean to pick on the Economist. A fairly quick inspection of web pages suggests that both the New York Times and the Financial Times operate essentially the same policy – diacritics for languages like French, German and Spanish, basic 7-bit ASCII (no diacritics at all) for the rest. It appears, from the same inspection of web pages, that a number of continental newspapers have a slightly more inclusive “8-bit” version of the same policy (8-bit ASCII for everyone, which guarantees correct diacritics for a few major languages but not for many of the others). However, they are also less consistent – for example, mentions of Viktor Orbán in the Frankfurter Allgemeine sometimes have an accent and sometimes don’t.
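
For readers who want to see what a "7-bit ASCII for the rest" policy amounts to mechanically, here is a minimal sketch in Python. It is only an illustration of one common way to strip accents (decompose, then drop combining marks); the function name `strip_diacritics` is made up for this example, and nothing here is claimed to be what any newspaper's production system actually does.

```python
import unicodedata

def strip_diacritics(name: str) -> str:
    """Drop combining marks after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", name)
    # unicodedata.combining() returns 0 for base characters,
    # non-zero for combining marks such as the breve or acute.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

for name in ["Erdoğan", "Orbán", "Malmström", "Utøya"]:
    print(name, "->", strip_diacritics(name))
```

Note that this approach leaves Utøya untouched, because ø is a letter in its own right in Unicode and has no decomposition into o plus a mark. Turning it into a plain o, as in "Utoya", needs an explicit substitution table on top.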

I would hesitate to ascribe any serious political significance to this, except that the Guardian seems to be a consistent user of correct diacritics for everyone. They have consistently been writing about Erdoğan, and when they report on someone like, say, Laura Codruţa Kövesi, the head of the Romanian anti-corruption agency, they print her name the way she spells it. The Süddeutsche Zeitung is almost in the same league with the Guardian, though they let me down with Codruţa Kövesi. For mainstream papers, the Guardian and the Süddeutsche are decidedly to the left of the spectrum, decidedly internationalist/Europeanist, and so on, and you would expect them to resist any suggestion that some languages are more important (or more normal) than others. This is reflected in their typographical policies. Most of the rest, whatever their editorial line, in practice make a subtle contribution to marginalising places where the major Western European languages are not at home.

69 Comments

Bathrobe said,

August 4, 2016 @ 6:59 am

I don't suppose Vietnamese figures would get much coverage in the press, but how does a name like Nguyễn fare?
Ari Corcoran said,

August 4, 2016 @ 7:26 am

@ Bob Ladd. You are probably right about Italian, though diacritics are relatively rare in people's names. A quick (and not very scientific) look at the Internet yielded a different result.
The Sicilian town of Cefalù, when entered into Google, renders it correctly in Italian publications (with the exception of capped versions, where this is rare in Italian newspapers anyway). Change the search to Cefalu, without the diacritic, and mentions of this beautiful town suddenly abound in English news outlets. Ethnocentric, perhaps unremarkably, but those damn diacritics change pronunciations and stress, and can lead to confusion!
Perhaps the lesson in googling is to enter both "with" and "without" diacritics to get a better sense of sources when researching.
Alon Lischinsky said,

August 4, 2016 @ 7:29 am

@Bathrobe: at the Grauniad, sometimes very well, sometimes not so much (although in the latter case it seems that Dong Nguyen was the developer's own spelling, in preference to the original Nguyễn Hà Đông).
Jongseong Park said,

August 4, 2016 @ 7:40 am

@Bathrobe: While this is not the same thing, the Korean loanword transcription rules for Vietnamese strip the tone diacritics from the Vietnamese spelling but keep the diacritics that are considered part of the basic letters in the Vietnamese alphabet, like ă, â, đ, ê, ô, ơ, and ư. This is just for internal use, since the end goal is to regulate the transcriptions of the Vietnamese words in the Korean alphabet for use in Korean texts. So tones need not be distinguished, but pairs like d and đ, o and ơ need to be distinguished since they are mapped to different letters of the Korean alphabet. In this partially simplified scheme, Nguyễn, Hồ Chí Minh, Ngô Đình Diệm, and phở become Nguyên, Hô Chi Minh, Ngô Đinh Diêm and phơ respectively.
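
The partial simplification described above (drop the tone marks, keep the marks that belong to the base letter) can be sketched in Python roughly as follows. This is an illustrative reconstruction, not the official Korean rule text: the set `TONE_MARKS` and the function name `strip_tones` are assumptions made for the example.

```python
import unicodedata

# Combining marks that write the Vietnamese tones (assumed list for this sketch).
TONE_MARKS = {
    "\u0300",  # grave (huyền)
    "\u0301",  # acute (sắc)
    "\u0303",  # tilde (ngã)
    "\u0309",  # hook above (hỏi)
    "\u0323",  # dot below (nặng)
}

def strip_tones(text: str) -> str:
    """Remove tone marks but keep circumflex, breve, horn and the letter đ."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(ch for ch in decomposed if ch not in TONE_MARKS)
    # Recompose so that e + circumflex becomes the single letter ê again.
    return unicodedata.normalize("NFC", kept)

for word in ["Nguyễn", "Hồ Chí Minh", "Ngô Đình Diệm", "phở"]:
    print(word, "->", strip_tones(word))
```

The letter đ survives simply because Unicode treats it as an undecomposable letter, while the circumflex, breve and horn survive because they are not in the tone list.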

The story of Romanian ș and ț is the cautionary tale par excellence when it comes to diacritics and technology. This is worth a read: "Romanian diacritic marks"
Deborah Pickett said,

August 4, 2016 @ 7:45 am

To support the claim that Scandinavian countries get no love, I present two of the characters from Icelandic, Eth (Ðð) and Thorn (Þþ). While part of the Latin-1 character set, they are presumably so hard to type on English keyboards that most people give up, and just write D or TH. It probably doesn't help that they are a bit alien: most English speakers wouldn't know how to pronounce them anyway.

Not even the Guardian is good at writing Icelandic properly. Guðni Thorlacius Jóhannesson, the new president of Iceland, is written in 7-bit ASCII (https://www.theguardian.com/world/2016/jun/26/gudni-johannesson-claims-victory-in-icelands-presidential-election). The volcano Eyjafjallajökull gets its diaeresis about half the time in Guardian articles. A recent article about football (soccer) mentions Þórunn Helga Jónsdóttir as "Helga Thórey Jónsdóttir", curiously reordering and misspelling her name while leaving other parts intact.
Ben said,

August 4, 2016 @ 7:46 am

Ironic that the author uses the incorrect diacritic in Ceaușescu's name. Romanian ș but Turkish ş.
Jayarava said,

August 4, 2016 @ 8:02 am

I often write about/in Pāḷi and Saṃskṛtaḥ and I consider diacritics essential. I also like to be able to switch to देवनागरी when I need to. Unicode has made all this very easy. Devanāgarī slows me down a little, but IAST does not. And I don't usually have to worry about whether someone else can read because most web-browsers cope with fonts well, indeed most modern apps also cope. Tibetan, which I use very occasionally, is another story – Facebook and Twitter both impose rigid fonts that fail to render Tibetan correctly.
Alex said,

August 4, 2016 @ 8:06 am

I asked the editor of the Economist about this exact question a couple of years ago. The answer I received is that it is basically a matter of cost. The Economist uses a proprietary font for its print edition and so would have to commission many new glyphs (for all the various weights and styles) for each extra character but it was decided that this was not a justifiable expense for languages other than French, Spanish, Italian, German, and Portuguese. In addition it would add editorial costs to make sure no errors are made.

I have noticed that the correct diacritics for Central European and Turkish names are increasingly being used on The Economist website in content which is online only and which I imagine is uploaded into the CMS directly by the journalists themselves.
RP said,

August 4, 2016 @ 8:11 am

The actual Guardian policy is:
"Use on French, German, Portuguese, Spanish and Irish Gaelic words (but not anglicised French words such as cafe, apart from exposé, lamé, résumé, roué). People’s names, in whatever language, should also be given appropriate accents where known."
( https://www.theguardian.com/guardian-observer-style-guide-a )

Which is weird (for example, why use them on Irish Gaelic words but not on Scots Gaelic ones; Spanish but not Catalan words) but the bit about people's names seems to be where they differ from many other papers.
SP said,

August 4, 2016 @ 8:20 am

The Economist's policy (which is more or less as deduced in the post) is detailed in its style guide:
http://www.economist.com/style-guide/accents
Tim May said,

August 4, 2016 @ 8:31 am

The Economist is printed in a custom typeface which apparently only includes accented letters sufficient for French, German, Spanish and Portuguese. See the first comment here. They claim (well, one editor at one time is reported to have claimed) that it would be too expensive to add more. This doesn't explain why they don't accent names like Viktor Orbán, but perhaps they wish to ensure consistency within any given language: if you can print "Orbán" but not "Szűrös”, it might be less confusing to strip the accents from both.

Incidentally, the accent on "ğ" is a breve (rounded), not a caron (angled).
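
The breve/caron distinction can be checked against the Unicode character database. The sketch below uses Python's standard `unicodedata` module; the helper name `describe` is just a label chosen for this example.

```python
import unicodedata

def describe(ch: str) -> list[str]:
    """Return the Unicode names of a character's canonical components."""
    return [unicodedata.name(part) for part in unicodedata.normalize("NFD", ch)]

print(describe("\u011f"))  # Turkish ğ (U+011F)
print(describe("\u01e7"))  # ǧ (U+01E7), a different letter with a caron
```

The first call reports a G with COMBINING BREVE and the second a G with COMBINING CARON, so the two marks are distinct characters as far as Unicode is concerned, however similar they look at small sizes.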
J.W. Brewer said,

August 4, 2016 @ 9:31 am

"Ethnocentric" seems a very peculiar word to describe an English-language publication which treats Portuguese orthography with more attention than Norwegian orthography. Norwegian-speakers are on average closer to English-English people than Lusophones are in pretty much all dimensions related to ethnicity (genes, language, religion, "culture," — even the Brexit vote was, one could say, an attempt to emulate Norway's long-standing attitude toward the EU!). If the Economist's readers are more likely to be personally familiar with Portuguese spelling conventions than with Norwegian ones, perhaps because they find the Algarve a nicer place to vacation than the fjords, or find bossa nova more to their musical taste than black-metal, or prefer to drink port than aquavit, surely that's because they are comparatively cosmopolitan rather than ethnocentric, innit?
Christian Weisgerber said,

August 4, 2016 @ 9:42 am

Here in Germany I also keep noticing that in the media, say, French politicians get the proper diacritics but Polish ones don't. Partially this may indeed be due to technical difficulties like a Western European character set (hasn't everything switched to Unicode by now?) or what's available on the common keyboard layout. However, I think a big part is a linguistic/cultural/ethnic prestige incline across Europe. I'm oversimplifying here, but from a German POV it's basically this: everything west of Germany enjoys high prestige, everything to the east suffers from low prestige. It's also very visible in the foreign languages people study. Approximately nobody in Germany learns Polish or Czech. And while few people would admit to phrasing it that explicitly, I'm afraid there is a widespread prejudice that eastwards it's all gangsters, thieves, and hookers.

Turkish is special. On the one hand the language and its speakers are at the bottom rung in German society, on the other hand every newsroom probably has one or more reporters who are the children or grandchildren of Turkish emigrants and who have some facility in the language.
Avinor said,

August 4, 2016 @ 9:53 am

Every time I encounter a new name in one of the "non-blessed" languages while reading the Economist, I have to Google it to be sure of the pronunciation. This is quite an embarrassment for them, considering their high quality standards in general.

Deborah Pickett:

English speakers ought to have an easier time pronouncing Ð and Þ than mainland Scandinavians, since those sounds exist in English, but not in Swedish or Norwegian. (Danish has the sound of Ð as an allophone of /d/.)
J.W. Brewer said,

August 4, 2016 @ 9:58 am

I myself typically refer to the Latin-scripted form of Japanese as romaji rather than rōmaji, but Japanese should be fairly technically unchallenging in that the macron for doubled vowels is really the only diacritic (in Hepburn, at least) that you need, and it doesn't seem a particularly exotic one. But I'm not sure that English or other Western publications tend to bother with it — it only provides useful information about pronunciation if you're going to be more self-conscious about accurate original-language pronunciation of Japanese proper names than 99%+ of the readers of a non-Japanese newspaper are likely to be, and would only help with disambiguation if you already knew so much Japanese that you could get confused between two not-quite-homophones, which again seems highly unlikely for a non-Japanese readership. Indeed, wikipedia gives Tokyo as the spelling (in an English-prose context) of the city but Tōkyō as the transliteration of its Japanese name, which seems a helpful distinction to keep in mind. Separately, the implicit minor premise here that because Turkish is written in a script similar (but not identical) to the script used for English there is no legitimate need for "transliteration" of Turkish into English seems wrong. Now, it's certainly possible that a transliteration scheme that did something other than just treat all funny-looking letters like their closest visual equivalent in our script would be superior. E.g. when transliterating German into English, the traditional rendering of Müller as "Mueller" rather than "Muller" has something to be said for it. But there may not be enough felt demand to do the same for Turkish.
Zeppelin said,

August 4, 2016 @ 10:18 am

German journalists seem to have taken to pronouncing his name "Erdowan".

Vaguely related: A couple of times now I've seen the ė in Lithuanian names printed as é in German media. In fact I think I've only ever seen it printed wrong or the dot omitted.
Jake said,

August 4, 2016 @ 11:04 am

@JWBrewer: am I remembering correctly that with Japanese transliteration you can de-macronize with an extra vowel, like Tōkyō becomes Toukyou?
Brian said,

August 4, 2016 @ 11:24 am

Thanks for the perspective. It's so easy to forget how much little decisions about character sets made in the 1960s and 1970s wind up affecting how we talk about other countries.

A small point: Instead of "8-bit ASCII", the name "Latin-1" would be nicer to technical ears. There is no single 8-bit extension of ASCII — in fact there are about a dozen different extensions of ASCII called Latin-1, Latin-2, and so on, although Latin-1 is the only one that was ever in widespread use.
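
To make the point concrete, here is a small Python sketch that asks which of the names discussed in the post fit into which character set. The helper `encodable` and the choice of codecs are assumptions made for the illustration; the codec names themselves are the ones Python's standard library uses.

```python
def encodable(text: str, codec: str) -> bool:
    """Return True if every character of `text` exists in `codec`."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

names = ["Chávez", "Gülen", "Orbán", "Utøya", "Guðni", "Erdoğan"]
codecs = ["ascii", "latin-1", "iso8859_2", "iso8859_9"]  # ASCII, Latin-1, Latin-2, Latin-5 (Turkish)

for name in names:
    fits = [codec for codec in codecs if encodable(name, codec)]
    print(f"{name:10} {fits}")
```

None of these names fits in 7-bit ASCII. Erdoğan does not fit in Latin-1 but does fit in ISO 8859-9, the Turkish variant, which in turn gives up the Icelandic ð to make room for ğ.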
J.W. Brewer said,

August 4, 2016 @ 11:26 am

@Jake: I think that's the https://en.wikipedia.org/wiki/W%C4%81puro_r%C5%8Dmaji approach, although "Tohkyoh" and "Tookyoo" are also Out There via various alternative romanization protocols. As is "Tôkyô" if you're ok with diacritical marks in general but just don't like macrons.
J.W. Brewer said,

August 4, 2016 @ 11:29 am

And of course the now-decidedly-archaic-looking "Tokio," which the google books n-gram viewer says was more common than "Tokyo" in English-language texts for a few early Meiji-era decades until the trendlines crossed in the 1890's.
Roger Lustig said,

August 4, 2016 @ 11:34 am

In Poland, in my experience, the native speakers will pronounce the {Ł ł} as a "w"-ish sound when speaking Polish, but when they speak English, the letter is sounded as a plain old {L l}. (I wonder what happens when they speak English among themselves, or in English class?)

Among genealogists, it's conventional to write surnames (and only surnames) in ALL CAPS. This means trouble for the German sharp S (Esszet) {ß} when it appears in a surname, because there *is* no capital Esszet in German orthography. Me, I always resolve it to a double S, which is entirely correct spelling according to recent Dudens, etc.; and also use the trailing e instead of the umlaut, a) to make searching easier, b) to enable D-M Soundex, which doesn't recognize the modified vowels as vowels and c) to avoid ugliness with systems that still can't handle these (completely standard 8-bit) ASCII characters.
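
The convention described above is easy to express in code. The following Python sketch is only an illustration of that convention, with the function name `genealogy_surname` invented for the example: umlauts are resolved to a trailing e, and the ß takes care of itself because Python's `str.upper()` maps it to SS.

```python
# Resolve umlauts to vowel + e before upper-casing (escapes used for clarity).
UMLAUTS = str.maketrans({
    "\u00e4": "ae", "\u00f6": "oe", "\u00fc": "ue",  # ä ö ü
    "\u00c4": "AE", "\u00d6": "OE", "\u00dc": "UE",  # Ä Ö Ü
})

def genealogy_surname(surname: str) -> str:
    """Write a German surname in ALL CAPS without umlauts or ß."""
    # str.upper() follows Unicode full case mapping, so ß becomes SS.
    return surname.translate(UMLAUTS).upper()

print(genealogy_surname("Wei\u00df"))    # WEISS
print(genealogy_surname("M\u00fcller"))  # MUELLER
```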

In other words, I don't think it's always ethno/linguocentricity; instead, it can come down to what one's word-processing system and other technologies are designed to do, the frequency with which certain characters occur, and how the natives approach the modified characters in their own language. After all, in most languages that use our alphabet, the basic letters are called Latin or Roman script (or the appropriate cognate), and those letters actually found in Latin form a sort of core that other languages knowingly modify or add to for their own uses.
Alyssa said,

August 4, 2016 @ 1:11 pm

Even aside from technical limitations, this approach seems very practical to me. Loanwords into English usually lose their diacritics almost immediately, unless they come from a language that is well-known by English speakers (French, Spanish). Is it so wrong to treat proper names the same way? There doesn't seem much point in retaining diacritics that will be meaningless to the vast majority of your readers.
Coby Lubliner said,

August 4, 2016 @ 2:16 pm

There is more to Turkish orthography than diacritics: there is the matter of the dotted capital I (İ) and the undotted lower-case I (ı), which are almost invariably ignored by the Western press. Incidentally, I have always believed that these confusing letters were unnecessary — that Atatürk and his advisers would have been better off using Y,y instead of I,ı (as is done in some Central Asian Turkic alphabets), to use J,j for /j/ and not for /ʒ/, and to use z with cedilla for the latter (by analogy with ş).
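
The dotted and dotless I also trip up software, since default case conversion assumes the English pairing of I with i. Below is a minimal Python sketch of Turkish-aware casing; the helper names `tr_lower` and `tr_upper` are hypothetical and the approach (swap the four letters by hand, then use the default conversion) is just one simple way to do it.

```python
DOTTED_CAPITAL_I = "\u0130"  # İ
DOTLESS_SMALL_I = "\u0131"   # ı

def tr_lower(text: str) -> str:
    """Lower-case with the Turkish pairs I/ı and İ/i."""
    return text.replace("I", DOTLESS_SMALL_I).replace(DOTTED_CAPITAL_I, "i").lower()

def tr_upper(text: str) -> str:
    """Upper-case with the Turkish pairs i/İ and ı/I."""
    return text.replace("i", DOTTED_CAPITAL_I).replace(DOTLESS_SMALL_I, "I").upper()

print(tr_upper("istanbul"))  # İSTANBUL
print(tr_lower("ISPARTA"))   # ısparta
print("istanbul".upper())    # ISTANBUL (default rules lose the dot)
```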
Alex said,

August 4, 2016 @ 2:48 pm

By far the worst example of rendering foreign names in an English text that I have ever come across is historian Norman Davies' 2005 book 'Rising '44: The Battle for Warsaw', an account of the Warsaw uprising.

What could have been an excellent account of an important episode is ruined by the author's assumption that Polish names are far too exotic for an English reader to cope with. His solution is simply to get rid of them all entirely.

Christian names are translated into an English equivalent so Jerzy becomes George, Marek – Mark, Ludwik – Louis, Paweł – Paul, Józefa – Josephine, etc.

Names from Polish nouns or adjectives are calqued into English, thus Korwin – Crow, Grot – Arrow, etc. The same is done for place and street names, thus Nowy Świat is 'New World Street' in the text.

Not content with the choice of Lwów/L'viv/L'vov/Lemberg, he refers to the city throughout as 'Lvuv'. (It's lucky Łódź didn't feature in the narrative.)

Where he can't find an English calque he reduces names to an initial (Mark E., Henry S., Dr. K. etc). Otherwise he uses a phonetic transcription he has invented himself (Janusz – Yanush, Czesław – Cheslav). Even monosyllabic names which would not trouble any reader are not exempt, thus Jan Kott is 'Yan Ko.' (not to be confused with Jan Karski who is 'Yan K.').

At no point are actual Polish names mentioned in the 650 page text. For that the reader has to cross-reference a 15 page appendix. To further complicate matters, the correct Polish names are used in the index, not the made-up versions in the text.

Nazis and Soviets are all referred to by their correct names. Complicated military ranks are left in the original German. However, where a Pole has a German surname this is dumped into the category of 'strange Polish name' and treated as above.

It's impossible to keep track of who's who. How on earth Davies' editors at Macmillan agreed to this travesty is beyond my comprehension.

Overall, I found the author's treatment staggeringly insulting to the memory of the brave men and women who gave their lives fighting for their freedom.
Vic said,

August 4, 2016 @ 2:49 pm

I worked as a programmer on newspaper publishing systems from 1973-1989, and then worked for a large newspaper for 19 years.

When I started, the typesetters allowed for 256 characters in a font*. The character set was based on ASCII, with variations for the newspaper's needs, e.g., the typesetters for a Finnish newspaper would have characters required for Finnish as well as some Scandinavian languages. You could also have special fonts (e.g., "Pi Characters") containing different characters.

With those limitations, accented characters were mostly ignored at the papers I worked with in the U.S. and Canada.

At the newspaper I worked for, their rule was that accented characters, where available, were used in the entertainment section for proper names such as for artists and composers, but were not used elsewhere. The rule changed when they started carrying TV listings for a Spanish language station. The program "Los años perdidos" becomes something completely different if you use an "n" instead of an "ñ".

The character set they were using didn't include the ñ or Ñ, so two other characters had to be removed.

It's been quite some time since I had to work with the details of phototypesetters, so perhaps now they use Unicode and character sets are no longer an issue. However, as others mentioned above, newspapers often use their own special typefaces, and they won't necessarily contain "all" accented characters.

* Technically, not really a font, but rather a collection of glyphs which could be scaled and possibly styled (e.g., slanted to simulate italic)
Coby Lubliner said,

August 4, 2016 @ 3:38 pm

American media nowadays overcompensate in using diacritics in Spanish names. For example, the late Senator Dennis Chavez of New Mexico is listed in English Wikipedia as Chávez, even though he came from a family that had been living in the US long before the acute accent became standardized for Spanish surnames ending in -ez (which happened around 1900). If the city of Martinez, California, which was founded in 1849, ever decides to become Martínez, I am going to protest (it's my county seat).
Michael Rank said,

August 4, 2016 @ 3:39 pm

@Ben, you say “Ironic that the author uses the incorrect diacritic in Ceaușescu’s name. Romanian ș but Turkish ş.” I thought both languages use s-cedilla (and t-cedilla in Romanian), but apparently not although a quick glance on websites of newspapers of both countries suggests cedillas for both. Would you please elaborate.
And yes, the Guardian is remarkably good at diacritics/exotic characters in less familiar languages these days, including Turkish ı and Icelandic ð and þ. I don’t quite know how it does it since not many if any of its reporters/editors know those languages or are linguistic nerds.
Jarek Weckwerth said,

August 4, 2016 @ 3:49 pm

(Rant on.)

Newspapers and magazines doing this, I can understand. But, as a member of the editorial team of an academic linguistics (!) journal (Impact Factor and all), I've had to endure this from a renowned (as in, big league) publisher specializing (!) in linguistics (!).

The journal is based in a Slavic-speaking country, and as a result there is a fair bit of diacritics sprinkled across the papers. But the final PDFs are produced by us, so the renowned German (!) publisher only has to take care of four (yes, literally four) pages per issue. And yet we've had a number of instances of diacritics being misplaced.

The technical side: like the Economist and many others, they use a proprietary font. The diacritics have to be added in by hand. OK, I can still understand.

The ugly "ethnocentric" side (and that really is a wholly euphemistic misnomer!): When things get changed and moved around, and diacritics get misplaced, they are incapable of noticing and correcting this themselves. If the diacritic in the name of the journal's home city — which is a major city, and which they see all the time — gets moved to the wrong letter, they won't notice, despite producing the front matter every four months. Pfff!

(When we started working with them, I was already sufficiently cynical to expect this, so I never even put up a fight for the correct shapes of things like ł or ę. Wasted time.)

(Rant off, sorry. But this is the right place for this innit?)
Christian Weisgerber said,

August 4, 2016 @ 4:43 pm

@Alex:
Polish with its digraphs is badly affected. Speakers of (e.g.) German and English tend to fall into a state of what somebody pointedly called Konsonantenpanik (consonant panic) on seeing Polish names, assume that they have to be unpronounceable, and don't even try to get it right. A current example from sports would be the UFC's women's strawweight champion, Joanna Jędrzejczyk. Phonologically, the name poses little problem for English speakers, [jɛnˈdʒeɪtʃɪk] would be close enough, but nobody can be bothered with Polish letter–sound correspondences, so it's "Joanna Champion" or some other nickname.

An educated person may be expected to know how to pronounce French, but Polish doesn't enjoy that prestige. Try telling people that there are easy rules… yay, whatever, who cares.

@Michael Rank:
Strictly speaking, Romanian uses s and t with comma, rather than cedilla. However, as far as I can tell even the Romanians are confused about this. The typographical difference is small. It brings up the question about subtle differences between similar looking diacritics in different languages. For example, computer typesetting has completely annihilated the difference between umlaut and diaeresis, which are two historically different diacritics. (In the 1980s, one of my German teachers still insisted that the umlaut had to be written as a pair of strokes, not dots.) And in traditional typography the acute accent on French vowels is really differently angled than the acute on Polish consonants. Etc.
Michael Rank said,

August 4, 2016 @ 5:00 pm

@Christian Weisgerber
I see that on a Mac, in Edit → Special Characters, there is both ș and ş, and ţ and ț although they look more different in the Special Characters array than in “real life”, so to speak (I also note the existence of ṫ and ẗ, which languages use either of those, I wonder…)
Charles said,

August 4, 2016 @ 5:25 pm

Alyssa, that's a good point. When I'm reading and see a foreign name or word I don't know how to pronounce, I either ignore the letters/accents I don't know, or skip over it completely and remember it as "that funny name I don't know how to pronounce". Like "Erdoğan"; I have no idea how ğ is different from g. But French or Spanish, of course I mostly know how to pronounce it, nearly everybody who knows English natively does.

The only contentious bit is that name now differs in spelling and pronunciation from the original language. Big whoop. This happens even within English alone, like the British pronunciation of Oh-bah-ma as Oh-baa-ma demonstrates.
Jongseong Park said,

August 4, 2016 @ 6:03 pm

@Michael Rank
For Romanian, the diacritics on the s and t are supposed to be commas, not cedillas. But a series of disastrous technological implementations have led to the current mess, where the incorrect cedilla form is still common in both encoding and displayed form.
For more details, please read the link I posted earlier: "Romanian diacritic marks"
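
As a rough illustration of the encoding side of this mess: Unicode has separate code points for the cedilla and comma-below forms, so Romanian text typed with the cedilla letters can be repaired with a simple substitution. The Python sketch below assumes the text is known to be Romanian (in Turkish the cedilla on ş is correct and must be left alone); the function name `to_romanian_comma` is invented for the example.

```python
# Map the cedilla letters to their comma-below counterparts.
CEDILLA_TO_COMMA = str.maketrans({
    "\u015e": "\u0218",  # Ş -> Ș
    "\u015f": "\u0219",  # ş -> ș
    "\u0162": "\u021a",  # Ţ -> Ț
    "\u0163": "\u021b",  # ţ -> ț
})

def to_romanian_comma(text: str) -> str:
    """Replace s/t with cedilla by s/t with comma below (Romanian text only)."""
    return text.translate(CEDILLA_TO_COMMA)

fixed = to_romanian_comma("Ceau\u015fescu")
print(fixed, [hex(ord(ch)) for ch in fixed if ord(ch) > 127])
```

The two forms can look nearly identical in many fonts, which is part of why the wrong one is so persistent, but they do not compare equal, so searching for one spelling will miss the other.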
Roger Lustig said,

August 4, 2016 @ 6:19 pm

@Christian Weisgerber: umlauts were generally written/typeset as dots long before computer typesetting. The source of the diacritical mark is the letter e on its side; but my experience in school in Germany in the 1970s (not to mention my father's in the 1920s) suggests that dots were OK. And in casual writing, a single stroke, more or less horizontal, sufficed.

On the other hand, computer typesetting wasn't quite up to the Hungarian double-acute accent (or Hungarumlaut) and its difference from a plain dotted umlaut back in the late 70s. They say that The New Grove Dictionary of Music and Musicians was held up for two years because of such problems. In general, computer typesetting has made it far easier to get unusual characters, diacriticals, glyphs, etc. printed properly. (See under Knuth, D.E.)
phspaelti said,

August 5, 2016 @ 2:17 am

J.W. Brewer said,

Now, it's certainly possible that a transliteration scheme that did something other than just treat all funny-looking letters like their closest visual equivalent in our script would be superior. E.g. when transliterating German into English, the traditional rendering of Müller as "Mueller" rather than "Muller" has something to be said for it.

Maybe. But I can say from longterm personal experience, that the vast majority of people have no idea what to do when they come across "ae".
Nick Barnes said,

August 5, 2016 @ 2:43 am

No such thing as "8-bit ASCII"; ASCII is 7-bit; there's a whole slew of 8-bit extensions but none of them are ASCII. These days the relevant 8-bit character set, for any European publication not ready (still!) to jump to full Unicode, is ISO-8859-15.
RP said,

August 5, 2016 @ 3:29 am

Generally there are numerous handwriting styles and numerous typefaces and I know different ways of writing apostrophes, commas and Zs, and different ways of printing them, all of which are equally valid. And I know that a handwritten cedilla often looks different from a typewritten one. So – at the risk of being insensitive – why is it so important that a Romanian comma should not look like a cedilla, or a Turkish breve should not be a caron? It's just that I'd have expected these things to fall within the normal realm of acceptable variation. But presumably this stems from my ignorance as a non-speaker of those languages. Or maybe not, considering the Romanian authorities took years before they bothered complaining about the use of cedillas?
RP said,

August 5, 2016 @ 3:37 am

Thinking about it, perhaps a somewhat analogous error in English might be the use (largely by non-native speakers) of the backtick " ` " as an apostrophe (slanted like a backslash). I can tolerate many different styles of apostrophe, including vertical, forward-slanting, and various types of curves, but I don't regard a backtick as correct. Non-native speakers often don't understand why not and (contra the theory about Western prestige) are sometimes reluctant to abandon it if it's the most convenient key on their keyboard.
Bob Ladd said,

August 5, 2016 @ 3:48 am

Thanks for all the comments, everyone. Like some of the other commenters, I was unaware that the diacritic under S and T in Romanian is "supposed" to be a comma rather than a cedilla, and it appears that this confusion is pretty widespread (thanks for the link, Jongseong Park!). Having grown up (a rather long while ago now) with a typewriter that didn't even have a key for the number one because you could just use lower-case L, I find the substitution of a cedilla for a comma somewhat more forgivable than leaving the whole thing out altogether, but clearly not everyone agrees.

As for my choice of "ethnocentric" to describe the phenomenon under discussion, I take J. W. Brewer's point (and to be honest, I was actually surprised to find that Portuguese is on the Economist's list of important languages) – but I had in mind the kind of "cultural prestige" that Christian Weisgerber talks about.
Thanks also for all the corrections on the terminology of fonts, ASCII, etc. As I said, I grew up with a typewriter. Specifically about caron vs. breve, I found the term "caron" used in connection with the Turkish ğ when I looked up how to code it in HTML. Just goes to show you can't believe everything you read on the web.
Thomas Lumley said,

August 5, 2016 @ 4:19 am

Here in New Zealand, one of the official languages has diacritics that aren't in the 'Latin-1' code page: the macrons on long vowels in (the most common rendering[1] of) te reo Māori. While the printed newspapers can cope, their websites have more trouble. A couple of years ago, the New Zealand Herald had a story about the importance of macrons, which appears on their website with them left out.

[1] Tainui, one of the large North Island iwi, prefers using double vowels, eg "Maaori"
Jongseong Park said,

August 5, 2016 @ 4:42 am

@RP, handwriting of course allows for far more variation in the forms of letters and diacritics than print. And even in print, so-called display typefaces show great diversity in forms compared to text typefaces which are meant for longer running text and suitable for immersive reading.

The range of acceptable variation for text typefaces is considerably narrower compared to display typefaces, not to mention handwriting, especially when one considers traditional serif text typefaces. For example, in handwriting, the dots of the German umlauts and Swedish ä/ö can look like macrons or tildes, and this can be imitated in display typefaces. In text typefaces, however, only the two-dotted form is permitted.

To take another example, tradition has dictated that opening and closing quotation marks have distinct forms, roughly recalling 6 and 9 (at least in English). Now, because typewriters and computer keyboards allotted a single sign to stand for both opening and closing single quotes (called "dumb quotes"), we have developed software-level substitutions to convert them to "smart quotes" somewhere along the way. The reason we go through this trouble is because to use "dumb quotes" in the finished product would be considered typographically "wrong". The average reader might not even notice it, but for those who pay attention to typographic detail, it makes a world of difference.
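The kind of software-level substitution described above can be sketched in a few lines of Python. This is a deliberately naive illustration, not how any particular word processor does it: the function name `smarten_quotes` and the rule it applies (a straight quote opens if it starts the text or follows whitespace or an opening bracket, and closes otherwise) are assumptions made for the example.

```python
import re

LDQ, RDQ = "\u201c", "\u201d"  # curly double quotes, shaped roughly like 66 and 99
LSQ, RSQ = "\u2018", "\u2019"  # curly single quotes, shaped roughly like 6 and 9


def smarten_quotes(text: str) -> str:
    """Replace straight ("dumb") quotes with curly ("smart") ones.

    Assumed rule: a quote is an opening quote when it starts the text or
    follows whitespace or an opening bracket; every other quote is closing.
    """
    # Opening double quotes first, then treat all remaining ones as closing.
    text = re.sub(r'(^|[\s(\[{])"', r"\1" + LDQ, text)
    text = text.replace('"', RDQ)
    # Same for single quotes; one directly after an opening double quote also opens.
    text = re.sub(r"(^|[\s(\[{" + LDQ + r"])'", r"\1" + LSQ, text)
    text = text.replace("'", RSQ)
    return text


if __name__ == "__main__":
    print(smarten_quotes("\"dumb quotes\" aren't 'wrong' for everyone"))
```

A rule this simple treats every leftover single quote as an apostrophe or closing quote, so it mishandles cases such as a leading apostrophe in "'90s"; real implementations need more context than this sketch uses.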

Perhaps this is more obvious for some graphic substitutions than others. One egregious example I see far too often in Korean publications is the use of β for the German ß. I think most people will agree that this is simply incorrect. Because the β used in Korean typefaces tends to be slanted and is often designed quite differently from the Latin glyphs, such incorrect substitutions stick out like a sore thumb.

I would not look to the relevant national language authorities as necessarily having a clue when it comes to typography and encoding of the written language. You should see how the National Institute of the Korean Language uses incorrect graphic substitutions for punctuation symbols, e.g. the "much less-than" and "much greater-than" signs ≪ ≫ instead of the correct double-angle brackets 《》 on its page on punctuation rules in Korean on its website. The "much less-than" and "much greater-than" signs are defined by Unicode for use as mathematical operators, and are therefore designed as such in typefaces.
John Wells said,

August 5, 2016 @ 6:00 am

To be certain about Unicode symbols, including diacritics and their correct nomenclature, go to http://www.unicode.org/charts. There you can see that ğ is U+011F LATIN SMALL LETTER G WITH BREVE.
As a regular Guardian reader I have admired its recent tendency to get diacritics right.
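The same lookup can be done without the charts. As a small sketch, assuming Python's standard `unicodedata` module (the helper name `describe` is made up for this example), the following prints the code point and official name of the Turkish letter alongside the g-with-caron that it is sometimes confused with:

```python
import unicodedata


def describe(ch: str) -> str:
    """Return the code point and official Unicode name of a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


if __name__ == "__main__":
    print(describe("\u011f"))  # Turkish yumuşak g
    print(describe("\u01e7"))  # g with caron, a different character
```

The first line comes out as U+011F LATIN SMALL LETTER G WITH BREVE and the second as U+01E7 LATIN SMALL LETTER G WITH CARON, which makes the breve/caron distinction explicit.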
cliff arroyo said,

August 5, 2016 @ 7:09 am

What I noticed in Romania is a very…. haphazard approach to diacritics, even in official or semi-official signage.

Over the several times I've been there (first time in… 2009(?) most recently this year) the changeover from non-initial î to â seems to have resolved itself (there was a series of reforms that confused the two) but lots of signage is very hit and miss with diacritics, including some and omitting others.

Here's an example from Gara de Nord (main train station)

http://www.journeyswithjay.com/wp-content/uploads/2016/05/bucharest-meridian-taxi-sign-2-1024×768.jpg

While they get the ț and ă right, they omit the comma under ș in the second word. That kind of thing is all over the place without much rhyme or reason as to what's included and what's omitted.
V said,

August 5, 2016 @ 7:31 am

"(In the 1980s, one of my German teachers still insisted that the umlaut had to be written as a pair of strokes, not dots.)"

A French friend of mine who has a bar always writes, on the chalkboards, umlauts as strokes and diaereses as dots.
V said,

August 5, 2016 @ 7:51 am

By the way, he's in his late 30's.

"The source of the diacritical mark is the letter e on its side; but my experience in school in Germany in the 1970s (not to mention my father's in the 1920s) suggests that dots were OK."

I recall seeing it as late as the 90's as a small e on top; I think on a brass sign with a brand name, but it was probably an antique.
RP said,

August 5, 2016 @ 8:09 am

Excellent points by Jongseong Park, especially about the greater variation allowed in handwriting versus printed material. On the other hand, most of us can't be expected to be experts in typography, so the fact that something is typographically incorrect (or would not be accepted by a professional typographer) does not necessarily mean it is wrong in an absolute sense (if there is such a thing), so there might be some grey areas there, especially if we take a somewhat descriptivist approach.
John Roth said,

August 5, 2016 @ 8:53 am

Yeah, it's a muddled mess. Even for a publishing system that uses Unicode internally, you've got keyboard limitations on the front, font problems on the back and confused workers in the middle. I presume the Guardian has an automated spelling checker and corrector for names somewhere in their process.

With print at least the publication controls the font used; with online it's either at the mercy of what the viewer has installed on their machine, or it has to download its own custom font, which wastes the viewer's bandwidth.
V said,

August 5, 2016 @ 10:58 am

More precisely, he's 36, but he says he had an old teacher, and he deliberately insists on writing them as strokes.
David Douglas ROBERTSON PhD said,

August 5, 2016 @ 11:50 am

The Uzbeks have cannily gone a West-compatible route that has the effect of avoiding any such issues as "Erdogan". When I encounter the Uzbek language lately it seems always to be written in the diacriticless O'zbek alifbosi: http://iub.edu/~celcar/alphabets/Uzbek_Alphabet.pdf
Alex said,

August 5, 2016 @ 1:35 pm

When I learned Spanish, we were taught that you don't put diacritical marks on capital letters – so "él," but not "Él." I am sure this was due to typesetting constraints. But now that word processors can handle it easily, does the rule still apply? Do publications like the Economist put diacritical marks on capitals?
J.W. Brewer said,

August 5, 2016 @ 1:46 pm

It is almost certainly sheer coincidence that one of the few modern European languages to eschew diacriticals almost entirely (English) has become the most dominant worldwide, i.e. orthographic/typographic convenience wasn't a key factor in increasing its market share as against its rivals. But that having happened makes it even easier to treat no-diacriticals as the default/unmarked state of typography. It seems in hindsight rather a pity that Middle English orthography did not hold on to its distinctive local glyphs like thorn and yogh (apparently due to the pro-ASCII biases of the scribal class?) so they could be carried forward into the post-Gutenberg world, because perhaps a world-dominating English that had obvious peculiarities in its own home set of glyphs would be more likely to accommodate the peculiarities of others.
oulenz said,

August 5, 2016 @ 2:46 pm

@Roger Lustig: I was under the impression that, in handwriting, a single horizontal stroke is used on the umlautless letters, precisely to distinguish them from the letters with umlauts.
Bob Ladd said,

August 5, 2016 @ 3:51 pm

@oulenz: No, the stroke is used in German handwriting only over the lowercase U, to distinguish it from lowercase N. Especially in the older (pre-war) handwriting the two letters were often otherwise identical.
Not a naive speaker said,

August 5, 2016 @ 5:05 pm

@oulenz

With the Deutsche Kurrent it was mandatory to have a stroke over the u. Some people kept using it with the "new" handwriting out of habit, and with sloppy handwriting it helps the reader.
Guardian subeditor said,

August 5, 2016 @ 6:27 pm

@Michael Rank: 'I don’t quite know how [the Guardian gets the diacritics right] since not many if any of its reporters/editors know those languages or are linguistic nerds.'

@John Roth: 'I presume the Guardian has an automated spelling checker and corrector for names somewhere in their process.'

Well, there's an automated spellcheck that highlights unknown spellings, which means it highlights almost every proper name (including English ones), but it certainly doesn't magically correct them. Perhaps surprisingly the Guardian, like many other publications, still has human subeditors and it is part of our job to check and correct the spellings of names.

I don't know how many reporters or editors are linguistic nerds, but more than a few of the subs are (some of us even follow Language Log).

If we sometimes get names/diacritics wrong, that's because we are flawed humans and occasionally there is too much news happening too fast to check everything as well as we'd like to.

But many thanks to all who complimented the Guardian's record on diacritics! It's nice that someone notices these things. Names that involve transliteration from eg Arabic or Cyrillic are a whole nother story …
oulenz said,

August 5, 2016 @ 8:02 pm

@Bob Ladd, @Not a naive speaker: ah yes, it was specifically u I was thinking of, and contrast with n makes sense. But the point stands that a u with a horizontal stroke above it is not an ü.
Levantine said,

August 5, 2016 @ 9:16 pm

Zeppelin, that's a far better approximation of the original Turkish than the pronunciation with a hard G is.
Paul Clapham said,

August 5, 2016 @ 10:34 pm

For an example of a publication which deals with diacritic marks in an exemplary way, you could look at BBC Music magazine, a British monthly magazine about classical music. The latest issue has an article about the Lithuanian conductor Mirga Gražinytė-Tyla, for example. And they never fail to spell the name of Jiří Bělohlávek correctly, and even deep in the reviews in the back of the magazine you'll still find the accent on Bartók's name.

Although I'm not sure whether they use s-with-comma or s-with-cedilla in Romanian names — I can't find any examples of either.
Thomas Rees said,

August 6, 2016 @ 12:48 am

@Alex: Yes, diacritics are now required on uppercase letters in Spanish. Los Ángeles; PINGÜINO. See the Ortografía de la lengua española (2010) cap. IV § 3.3, where the ‘Información adicional’ makes precisely the points you did about technological constraints.

Online, Wikilengua says
En el caso de la escritura en mayúsculas, deben aplicarse siempre las reglas de acentuación, al igual que con las minúsculas, ya sea en mayúscula inicial o en texto completo en mayúsculas.
Michael Rank said,

August 6, 2016 @ 5:09 am

@Paul Clapham: Glad to hear BBC Music magazine is so meticulous with its diacritics but the BBC News website is totally hopeless, omits pretty much all diacritics, even in French/German.
Ari Corcoran said,

August 6, 2016 @ 1:43 pm

Australian Aboriginal languages, for the most part, have avoided use of diacritics, at least in modern/contemporary orthographies, though early versions of Arrernte had a few doozies. English speakers and other contemporary non-Arandic readers have difficulties with Arandic orthographies, but the rules are relatively easy to learn (see http://www.clc.org.au/articles/info/have-you-ever-wondered-why-arrernte-is-spelt-the-way-it-is/).

The only surviving diacritic I am aware of now commonly used is in Pitjantjatjara where the letters r and t (can't do it here) are underlined to indicate retroflexions.

But there are oddities, e.g. "rd" at the beginnings of words in Warlpiri (indicating a retroflexive): it had me buggered at first. As well, there is no universal approach to the "ng" sound, as in "sing" (in English). This is because the "ng" sound in many languages is often immediately followed by a hard "g" or "k" sound (there is an elision between these voiced and unvoiced sounds in many languages). Yolngu Matha linguists, in at least a couple of fonts, have designed an "Ng" where the two Roman letters are morphed, but this is not entirely convenient. So for the most part we end up with "nng" or "ngk".
tangent said,

August 6, 2016 @ 10:07 pm

> German journalists seem to have taken to pronouncing his name "Erdowan".

That's roughly correct, as far as https://en.wikipedia.org/wiki/Ğ tells me:
"The letter provides a smooth transition between vowels since they do not occur consecutively in native Turkish words (in loanwords they are separated by a glottal stop)", it "adds […] a /β/ glide to the rounded vowels /o/ […]"
Levantine said,

August 6, 2016 @ 10:26 pm

Wikipedia is wrong in asserting that consecutive vowels in loanwords are separated by a glottal stop. For example, "maalesef" (from Arabic) simply sees the two As merged into a long vowel, and many Turks mistakenly spell the word "malesef".
Bob Ladd said,

August 7, 2016 @ 4:05 am

PS to original post: In the most recent issue of the Economist (6 Aug, p 28 of UK edition), in an article about tensions related to the current situation in Turkey among Turkish residents of Germany, the very same Cem Özdemir I talked about in my original post is mentioned twice, both times without the umlaut.
David Marjanović said,

August 7, 2016 @ 4:44 pm

In the 1980s, one of my German teachers still insisted that the umlaut had to be written as a pair of strokes, not dots.

In handwriting and in a total of one printed font I've ever seen.
mollymooly said,

August 8, 2016 @ 6:31 am

The Economist had an article last April about the Czech Republic changing its name to Czechia; the online version currently has the line "Czechia can be seen as a literal translation of Cesko" but the print edition had something like "Czechia can be seen as a literal translation of $Áesko" instead of "Czechia can be seen as a literal translation of Česko". The following week's print issue had a correction that said something like "Last week's issue misprinted the Czech name for the Czech Republic due to a technical error" but did not give the correct name, presumably for fear of making a similar mistake again.
mollymooly said,

August 8, 2016 @ 6:44 am

In the Irish language, the long sign áéíóúÁÉÍÓÚ appears on capitals. This was always the case in manuscript and Gaelic type, and in Irish-language texts printed in Latin type; but when an Irish word was printed in Latin type in isolation (e.g. on a sign or as a single word in an otherwise English-language text) the usual practice was to omit the accent. This stopped after 1970s postage stamps were written with EIRE ("burden") instead of ÉIRE ("Ireland").
Meaghan Fowlie said,

August 8, 2016 @ 12:08 pm

I'm with Alyssa and Charles: I suspect that the main reason for the choice of mainly French, German, and Spanish diacritics is that many readers know at least approximately what they mean, but most haven't a clue about any of the others (with the possible exception of ø (Scandinavian slashed O, if you can't see it) – isn't it pronounced basically as it is in IPA? And don't people know that?).

Mind you, the reasons for this probably include all sorts of ethnocentric unpleasantnesses. And I don't take it as an excuse: they should totally be including the diacritics for all names in all languages if they can. It strikes me as polite and accurate, and helps people find out how to pronounce them properly if they're so inclined.
leonie cornips said,

August 15, 2016 @ 1:39 am

I write columns for a Dutch provincial newspaper, the fourth largest in the Netherlands. I cannot use the diacritics of Polish or Turkish when I write about those languages. The explanation is that anything not available on a qwerty keyboard will not be printed. However, the local dialects are codified in such a way that they contain lots and lots of diacritics, and there are no problems printing those. But all of those diacritics are available on qwerty and in the font of the earlier (electronic) typewriter.
