The inevitability (or not) of diacritical marks


Recent talk at the University of Pennsylvania:

"Printers’ Devices, or, How French Got Its Accents"
Katie Chenoweth, Princeton University
Monday, 22 October 2018 – 5:15 PM
Van Pelt-Dietrich Library Class of 1978 Pavilion in the Kislak Center, University of Pennsylvania
Sponsored by: Penn Libraries

In 1529, the French typographer and future *imprimeur du roy* (royal printer) Geoffroy Tory lamented that the French language had no accents. “In our French language,” he writes, “we have no figures of accent in writing […].  I wish that it were so.”  Just two years later, all five of the accents found in modern French had appeared in vernacular printed books. This talk explores the role of printers and printing technology in the introduction of accents in French during this remarkable period around 1530.  My claim will be that one of the primary roles accents serve for French printers is the reproduction of a “native” vernacular voice on the page, allowing them to deploy typography as a *phonographic* apparatus. Challenging longstanding assumptions about the turn to the visual and spatial occasioned by printing technology, I suggest that printing brings about a phonetic turn and a new emphasis on the auditory that will ultimately drive the “rise” of French as a national idiom.

This comes on the heels of two posts on diacritics (or not) in Vietnamese:

"Diacriticless Vietnamese on a sign in San Francisco" (9/30/18)

"Vietnamese nail shop" (10/21/18)

This is all the more thought-provoking in light of the fact that Vietnamese diacriticals were evidently at least partially inspired by or modeled on French diacritical marks.  Or so I thought.

Steve O'Harrow writes:

The diacritical marks in today's quốc ngữ writing system have evolved from the original romanization carried out by the Catholic missionaries who went to Viet Nam under the patronage of the King of Portugal in the early to mid 17th century. The letters & marks they employed are based on 17th-century Portuguese, and the extent to which they may resemble French is misleading. Some of the marks (e.g., ~) are used in ways different from how they were used in Portuguese, but that too has evolved. In the event, the French were "Johnny-Come-Lately" to the romanization party.

Be that as it may, Vietnamese diacritical marks appear to have been inspired by diacritical marks for Romance languages.  As we have seen, in the case of French, pace Katie Chenoweth, they are linguistically adventitious, though politically potent.  Superficial descriptions of the appearance of Vietnamese alphabetical writing frequently emphasize the abundance of diacritics, and one often encounters statements such as this:  "The many diacritics, often two on the same vowel, make written Vietnamese easily recognizable."  Thus we may say that the plethora of diacritical marks is a distinctive feature of Vietnamese alphabetical writing.

Steve adds:

The term "diacritical marks" is somewhat misleading. From the emic Vietnamese point of view, there are five diacritical marks, e.g.:

mà má mả mã mạ

These marks indicate 5 out of the 6 possible segmental tones that a syllable can have. If a syllable does not carry one of these marks it is said to be on a "level" tone (actually slightly sustained then trailing in a light downward direction). All the other marks are not seen as being "added," but rather as integral parts of a specific letter. In that way, for example, the following are simply seen as separate letters/vowels:

a ă â e ê i o ô ơ u ư y

so that, again for example ô is not perceived as o + circumflex, but simply as ô. And because tone mark diacritics are added over* vowels, it appears to someone who is not literate in the Vietnamese language as "double diacritics."


*with the exception of the "nặng" which comes below the vowel

I asked another colleague who is a Vietnamese language specialist what the maximum number of diacritical marks on a single Vietnamese letter is, and how many diacritical marks a letter usually has.  He replied:

Zero to two.  I don't know that anyone has looked at "average" numbers but a cursory view of any Vietnamese text shows that most syllables have some kind of non-western ("diacritical") marking.

There are two classes of diacritics: one to distinguish additional letters regardless of tone, and another class to indicate tone.

The one class of diacritic may appear over a vowel, because the orthography distinguishes more vowels than the western alphabet has vowel letters, so extra markers are needed. E.g., a, ă, â . In these examples, tone is not indicated, since they are meant to be read in the first (or level) tone, unmarked in the orthography.  They reference segmentally different sounds.

There are other "marked" vowels, like ư and ơ distinct from u and o.  But the "extra" strokes are not considered diacritics because they are connected directly to the western letter, even though their purpose is the same as the marks on top of ă and â , i.e., to distinguish segmentally distinct sounds.

The second, entirely different class of diacritic is used to indicate tones 2 through 6.  There are five of these marked tones, and they are written above or under the syllable's same nuclear vowel, to indicate one of the non-level tones.  If it happens that one of these vowels is already marked with a non-tonal diacritic, the vowel ends up with two diacritics.  E.g., ầ*.


*[VHM:  this letter may not appear correctly.  It is meant to be a grave accent over a circumflex above an "a".]
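
For readers who want to poke at the two classes of marks described above, here is a minimal sketch in Python. It relies on the standard library's unicodedata.normalize with the NFD form, which breaks a precomposed letter into its base letter plus combining marks. The groupings TONE_MARKS and VOWEL_MARKS and the helper split_marks are illustrative names of my own, not anything from the colleagues quoted here, and treating the horn on ơ and ư as a separable mark is Unicode's typographic choice, not a claim about how Vietnamese readers perceive those letters.

```python
import unicodedata

# Combining marks that write the five marked tones (assumed grouping).
TONE_MARKS = {
    "\u0300": "grave",       # huyen tone, as in "ma" + grave
    "\u0301": "acute",       # sac tone
    "\u0309": "hook above",  # hoi tone
    "\u0303": "tilde",       # nga tone
    "\u0323": "dot below",   # nang tone, the one written under the vowel
}

# Combining marks that distinguish vowel letters regardless of tone.
VOWEL_MARKS = {
    "\u0306": "breve",       # a-breve
    "\u0302": "circumflex",  # a-, e-, o-circumflex
    "\u031B": "horn",        # o-horn, u-horn
}

def split_marks(letter):
    """Return (base letter, vowel marks, tone marks) for one letter."""
    # NFD splits a precomposed letter into base + combining marks.
    decomposed = unicodedata.normalize("NFD", letter)
    base, marks = decomposed[0], decomposed[1:]
    vowel = [VOWEL_MARKS[m] for m in marks if m in VOWEL_MARKS]
    tone = [TONE_MARKS[m] for m in marks if m in TONE_MARKS]
    return base, vowel, tone

print(split_marks("\u1ea7"))  # a + circumflex + grave
# -> ('a', ['circumflex'], ['grave'])
print(split_marks("\u1ee3"))  # o + horn + dot below
# -> ('o', ['horn'], ['dot below'])
```

On this typographic view no letter yields more than one mark from each group, which matches the "zero to two" answer. Note that đ has no canonical decomposition in Unicode, so split_marks returns it unchanged with two empty lists.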

The standard set of letters in the modern Latin alphabet includes 26 letters, with some languages using fewer and others more.  Some languages supplement the basic Latin alphabet with various accented letters, ligatures, and extra letters (see Omniglot, "Latin alphabet").  Another way to increase the number of distinct orthographic forms is through digraphs or digrams.  It is even possible to represent tones with letters of the alphabet, as in Gwoyeu Romatzyh (Hanyu Pinyin Guóyǔ Luómǎzì) 國語羅馬字 / 国语罗马字 ("National Language Romanization").

[Thanks to Michele Thompson]


  1. Michele Sharik said,

    October 23, 2018 @ 5:22 pm

    “There are other "marked" vowels, like ư and ơ distinct from u and o. But the "extra" strokes are not considered diacritics because they are connected directly to the western letter”

    So by that logic, ç is not a c with a diacritical mark? Or is that different because ç could effectively be replaced by ce?

  2. AntC said,

    October 23, 2018 @ 7:00 pm

    @Michele ç is not a c with a diacritical mark?

    The "logic" is talking about Vietnamese alphabetisation, not French. Nevertheless, the accents in French pronunciation and therefore orthography are a symptom of where etymologically a consonant has been lost but left a trace that affects the vowel quality: hôtel from hostel.

    Whereas English tended to leave all sorts of letters lying around, but (some) became 'silent': all the pronunciations for -ough. As for Welsh and Gaelic spelling …

  3. David Marjanović said,

    October 23, 2018 @ 8:36 pm

    Welsh uses a bunch of digraphs, but no silent letters to my knowledge. In all varieties of Gaelic, however, lots of letters are diacritical (i.e. vowel letters used only to indicate, often redundantly, whether the adjacent consonant is palatalized or velarized), and many others are completely silent (e.g. the -idh endings that were reformed away in Irish a few decades ago but are kept in Scottish Gaelic).

  4. Andreas Johansson said,

    October 24, 2018 @ 12:00 am

    It's weirdly anglocentric to consider the bare letters "western" and diacritics, by implication, non-western. Most European languages use some form of diacritics, albeit not to the extent of Vietnamese.

  5. Victor Mair said,

    October 24, 2018 @ 12:26 am

    From Michele Thompson:

    I am pretty sure that the maximum number of diacritical marks on a Vietnamese letter is 2, one as a vowel mark (I’m sure there is a technical term for this) and one as a tone mark.

  6. Victor Mair said,

    October 24, 2018 @ 12:27 am

    From Nguyen Ngoc Hung:

    I think Michele is right that there are two different diacritical marks:

    1. Tone marks (there are 5 tone marks: à, á, ạ, ả, ã): help change the tone of the words

    2. Vowel markers: Mark different vowels
    – a, ắ, â
    – o, ô, ơ
    – u, ư
    – e, ê

  7. Victor Mair said,

    October 24, 2018 @ 12:29 am

    From Steve O'Harrow:

    That's another way of looking at it, too.
    I would maintain that the vowels with marks
    are simply different letters of the alphabet &
    when Vietnamese recite their alphabet, they
    have separate names for them in sequence.
    Also, I believe "ắ" here should just be "ă"
    as it appears in the alphabet, without a rising
    tone mark, even though when recited, these
    two "ă" and "â" are pronounced on a rising
    tone to distinguish their names from "a" & "ơ".
    The rising tone in the names is to remind us
    that these are short vowels (while "a" & "ơ"
    are the phonemic long equivalents).

    There's my 2 đồng's worth . . .

  8. AntC said,

    October 24, 2018 @ 12:40 am

    It's weirdly anglocentric …

    Hmm? This alphabet is Latin (from Greek, via rather a lot of places). There's a few letters been added for English (and other European languages): perhaps 'j' or 'w' are already diacritics added to Latin? Perhaps sh/th/ch/ph/gh are digraphs added to Latin?

    What would be more Anglocentric would be to persist with the (much more suited) Old English Eth, Thorn, etc.

    I blame the Normans, not the Anglos.

  9. DD.Owen said,

    October 24, 2018 @ 4:56 am

    David Marjanović: 'Welsh uses a bunch of digraphs, but no silent letters to my knowledge.'

    This is correct. Welsh might *look* like it has a lot of redundancy of that kind, but with a smaller alphabet than English, albeit one that includes digraphs, and a relatively more phonetic pronunciation system, it in fact doesn't.

    It's worth noting that jokes about Welsh being 'nothing but a bunch of consonants' are a bit of a sore point amongst Cymrophones, in part because they're the kind of thing that ceased being funny at around the 10,000th repetition fifty years ago, and also in part because the Welsh alphabet allows for seven vowels compared to the English five (or six for some values of 'y').

  10. ~flow said,

    October 24, 2018 @ 5:11 am

    I guess it does make sense to distinguish base letters and diacritics in a purely typographic sense, and in that sense ă â ê ô ơ ư and đ are a, a, e, o, o, u, and d with an added mark, even when at the same time each of these is a basic letter of Vietnamese orthography. Typography is not orthography, and each has its own taxonomy, though the two overlap.

    When we get into details things get more complicated: G is originally a C with an added hook, much like later Ç became C with another kind of hook. Yet G is most of the time considered a letter of its own, whereas Ç as a C with a cedilla below. The dot over lower case i is not considered a diacritic although historically it originated as one. J is considered a letter of its own and distinct from I although historically it is just an I with a tail below; in 16th century Europe, one would recite 23, not 26 letters of the ABC, since JUW, while already in use, were considered variants of I and V. So views on this matter can certainly change, and conflicting views may co-exist.

    I would also like to offer the view that the hook in ơ and ư and the circumflex in â ê ô are indeed, originally, systematic markers with phonetic interpretations. tells me that, simplifying a lot, e is /ɛ/, ê is /e/; o is /ɔ/, ô is /o/, so it does look like the circumflex gives a more closed variant of the respective vowel. Not sure about â though. Likewise, ơ is /ə/ and ư is /ɨ/, so there's definitely the possibility that the creators of this orthography did, maybe unwittingly, conceptualize the hook as indicating the feature [-round] (by which I mean 'the same vowel but with lips spread, not protruded/rounded'). Similar things may be said about the German use of äöü in their relation to the non-diacritical forms aou. Speaking of which, the umlauts of German are often considered both letters in their own right and aou with umlauts, and the dots are not considered Akzente, because 'only French has those'.

  11. Rodger C said,

    October 24, 2018 @ 6:45 am

    And Ç, I believe, originated in Visigothic Spain as a Z with a swash at the top.

  12. ~flow said,

    October 24, 2018 @ 6:56 am

    @Rodger C "Ç, I believe, originated in Visigothic Spain as a Z with a swash at the top"—That's what I read, though I was first unsure whether it wasn't really a c and a z stacked on top of each other.

  13. Tom Dawkes said,

    October 24, 2018 @ 6:56 am

    You might like to see this essay:

    André-Georges Haudricourt. The origin of the peculiarities of the Vietnamese alphabet. Mon-Khmer Studies, 2010, 39, pp.89-104. HAL Id: halshs-00918824
    Originally published as: L’origine des particularités de l’alphabet
    vietnamien, Dân Việt Nam 3:61-68, 1949.

  14. david said,

    October 24, 2018 @ 6:59 am

    Could the Vietnamese favoring of diacritics reflect an old sanskrit influence as in Pāṇini?

  15. david said,

    October 24, 2018 @ 7:01 am

    Could the Vietnamese favoring of diacritics reflect an old sanskrit influence as in Pāṇini?

    (Why is this triggering the duplicate comment message?)

  16. Tom Dawkes said,

    October 24, 2018 @ 7:03 am

    On Welsh as loaded with consonants: the problem for English speakers is that W is seen only as a consonant and Y is also regarded, erroneously, as only a consonant, whereas in form it is historically derived from both consonant and vowel letters. (Some years back, the two very popular UK quiz shows, University Challenge and Only Connect, had IN THE SAME WEEK questions which had the answer "syzygy" as a word without vowels!)
    English speakers seem to be extremely naive about the limits of the current alphabet in conveying phonemic distinctions.

  17. David Marjanović said,

    October 24, 2018 @ 10:53 am

    in 16th century Europe, one would recite 23, not 26 letters of the ABC, since JUW, while already in use, were considered variants of I and V.

    The library catalog of the University of Vienna only completed this change in the early 1970s.

    Speaking of which, the umlauts of German are often considered both letters in their own right and aou with umlauts

    It's complicated. Like ß, they have never been given places in the alphabet, which is recited with the usual 26 letters. In alphabetical ordering, they used to be most commonly treated as ae, oe, ue, while nowadays they're mostly treated as a, o, u for this purpose – but in crosswords they're still ae, oe, ue, going in two boxes each. Yet, they're not counted as digraphs in response to the question "how many letters does this word have". They have keys on the keyboard layout and are generally used in official documents (where ß can run into trouble because it doesn't have an uppercase version – a slot for that was created in Unicode just a few years ago).

    Could the Vietnamese favoring of diacritics reflect an old sanskrit influence as in Pāṇini?

    Would surprise me, because it's a creation by Catholic missionaries straight from Europe. I haven't had time to read Haudricourt's paper yet, but the long abstract doesn't mention Sanskrit or Devanāgarī at all.

    The Latin transliteration of Sanskrit with its diacritics only dates from the 18th or 19th century, making it much younger than quốc ngữ.

  18. David Marjanović said,

    October 24, 2018 @ 10:56 am

    Also, Vietnamese is (and has been for a very long time) in the Chinese sphere of cultural influence, not the Indian one as the rest of Indochina is.

  19. cliff arroyo said,

    October 24, 2018 @ 1:18 pm

    "Haudricourt's paper yet, but the long abstract doesn't mention Sanskrit or Devanāgarī at all."

    Some years ago in Vietnamese class the teacher showed a documentary on temples (esp in South Vietnam). Many had inscriptions in Khmer (or some related script) and I asked if Vietnamese had ever been written in a similar script (which seems at least possible if not that likely). The teacher was very adamant that no, just Chinese characters and then quoc ngu…

  20. Kate Bunting said,

    October 26, 2018 @ 8:53 am

    The Scandinavians regard their modified vowels as distinct letters added to the end of the alphabet (Norwegian æ, å and ø, Swedish ä, å and ö – not sure about Danish).

  21. cliff arroyo said,

    October 27, 2018 @ 3:04 am

    One of the most interesting cases for diacritics in Europe is Romanian, which has five: ă â î ş ţ
    In practice, outside of professionally made publications, usage is very haphazard and even semi-official signage tends to include some and omit others on what seems like an almost random basis.
    This sign from Gara de nord (in Bucharest) has si instead of şi, for example.
