On May 28, 2013, I made the following post: "Vietnamese in Chinese and Nom characters". The discussion that followed, as usual at Language Log, was lively and informative, and raised a lot of very interesting issues concerning the history and nature of Nom and its relationship to Chinese characters and Chinese languages.

John Balaban had wanted to participate in that discussion, but was delayed by heart surgery (he's all right now), and has taken the first opportunity to send in these remarks, which help us to understand why many people, including some of my own students and colleagues, still care passionately about this unique writing system.

[begin remarks by John Balaban]

Fourteen years ago I helped found the Vietnamese Nôm Preservation Foundation, which has as its main goal the digital preservation of texts–manuscripts, inscriptions, and woodblocks–in the 1,000-year Nôm tradition. We have created Nôm fonts for the ISO/Unicode Standard and we have digitized 2000+ Hán-Nôm texts (the two scripts are often mixed) from the collection of the National Library of Vietnam (please see this site).

I do not have the skills to answer directly the many wise as well as the few scatter-shot comments here, but wish to refer those interested to a blog regarding the late scholar Nguyễn Tài Cẩn.

If you go to the footnotes, you will find ample scholarly sources on the structural nature of chữ Nôm. But you would have to be able to read modern Vietnamese. This blog is authored by a brilliant young scholar in Hanoi, Nguyễn Tuấn Cường.

For those who read English primarily, there's an interesting recent discussion by the linguist John Phan available on the internet. The title is "Chữ Nôm and the Taming of the South" in the Journal of Vietnamese Studies, Vol. 8, No. 1 (June 2013), pp. 1-33.

Let me add an anecdote: Years ago, when the Ideographic Rapporteur Group (IRG) was deciding which historical East Asian glyphs represented independent writing systems, Nôm was at first rejected. Then, at the very site of the Institute of Hán-Nôm Research mentioned in this line of discussion, the Chinese members of the IRG panel were confounded by the following poem, inscribed in Nôm on a huge marble wall (see photograph below), and commemorating the Vietnamese defeat of a Chinese army in 1789.

打 朱 底
打 朱 底 顛
打 朱 伮 隻 輪 不 返
打 朱 伮 片 甲 不 還
打 朱 使 知 南 國 英 雄 之 有 主
光 中 阮 惠

Đánh cho để dài tóc
Đánh cho để đen răng
Đánh cho nó chích luân bất phản
Đánh cho nó phiến giáp bất hoàn
Đánh cho sử tri Nam Quốc anh hùng chi hữu chủ.
–Quang Trung, Nguyễn Huệ

We beat you because we like to wear our hair long.
Beat you because we like to blacken our teeth.
Beat you, so none of your war chariots could run off.
Beat you to keep your weapons from going home.
Beat you so history knows the South has its own king.
–Nguyễn Huệ, Emperor Quang Trung, 1789

To their credit, the Chinese scholars changed their vote and recognized a writing system representing Vietnamese speech, but employing the habits of what correspondents here are calling "classical Chinese."

Please take a look at our website. We are a 501(c)(3) public charity and any help is absolutely welcomed. I would be happy to answer any questions.

Here is a photo of the poem on the marble wall mentioned above.



  1. mondain said,

    July 16, 2013 @ 2:13 am

    The first two lines in unicode: 打朱底

  2. mondain said,

    July 16, 2013 @ 2:16 am

    It seems the comment program refuses to accept those characters.

  3. Frédéric Grosshans said,

    July 16, 2013 @ 3:14 am

    @mondain: I read these characters, and they look like the ones on the photographs. I guess it's a font problem on your side.

  4. Alon Lischinsky said,

    July 16, 2013 @ 4:04 am

    @Frédéric Grosshans: mondain got the first three characters past the WordPress filter, not the entire first two lines. And it's not a font problem: there's nothing on the comment source beyond those three characters. So there is likely to be an encoding issue somewhere.

  5. minus273 said,

    July 16, 2013 @ 6:58 am

    The "隻輪不返", "片甲不還" and "使知南國英雄之有主" parts are quite Chinese though.

  6. bfwebster said,

    July 16, 2013 @ 11:34 am

    Great poem, by the way.

  7. JS said,

    July 16, 2013 @ 2:57 pm

    Very nice… though the translation falls short at 打朱伮隻輪不返/打朱伮片甲不還. My question is whether đánh 打 is really Sino-Vietnamese (as would be the normal assumption) given that dǐng(?) 打 'strike' looks so marginal in Sinitic… Vietnamese word/graph?

  8. iwsfutcmd said,

    July 16, 2013 @ 4:00 pm

    @Alon Lischinsky: The characters aren't part of the BMP (the basic multilingual plane of Unicode), they're part of the SMP – the supplementary multilingual plane. The BMP is for living scripts, the SMP is for scripts not in current use, so characters specific to Chữ Nôm are generally encoded in the SMP. Because the SMP requires some special wrangling to use, it probably makes the comment system screw up.

    And for the record, it appears that the last character in the second line (http://nomfoundation.org/common/nom_details.php?codepoint=60722&img=1) isn't actually encoded in Unicode yet (at least according to Mr. Balaban's Vietnamese Nom Preservation Foundation), so it obviously wouldn't paste in properly.

  9. Victor Mair said,

    July 16, 2013 @ 4:30 pm

    I don't know how much of this will come through in the comments section of WordPress, but here is fairly complete information on the last character in the second line. It comes from John Balaban:


    It is răng, the Vietnamese word for teeth. It doesn't appear to have been presented to Unicode yet. (Among the cultural insults felt by the Vietnamese when the Chinese first took over were issues about how to wear their hair and the blackening of teeth.)

    Detailed information
    Glyph: [not displayable; character not yet in Unicode]
    Temporary code*: V+6331b
    Unicode radical + strokes: radical 0211 + 8 strokes
    Radical + strokes: radical 0211 + 8 strokes
    Radical: xỉ
    Vietnamese: răng, như "mọc răng, hàm răng" (gdhn)

    *Note: Temporary code represents the character not currently in
    Unicode. We will submit this character together with those
    collected by other Nôm scholars to Unicode at its next meeting.

    It appears to be constructed on this basis:

    strokes | radical | glyph | unicode | name | definition
    8 | 0211 | 齿 | U+9f7f | xỉ | teeth; gears, cogs; age; simplified form of the KangXi radical number 211
    15 | 0211 | 齒 | U+9f52 | xỉ | teeth; gears, cogs; age; KangXi radical 211

    We have a Nom look-up tool on our website which gave me this:

    quốc ngữ | nôm | codepoint | radical | strokes | context | ref | english | mandarin | cantonese
    răng | | V+6331b | 0211 xỉ | 8 | mọc răng, hàm răng | gdhn | | |

  10. Jean-Michel said,

    July 17, 2013 @ 3:55 am

    @JS: "Very nice… though the translation falls short at 打朱伮隻輪不返/打朱伮片甲不還. My question is whether đánh 打 is really Sino-Vietnamese (as would be the normal assumption) given that dǐng(?) 打 'strike' looks so marginal in Sinitic… Vietnamese word/graph?"

    I'm not sure what you mean by "marginal." If you mean rare or secondary, it's true enough that 打 is rarely pronounced ding or anything like it in modern Mandarin. But the modern Shanghainese form still has a final /n/ or /ŋ/, and the /ŋ/ final seems to have been standard as recently as the late 15th century, when Shin Suk-ju gave the standard reading as /tiŋ/, with /ta/ as an alternate "left reading" (per Schuessler's ABC Etymological Dictionary of Old Chinese). If the word was borrowed into Vietnamese at an early enough date—no great feat, given the length of contact between Vietnamese and Chinese—then it could be expected to reflect an /ŋ/ final. The VNPF's online dictionary also gives the readings đả and đử, which are presumably later borrowings cognate to the aforementioned "left reading."

  11. julie lee said,

    July 17, 2013 @ 11:51 am


    You say that the words

    "隻輪不返", "片甲不還" and "使知南國英雄之有主"

    in the poem are "quite Chinese". But they are completely Chinese.

    "隻輪不返" in Chinese means "not a single wheel returns" (i.e., returns home).

    "片甲不還" in Chinese means "not a single piece of armor returns".

    "使知南國英雄之有主" in Chinese means "to let them know that the heroes of the southern nation have their kings" (the translation in the LL posting seems to translate 使 (shi) as "history", whereas history is "史" in Chinese, and 使 (shi) on the other hand means "let"). Of course the clause can also mean "to let history know that the heroes…."

  12. JS said,

    July 17, 2013 @ 2:16 pm

    Thanks; did not know that Shanghainese had a nasal-final word. By "marginal" I meant distribution in the modern languages… that is, I was guessing that since the Mandarin and Cantonese words for 'hit' are, while similar, irregular with respect to the MC categories from which they are supposed to descend, a pair of unrelated synonyms might be involved here, with "打" perhaps a súzì 俗字 for the -ng word later adapted to the writing of both. Or not, of course.

    ^and julie lee's translation is more like it… though maybe a single "plate" or "scale" of armor?

  13. JS said,

    July 17, 2013 @ 2:20 pm

    ^@Jean-M̶i̶c̶h̶a̶e̶l̶ Michel

  14. julie lee said,

    July 17, 2013 @ 8:57 pm

    Yes, a single "plate" or "scale" of armor is better.

  15. Ngô Thanh Nhàn said,

    July 18, 2013 @ 11:22 pm

    Ken Lunde sent me a link to this discussion. Let me note the following:
    1. People told me the poem was written by Emperor Nguyễn Huệ's wife, to be read in front of his people, as he was going to promote the use of chữ Nôm in his court.
    2. On the third verse, the third ideogram, 伮 "nó", being a third person pronoun, confirms the fact that Quang Trung said the famous lines in front of his people. So should the first two lines read, "We beat him…"?
    3. The ideogram 打 is read "đả" in Sino-Vietnamese. It looks like the Vietnamese borrowed the ideogram to represent "đánh", a native Vietnamese word with almost the same meaning.
    4. The ideogram 使 can also be read in Sino-Vietnamese, as "sứ", an "envoy, an ambassador".
    5. I notice that all third ideograms in the 5 verses of the poem were spelled with radical "nhân" in picture, i.e. 底 "để", 伮 "nó" and 使 "sử". Intentionally or oddly? and lastly
    6. There's an ideogram

  16. Nick Lamb said,

    July 19, 2013 @ 3:13 am

    iwsfutcmd, I don't think this characterisation of the difference between the BMP and SMP is either correct or helpful.

    The BMP is comparatively small and crowded, it has space for only a few tens of thousands of characters, and for good pragmatic reasons it isn't all crushed together as tightly as it could be. As a result, characters that are less used, or which didn't have champions years ago when this stuff was first being standardised find themselves in the SMP or in other planes. But neither ISO nor Unicode are able to see the future, and characters are forbidden from moving, so if a single character or an entire script outside the BMP becomes popular in the future it won't magically become part of the BMP, and likewise if (as seems likely over future generations) some of the languages whose scripts are represented in the BMP die out they won't be removed from it.

    To give a concrete example, none of the living European languages use runes, nevertheless the runes are encoded in the BMP, and some Emoji (certainly in use by ordinary people in a living language) are in the SMP.
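
To illustrate the point about planes, here is a minimal Python sketch, assuming Python 3 and using sample characters picked only for illustration (the function name `unicode_plane` is hypothetical). A character's plane is simply its code point divided by 0x10000, so it says nothing about whether the script is living or dead:

```python
def unicode_plane(ch: str) -> int:
    """Return the Unicode plane of a single character.

    Plane 0 is the BMP, plane 1 the SMP, plane 2 the SIP
    (Supplementary Ideographic Plane), and so on.
    """
    return ord(ch) >> 16  # each plane holds 0x10000 code points


# Illustrative samples (chosen for this sketch, not taken from the thread).
samples = {
    "a rune": "\u16A0",        # Runic block, in the BMP
    "an emoji": "\U0001F600",  # Emoticons block, in the SMP
    "打": "\u6253",            # common CJK ideograph, in the BMP
}

for label, ch in samples.items():
    print(f"U+{ord(ch):04X} ({label}): plane {unicode_plane(ch)}")
```

Under these assumptions the rune and 打 both report plane 0, while the emoji reports plane 1.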

  17. Ngô Thanh Nhàn said,

    July 19, 2013 @ 9:38 am

    Hey, every time I tried U+2A635, it got cut off… It's an ideographic representation for the sound "răng" (tooth in native Vietnamese), and is composed of radical 0211 U+9F52 "tooth" and ideogram U+590C for Sino-Vietnamese sound "lăng". The ideogram was submitted by Vietnam.

    The last ideogram on the second line of the poem is the same sound "răng", with radical 0211S U+9F7F "tooth", and ideogram U+590C "lăng".
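
As a rough illustration of why a character such as U+2A635 might get cut off, here is a minimal Python sketch. It shows that the code point lies outside the BMP, so it needs a surrogate pair in UTF-16 and four bytes in UTF-8. The function `truncate_at_first_4byte` is hypothetical: it only mimics one plausible failure mode (a storage layer limited to 3-byte UTF-8 sequences), which is an assumption and not a confirmed description of how this comment system behaves.

```python
def utf16_units(ch: str) -> list[int]:
    """Return the UTF-16 code units for a string as integers."""
    data = ch.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def truncate_at_first_4byte(text: str) -> str:
    """Hypothetical model: drop everything from the first character
    whose UTF-8 encoding is longer than 3 bytes (i.e. outside the BMP)."""
    kept = []
    for ch in text:
        if len(ch.encode("utf-8")) > 3:
            break
        kept.append(ch)
    return "".join(kept)


rang = "\U0002A635"  # the code point mentioned in the comment above

print([hex(u) for u in utf16_units(rang)])  # two units: a surrogate pair
print(len(rang.encode("utf-8")))            # number of UTF-8 bytes
print(repr(truncate_at_first_4byte("打朱底" + rang + "…")))
```

In this model the three BMP characters 打朱底 survive and the text is cut at the first supplementary-plane character, which resembles what mondain and Alon Lischinsky describe, though the real cause may differ.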

  18. jQ said,

    July 23, 2013 @ 3:37 pm

    The character given as 底 looks more like 广代 to me, or is that an older variant of the same character?

