Really weird sinographs

Scott Wilson has written an entertaining, and I dare say edifying, article on "W.T.F. Japan: Top 5 strangest kanji ever 【Weird Top Five】", SoraNews24 (10/6/16) — sorry I missed it when it first came out.  Wilson refers to the "Top 5 strangest kanji", but he actually treats nearly three times that many.  The reason he emphasizes "5" is so that he can stick with his theme of W.T.F., cf.:

Scott Wilson, "W.T.F. Japan: Top 5 most difficult kanji ever【Weird Top Five】", SoraNews24 (8/4/16)

Scott Wilson, "W.T.F. Japan: Top 5 kanji with the longest readings【Weird Top Five】", SoraNews24 (4/20/17)

All of the sinographs cited by Wilson are in the monumental Dai Kan-Wa Jiten 大漢和辞典 (The Great Chinese–Japanese Dictionary) by Tetsuji Morohashi, which students of Sinology and Japanology affectionately refer to simply as "Morohashi".

Wilson begins:

Kanji, go home. You're drunk.

A little while ago we took a look at the most difficult kanji ever – the ones with the most strokes. But that was only the beginning. Now that we've stepped into the world of crazy kanji, there's no going back. The only path is straight forward through the thick kanji woods, and along the way we're going to spot some really weird specimens.

That's why today we're counting down the top five strangest kanji ever. Like last time, we're concentrating mostly on kanji that are in the Morohashi Daikanwa Jiten, the official dictionary of pretty much every kanji that has ever been written down.

I've spent hours exploring the Morohashi for the most bizarre kanji ever, and now I'm here with you, a changed man with some very strange tales to tell.

I do not intend to duplicate Wilson's great achievement in his virtuoso article, which comes complete with numerous illustrations.  Instead, I will just list here some of the characters he treats, and encourage those who are interested in learning more about them (including how to pronounce them and what they mean, though the pronunciations and/or meanings of many characters that have existed in history remain unknown) to read his wonderful essay from start to finish.

In this list, the first number is the character's entry number in Morohashi and the second is its Unicode code point from Unihan.  I am deeply indebted to Ken Lunde for digging up the latter.  Clicking on the Unicode numbers will show you pictures of all these crazy characters, though the characters themselves are unlikely to show up in your browser.

81 / U+20067 — does not appear in my browser

48915 / U+201AD — does not appear in my browser

49580 / U+26B99 — does not appear in my browser

229 / U+2010F — does not appear in my browser

49275 / U+24548 — does not appear in my browser

48955 / U+20AB3 — does not appear in my browser

49023 / U+219B9 — does not appear in my browser

49049 / U+21DC9 — does not appear in my browser

48977 / U+20D53 — does not appear in my browser

38 / U+200E0 — does not appear in my browser

8717 / U+22013 — does not appear in my browser

93 / U+2007C — does not appear in my browser
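A note on why these characters can be troublesome to display: every code point in the list above begins with 2, which places it in Unicode's Plane 2 (the Supplementary Ideographic Plane) rather than in the Basic Multilingual Plane where everyday kanji live. The following is only a minimal sketch in Python to make that concrete. The helper name `describe` and the list name `LISTED` are my own inventions for illustration, not part of any library.

```python
# Code points copied from the list above (the Unihan half of each pair).
LISTED = [
    "U+20067", "U+201AD", "U+26B99", "U+2010F", "U+24548", "U+20AB3",
    "U+219B9", "U+21DC9", "U+20D53", "U+200E0", "U+22013", "U+2007C",
]


def describe(label):
    """Return basic encoding facts for a 'U+XXXXX' label (hypothetical helper)."""
    code_point = int(label[2:], 16)  # drop the "U+" prefix and parse as hex
    char = chr(code_point)
    return {
        "label": label,
        "char": char,
        # Plane 0 is the Basic Multilingual Plane; plane 2 holds the
        # supplementary ideographs.
        "plane": code_point >> 16,
        "utf8_bytes": len(char.encode("utf-8")),
        # Each UTF-16 code unit is two bytes; a surrogate pair is two units.
        "utf16_units": len(char.encode("utf-16-be")) // 2,
    }


if __name__ == "__main__":
    for label in LISTED:
        info = describe(label)
        print(
            info["label"],
            "plane", info["plane"],
            "UTF-8 bytes", info["utf8_bytes"],
            "UTF-16 units", info["utf16_units"],
        )
```

Each of these characters takes four bytes in UTF-8 and a surrogate pair in UTF-16, whereas an ordinary character such as 母 takes three bytes and a single unit. Whether a glyph actually appears on screen then depends on whether an installed font covers that range.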

To tell you the truth, there are thousands more kanji / hanja / hanzi out there that are equally bizarre and whimsical and inexplicable.  Here are just a couple that I spotted in Wilson's article as I was reading through it:

49241 / U+2416D — does not appear in my browser

49242 / U+2417F — does not appear in my browser

But you can find similar specimens on almost every page of Morohashi and many more in the mega sinograph dictionaries that are even larger than it.  When we add in Sinoform characters from Tangut, Khitan, Jurchen, and Vietnamese chữ Nôm {⿰字宁}喃 (there are more than a dozen different ways to write this name), we soon come to the realization that the really weird characters out there are simply countless.

Selected readings

M. V. Sofronov, "Chinese Philology and the Scripts of Central Asia" (free pdf), Sino-Platonic Papers, 30 (October, 1991), 1-10.

ZHOU Youguang, "The Family of Chinese Character-Type Scripts: Twenty Members and Four Stages of Development" (free pdf; also available as a 650 KB PDF), Sino-Platonic Papers, 28 (September, 1991), 1-11.

Geoffrey Pullum, "The Awful Chinese Writing System", Lingua Franca (1/20/16)

"Writing Chinese characters as a form of punishment" (11/1/15)

"How many more Chinese characters are needed?" (10/25/16)

"Long kanji readings" (4/22/17)

"Han-Han Dae Sajeon" — that's 한한대사전 漢韓大辭典 (2008), which is the Korean analog of Morohashi:

Classical Chinese character dictionaries are an essential tool for accessing and understanding traditional humanities with a foundation in Chinese literature, not only in Chinese-speaking world but also in Korea, Japan and Vietnam. The first notable effort to compile a comprehensive classical Chinese character dictionary was made by Tetsuji Morohashi (1883–1982), a Japanese scholar. Tetsuji recognized the need and grew determined to compile a Chinese–Japanese Dictionary while studying abroad in China. Despite his manuscripts being burned in a fire during World War II, his publisher going bankrupt, and numerous other setbacks, after 32 years of collaborative work, the Dai Kan-Wa Jiten or "Great Chinese–Japanese Dictionary" was finally completed. Taiwan's Defense Committee followed suit with a 10-year effort, along with the Academia Sinica, to complete the Zhongwen Da Cidian, or "Encyclopedic Dictionary of the Chinese Language." In 1975, China also made the compilation of a Chinese character dictionary a national project. Collaboration attracted the participation of 43 universities, as well as numerous research centers and scholars nationwide, yielding the 12 volume Hanyu Da Cidian or "Comprehensive Dictionary of Chinese Words" in 1993.  [VHM:  It would be better here to refer to the 8 volume Hanyu dazidian or "Comprehensive Dictionary of Chinese Characters" (1989).]

Finally, while researching this post, I came upon a very interesting website called "The Nanbanjin Nikki ザ南蛮人日記" hosted by a frequent contributor to Language Log, leoboiko (Leonardo Boiko).  Since this particular issue is about a subject that is dear to my heart, I could not resist citing it here:

"List of resources on Chinese 'character etymology'" (7/5/11)

[h.t.: Nathan Hopson]



18 Comments

  1. VV said,

    May 10, 2018 @ 1:26 pm

    I wonder if the "curvy" ones, such as the 巨 variant, are only attested in seal script? The curvy forms look normal in that style; I think the bizarreness of many of these is due to weird attempts to convert them into a Mincho typeface for the Morohashi dictionary.

  2. Victor Mair said,

    May 10, 2018 @ 1:42 pm

    Cf. Ken Lunde, "'My God, it's full of stars! And turtles and dragons!'", CJK Type Blog (2/10/16)

    https://blogs.adobe.com/CCJKType/2016/02/turtles-and-dragons.html

  3. OvV said,

    May 10, 2018 @ 2:16 pm

    "…they are unlikely to show up in your browser."
    Oh?
    They all show up in my Chrome 66 browser on Windows 10.
    I haven't even reinstalled some CJK fonts I had back when it was still Windows 8.
    Has "biáng" been Unicoded yet?
    If so, try it first on Windows 10!

  4. Michael Watts said,

    May 10, 2018 @ 2:38 pm

    100% of those characters appear in my browser, which is Firefox on Windows 10 after having made absolutely zero effort to install any relevant fonts.

    Some of these appear to be constructed in totally illegal ways, like U+26B99 which is diagonally aligned, or the many characters with rounded strokes, but a few just seem to be normal characters with a lot of components. And a few more strike me as looking perfectly normal, like U+200E0. How can that be so strange when 母 and 当 are so normal?

  5. Michael Watts said,

    May 10, 2018 @ 2:40 pm

    Intriguingly, while the astral characters display fine on unicode.org, and I don't have problems copying them and pasting them into the comment box here, they appear to have been censored from my comment. I suspect this is more a limitation of Language Log than of my browser, though.

  6. Victor Mair said,

    May 10, 2018 @ 3:02 pm

    U+200E0 — look again, that's not 母

    I'm in my office now and I can see all the characters in the browser here. I'm on Firefox and Windows 10 both at home and in my office, so it must be something else about the configuration or the computer (though it's the same computer at home — Apple desktop — but three years older). Anyway, I'm happy that I can see all these unusual characters now.

  7. Michael Watts said,

    May 10, 2018 @ 4:07 pm

    Yes, I can see that U+200E0 isn't 母. It appears to be the "backwards E with extended middle 横", which I think of as a pretty common character component (事 / 群 / etc.), combined with the two dots that 母 also features. Those two dots are in my mind less common than the E-like component, but they're not exactly rare either; they also feature in words like 一般 "normal" or 船 "boat", where they are part of the 舟 component.

    I picked 当 to illustrate the E-like component because it's the closest character I know, but obviously it isn't a perfect match. Is there a unicode point for the component itself?

  8. David Marjanović said,

    May 10, 2018 @ 4:36 pm

    U+200E0 — look again, that's not 母

    No, but it isn't any stranger than 母 or 当. The really weird ones, as VV observed, are the ones with straight diagonal strokes and the ones with loops.

  9. David Marjanović said,

    May 10, 2018 @ 4:37 pm

    Oh, I see them all in Firefox on Windows 7.

  10. John said,

    May 10, 2018 @ 6:06 pm

    I haven't been through all of them, but they can be found in the SimSun-ExtB, MingLiU_HKSCS-ExtB and PMingLiu-ExtB fonts (42,711 characters).

  11. Nick Kaldis said,

    May 10, 2018 @ 6:29 pm

    I think I may have seen at least one of these (the character comprised of 3 pie strokes) in my copy of 文字蒙求 (recommended to me by an IUP Taipei teacher many years ago). If I can find it at my office, I'll look for this and others.

  12. Victor Mair said,

    May 10, 2018 @ 6:53 pm

    "The really weird ones, as VV observed, are the ones with straight diagonal strokes and the ones with loops."

    Granted!

  13. WTN said,

    May 10, 2018 @ 7:37 pm

    Please install the font files from the Hanazono Mincho (花園明朝) family—HanaMinA & HanaMinB & HanaMinC—and all of these characters will appear in your browser.

    http://fonts.jp/hanazono/

  14. ErikF said,

    May 10, 2018 @ 10:49 pm

    I know almost nothing about kanji and hanzi, but how many strokes is a circle, squiggly line or double-loop? Were curvy characters more common in pre-modern writing?

  15. Michael Watts said,

    May 11, 2018 @ 4:11 am

    There are no strokes in a circle, squiggly line, or double loop, because those forms do not exist in (the citation forms of) characters.

    There are calligraphic cursive forms, and forms meant to be easier to write. Those can have rounded strokes, squiggles, or whatever else, but I know basically nothing about them. I write in all citation forms because I'm a foreigner who doesn't know any better.

  16. ~flow said,

    May 11, 2018 @ 11:25 am

    @Michael Watts There's one perfectly legal and very common Kanji with curves, namely 〇, the numeral zero (also written as 零). This is, admittedly, a latecomer (Georges Ifrah, "Universalgeschichte der Zahlen", Frankfurt, New York: Campus, 1986; p434) that was apparently used from the 8th c onwards.

    〇 was also promoted by the famously infamous Empress Wu Zetian (https://en.wikipedia.org/wiki/Chinese_characters_of_Empress_Wu) as a replacement for 星 (star), but, admittedly, this never caught on.

    Then there are 㔔 U+3514, 㪳 U+3ab3, and 㫈 U+3ac8, which were admittedly created in Korea, not China, I believe to serve as sound-only characters (namely gang, dung, and presumably yeong or similar). There are a few other sound-only Korean creations that do not stand out as much, like 乤乧乫乭乮乶乷乺乻乼乽唜唟喸嗭夞, which do look a bit special when you know what to look for; I think there's some interaction with Hangeul going on here.

    So yes, curves and circles are quite rare in 'citation' or 'standard form' Chinese characters, but not altogether nonexistent. When you count them in absolute numbers, you get only about two handfuls of those from a corpus of around 90,000 or so, and when you weigh in usage numbers, they're exceedingly rare, with the notable exception of 〇; you won't see any of 𠄷𠆭𠇇𠍋𠪳𡆢𡦹𡧑𢀓𦹗 in the wild very often.

    The interesting question is why specifically these forms, and almost no other 草書 Grass Script forms found their way into the dictionaries.

  17. Chas Belov said,

    May 12, 2018 @ 2:25 pm

    Also seeing them in Firefox Mac. My font browser indicates the characters exist in the following Mac fonts:

    MingLiu HKSCS Ext B
    MingLiu Ext B
    PMingLiu Ext B
    SimSun Ext B

    Interestingly, SimHei does not have an Ext B.

  18. ohwilleke said,

    May 12, 2018 @ 4:33 pm

    Do the "weird" ones come from a particular literary tradition or even a small handful of notable historical scholars? Or, are they each one offs?
