Sinological suffering

Since I became a Sinologist in 1972, hardly a day has passed when I didn't spend an hour or two vainly searching for a character or expression in my vast arsenal of Chinese reference works.  The frustration of not being able to find what I'm looking for is so agonizing that I sometimes simply have to scream at the writing system for being so complicated and refractory.

It happened again this morning as I was preparing a passage that I'll be reading with the students in my graduate seminar tomorrow afternoon.  I was reading this passage from the second juan ("scroll") of the Luòyáng qiélán jì 洛陽伽藍記 (The Monasteries [Skt. saṃghârāma*] of Luoyang [547 AD]):

Biāo yì shì nánrén. Wéi yǒu zhòng dàfū Yáng Yuánshèn, jǐshì zhōng dàfū Wáng 㫬 shì zhōngyuán shìzú.


Biao was also a southerner.  Only the Grand Master of the Palace Yang Yuanshen and the Supervising Secretary Wang 㫬 were elites from the Central Plains.


*सङ्घाराम saṅghārāma or, alternatively, with anusvāra: संघाराम saṁghārāma, or, in Harvard-Kyoto style: saNghArAma (please ignore the spaces between syllables of the Sanskrit Romanizations; they are a result of placing the diacritical marks on top of some of the letters)


The 㫬 character looked simple enough; I was certainly familiar with both of its primary components, left and right (日+旬), but when they were put together as 㫬, I couldn't figure out how to pronounce the character or what it meant.  The rì 日 ("sun") is almost certainly the semantophore ("radical") and the xún 旬 ("ten days") is most likely the phonophore.  Judging from the way it is pronounced in other characters into which it enters, 旬 might be pronounced xún, xùn, xǔn (maybe), or xuàn (perhaps), but I couldn't be sure.

Here's how I tried to look up 㫬 in all of my dictionaries, even gigantic ones with 50,000 or so characters:

1. by sound:  xún, xùn, xǔn, xuàn

2. by radical (Kangxi #72 日) plus residual strokes (6)

3. total strokes (10)

4. 4 corner method

5. Rosenberg Graphical System (as used in the great Chinese-Russian dictionary of I. M. Oshanin)
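As a rough illustration of method 2 above, a radical-plus-residual-strokes index can be pictured as a table keyed on the pair (radical number, residual strokes). The sketch below is only a toy: the three sample entries are hand-entered assumptions for illustration, and the names `SAMPLE_ENTRIES`, `build_index`, and `look_up` are hypothetical, not taken from any real dictionary or library.

```python
from collections import defaultdict

# Illustrative entries only: hand-entered for this sketch, not real dictionary data.
SAMPLE_ENTRIES = {
    "㫬": (72, 6),   # 日 (Kangxi radical 72) + 旬 (6 residual strokes)
    "昫": (72, 5),   # 日 + 句 (5 residual strokes)
    "眴": (109, 6),  # 目 (Kangxi radical 109) + 旬
}


def build_index(entries):
    """Group characters under their (radical, residual strokes) key."""
    index = defaultdict(list)
    for char, key in entries.items():
        index[key].append(char)
    return index


def look_up(index, radical, residual_strokes):
    """Return every character filed under the key, or an empty list."""
    return index.get((radical, residual_strokes), [])


index = build_index(SAMPLE_ENTRIES)
print(look_up(index, 72, 6))  # candidates filed under 日 + 6 strokes
print(look_up(index, 72, 7))  # empty list: nothing filed under this key
```

The point of the toy is that the method only works if the compiler of the index filed the character in the first place; a correct key is useless when the entry is absent.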


No luck.  More than two hours of fruitless looking through heavy volumes and indices, including the massive, magnificent Ricci dictionaries.  After all that effort, I still didn't know how the character is pronounced or what it means.  And yet here it is in such a famous work of Chinese literature as Luòyáng qiélán jì!  How could scholars have been reading this text for 14 centuries without making sure that it is part of the standard lexicographical resources for Chinese?

In many editions of the Luòyáng qiélán jì 洛陽伽藍記 (The Monasteries of Luoyang), both printed and electronic, 㫬 is simply missing, undoubtedly because it did not exist in the font available to the publisher of that particular edition.  On the other hand, sometimes 㫬 does show up in scholarship on Luòyáng qiélán jì, in which case I suspect that the printer created it on an ad hoc basis.  Another problem with 㫬 is that, when scholars do transcribe (Romanize or bopomofo) this character, they waver considerably on the readings they give for it: xu, xuan, xun, etc.

After a wasted morning trying to find the pronunciation and meaning of 㫬, under the heading "Grrrrhhh!", I wrote to some friends to vent my exasperation.  As usually happens in such cases, they wrote back saying that they had just been through a similar experience (it happens to Sinologists all the time, so the chances of a colleague experiencing the same sort of frustration looking for a character or term on any given day are high).

David Moser wrote back immediately:

You're preachin' to the choir, Victor!

Examples occur every single day, as you say.  Today's example:

A colleague in Canada asked me to get the contact info for xiangsheng ("crosstalk") performer Hou Baolin's daughter, who has written a book about her father.  He told me her name was Hou Xin 候鑫.  I Googled her, and sure enough, one glance at the characters was enough to tell me that this was indeed her name.  So I called up my xiangsheng teacher, Ding Guangquan, and asked him how to get in contact with "Hou Xin".

"You mean Hou Zhen1," he said.  "Her name is Hou Zhen."
"Oh, okay," I said.  "Maybe I got the name wrong.  Could you send me her number by text message?"
"Sure," he said.
A few minutes later I got the text message with her number, and he had written her name as 候珍. "Okay," I thought, "it's that 'zhen1', a pretty typical name. I wonder how my Canadian colleague and I got the name so wrong."

Then, a few minutes later, Ding texted me, writing, "Her name is actually written with a 金 on top, and two 王 characters below, but I couldn't find that character on my cell phone."

What?  I quickly checked, and indeed there is an obscure character zhen1 錱.  Instantly I saw what had happened.  My Canadian colleague had seen this printed character, probably in small size font, and naturally mistook it for 鑫**, because what else would it be?  Then when I tried to check it by Google, I was also bamboozled by the resemblance, and also assumed it was xin 鑫.

So I wrote my Canadian colleague an email telling him that we both had read the character wrong.  He wrote back saying "Damn! Another typo in my book!"  He evidently had her in the index as Hou Xin 候鑫, and only now discovers it's the wrong name!  "Oh well," he writes, "We're working on a Chinese translation of the book, and I'll correct it in that version."

So three of us, two scholars of Chinese for many decades and a native Chinese speaker — and a xiangsheng performer, to boot — had spent part of an afternoon struggling with that one pesky character that perversely resembles closely another character, and for no good reason.  What you say is exactly right; when you add up the daily time we spend struggling with character retrieval, checking, and correction, it adds up to a huge waste of time and mental energy.  And most often all for absolutely zero increase in meaning, value, human worth, or intellectual progress.

Grrrrhhh, indeed.


**VHM:  xīn 鑫.  Usually this is glossed as being used in names without any particular meaning associated with it.  If pushed to give a definition, people might say it means "prosperity, wealth", since it consists of three "gold" characters.

Recently, in Korea, a new use for 鑫 has arisen as the nickname for Kim Jong-un; the character is composed of three Kim characters (金 ["gold; metal"]), and Kim Jong-un is the third Kim to rule North Korea.  In Korean, 鑫 would be read as heum / hŭm.  Source.

Bob Ramsey followed a few minutes later with this sympathetic note:

Well, I sure know what you mean, Victor. Still, I have to admit that some of the effort associated with Chinese characters I find to be kind of fun. But maybe that's because, like many others, I have a streak of masochism. After all, as I think you'll admit, there's a certain amount of masochism to being a Sinologist–or an East-Asianist.

And then there's the sadistic side of the enterprise. Way back around 1970 when I was just starting grad school and taking a basic Classical Chinese course, Hugh Stimson gave me a text written by a modern Confucian scholar from Hong Kong and told me to translate it by the next week. I was just a beginner without much ability to use context to read, and so I struggled trying to decipher each character, and at some point I hit on a character that I couldn't find in any dictionary in the library. It was surpassingly simple with only four strokes, and yet I couldn't find it in any source. I sweat for a week, not willing to give up. Finally the day arrived, and I went in to Hugh's class and admitted to him that I had failed. He broke out in a hearty laugh. He had pulled one over on me. It turned out that the simple character was a form of 丘, but because the author was a devout Confucian he had omitted one stroke, the front leg of the character, to show respect for the given name of Confucius! It was an effective pedagogical trick; I have never forgotten that lesson. But it was also a bit like the hazing of a pledge into the Sinological fraternity.

On a more serious point. Over the years I have continued to be impressed with Jerry Norman's take on Chinese characters that he laid out in his book "Chinese". The idea that the lack of flexibility, the inadaptability, of that logographic writing system had a profound effect on all that came under its sway still lingers in my head. As Jerry says, over their long history of the past 2,000 years, the Han Chinese have, for all intents and purposes, only had two different written languages, "Classical Chinese" and baihua [VHM:  vernacular / koine]. Different languages for the most part just didn't get written down, thus creating the illusion that the myriad varieties of the Han languages were actually only one language, and thus also reinforcing an illusion of cultural and linguistic unity. The idea is a bit simplistic, I know, but it seems to me to be much more important than most people, even a lot of specialists, realize.

After supplying the detailed information about 㫬 given in the Afterword below, Zach Hershey sent me the following note:

I came across my own problematic character tonight. It has a Unicode entry, but it only appears if you have certain fonts downloaded. It's found in the name of a monastery and is likely the name of a mountain. The character is 山+共.

Here are a bunch of texts from ctext (for "ctext" see below) that have the character:

This page gives a phonetic gloss of "goengq," but I'm not really sure what to make of that. I need to come up with a Japanese gloss for it. I think that I might have to take a look in Morohashi tomorrow.

I replied to Zach:

I don't think you'll find 山+共 in Morohashi.  I just checked.  Furthermore, this site that you sent to me correctly identifies it as an old Zhuang character meaning "steep slope" and pronounced "goengq".  On Zhuang language and characters, see:

"Topolectal traffic sign" (3/6/17) (esp. in the comments, e.g., me to @Adrian)

"The languages on Chinese banknotes" (9/16/13)

Wikipedia on Sawndip

So 山+共 is probably not a hanzi, but rather an old Zhuang character — even though it looks like a hanzi.  Or perhaps it is an obscure variant of some other Chinese character, or a Zhuang character that has been borrowed for local purposes.

Added note:  山+共 has a Unicode number (U+21DB5 [not supported by WordPress]) and an entry in zdic (large online Chinese dictionary), but the latter doesn't tell us the pronunciation or meaning of the character.
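A possible explanation for the "not supported" behavior, offered here purely as an assumption and not as anything confirmed about WordPress, is that U+21DB5 lies outside the Basic Multilingual Plane and so needs four bytes in UTF-8, which some older storage configurations cannot hold. The minimal Python sketch below only demonstrates the byte counts; the helper name `describe` is hypothetical.

```python
def describe(char):
    """Return (code point label, UTF-8 byte count, in-BMP flag) for one character."""
    code_point = ord(char)
    label = f"U+{code_point:04X}"
    utf8_length = len(char.encode("utf-8"))
    in_bmp = code_point <= 0xFFFF  # the Basic Multilingual Plane ends at U+FFFF
    return label, utf8_length, in_bmp


print(describe(chr(0x21DB5)))  # the 山+共 character: four bytes, outside the BMP
print(describe(chr(0x3AEC)))   # 㫬: three bytes, inside the BMP
```

By the same measure 㫬 (U+3AEC) fits in three bytes, which would be consistent with it displaying in places where 山+共 does not.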

I won't go into all the gory details that searching for difficult characters entails.  Suffice it to say that, after a few hours on the chase, the Sinologist's study is apt to look like a battlefield with a lot of dead soldiers strewn all over the place:

Victor H. Mair, "The Need for an Alphabetically Arranged General Usage Dictionary of Mandarin Chinese: A Review Article of Some Recent Dictionaries and Current Lexicographical Projects" (pdf), Sino-Platonic Papers, 1 (February, 1986), 1-31.

See also:

"The economics of Chinese character usage" (9/2/11).

And then there's this classic take on the hanzi / kanji / hanja:

Geoffrey Pullum, "The Awful Chinese Writing System" (The Chronicle of Higher Education, Lingua Franca:  Language and writing in academe, 1/20/16)

Bottom line:  this is what Sinologists do every working day of their lives.  It's not a very glamorous occupation, but somebody has to do it.  And, as was pointed out above, some Sinologists revel in the misery.

[Originally drafted 3/28/17]


Afterword for specialists

The only way I could find 㫬 was — after going through all of the lexicographical gymnastics described near the beginning of this post — to ask one of my graduate students, Zach Hershey, to handwrite it on his pad.  Since 㫬 does have a Unicode number (U+3AEC), he was able to locate it strictly through its raw shape that way.

After obtaining the Unicode number for 㫬 in this manner, I could copy and paste it into my browser and hopefully dig up some information about it from the web.
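For readers who want to reproduce this step, a minimal Python sketch of going back and forth between a "U+" label and the character itself follows. The block ranges are the ones given in the Unicode code charts, with only three blocks listed; the function names `from_label`, `to_label`, and `block_of` are hypothetical helpers written for this illustration.

```python
# Only the blocks relevant to this post are listed.
BLOCKS = [
    (0x3400, 0x4DBF, "CJK Unified Ideographs Extension A"),
    (0x4E00, 0x9FFF, "CJK Unified Ideographs"),
    (0x20000, 0x2A6DF, "CJK Unified Ideographs Extension B"),
]


def from_label(label):
    """Turn a label such as 'U+3AEC' into the character it names."""
    if not label.upper().startswith("U+"):
        raise ValueError(f"not a U+ label: {label!r}")
    return chr(int(label[2:], 16))


def to_label(char):
    """Turn a single character into its 'U+XXXX' label."""
    return f"U+{ord(char):04X}"


def block_of(char):
    """Name the listed block containing the character, or 'other'."""
    code_point = ord(char)
    for low, high, name in BLOCKS:
        if low <= code_point <= high:
            return name
    return "other"


char = from_label("U+3AEC")
print(char, to_label(char), block_of(char))
```

That a character sits in an extension block rather than the main CJK block is itself a hint that it is rare, though it says nothing about pronunciation or meaning.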

The first place I looked was Hànyǔ dà cídiǎn 汉语大词典 (Unabridged Dictionary of Sinitic), which is the closest thing to the OED for Chinese.  Although this dictionary of twelve large volumes includes over 23,000 characters, 㫬 is not among them, despite the fact that it did appear in the highly regarded Luòyáng qiélán jì 洛陽伽藍記 (The Monasteries of Luoyang [547 AD]).

At this point in my search, although 㫬 exists to the degree that somebody has seen fit to assign it a Unicode number, I still don't know how to pronounce it or what it means.

In this gigantic online dictionary, 㫬 is said to be equivalent to 昫 and 煦.  So I still don't directly know how to pronounce 㫬, but I'm told here that it means "warm".  However, if I can trust this online dictionary, 㫬 would mean and sound the same as xù 昫 ("warm") and xù 煦 ("warm").

Here is a list of occurrences of 㫬 that can be found in Chinese Text Project (commonly referred to as "ctext" for short).  Even though Luòyáng qiélán jì 洛陽伽藍記 (The Monasteries of Luoyang) is included in ctext, the edition it uses has a blank space where 㫬 should occur.

Looking around at other editions of our text that are available online, many of them substitute 眴 for 㫬, but this leads us off in quite a different direction, since 眴 — with an eye radical (instead of a sun radical) — means "dazzled" and can be pronounced xuàn, shùn, and xún in Mandarin.

The entry for 㫬 in the Taiwanese Ministry of Education's Yìtǐzì zìdiǎn 異體字字典 (Dictionary of variant forms of characters) provides many scans from such sources as Shuōwén jiězì 說文解字 (Explaining Graphs and Analyzing Characters [100 AD]), the seminal lexicographical work on Chinese character construction, but there are relatively few examples of 㫬 itself, with most of the examples taking the form of 昫 or 煦 (which we've already discussed above).

If you follow this link, it will take you to the entry for 㫬 on their site, and on the side you can scroll through the various examples that they provide from scanned texts. Most of the glosses give "rì chū wēn yě 日出溫也" ("warmth from the sun coming out").

Zach did some database work and found other texts besides ours that include 㫬, but there are very few hits.  Two of them are from the Taoist canon, and there are several others from scattered sources.  One of the citations gives a phonetic gloss of shùn 舜 (meaning irrelevant).  None of the citations from databases that include 㫬 are helpful in understanding its usage in our text.

[Thanks to Deven Patel, Luther Obrock, and Dan Boucher]


  JK said,

    March 31, 2017 @ 1:41 pm

    Isn't this a case where technology is of great benefit to us? I found it quickly through a radical search on zdic under the Kangxi Zidian section:
    Maybe I'm lazy, but my general rule of thumb for hard-to-find characters is 1) Look it up in the Xinhua Zidian using the four corners system (thanks to Prof Baxter for teaching me this method) 2) Check 3) Give up

  flow said,

    March 31, 2017 @ 3:01 pm

    @JK: not wanting to disappoint you, but you do realize that the material quoted at the link you gave discusses 昫, not 㫬, don't you? As far as I can tell the Kangxi does not list 㫬 and does not claim that 㫬 and 昫 are related in any way; that claim comes from the editors of the zdic website. They may be right, but they do not adduce any further material, so the claim is a weak one.

  Victor Mair said,

    March 31, 2017 @ 3:11 pm


    Precisely! Technology cannot solve problems that are built into the system.

  Wolfgang Behr said,

    March 31, 2017 @ 3:56 pm

    Easiest way to find if a character exists in Unicode: go to the "Unihan Radical-Stroke Index",

    search by radical (日=sun) and remaining strokes (旬=6), arrive at

    Copy and paste the character (or its Unicode identifier U+3AEC) into a search engine, and browse through the 8,280 (Google), 527 (Baidu), 817 (Yandex) hits to get more information about it. The fact that the Unihan page doesn't give you a pronunciation makes it immediately clear that this is a very rare character. For such information, the "International Encoded Han Character and Variants DB" at Academia Sinica is usually pretty good and it is the first link which comes up on Google:

    It gives you the two pronunciations xu4 (i.e. 香句切 <– Guangyun) and xu3 (i.e. 火羽切 <– Jiyun).

    The online Xinhua zidian

    lists 㫬 as a variant of 昫, like the link from which you already provided. This equation ultimately comes from the Longkan shoujing 龍龕手鏡 (Handy Mirror in the Dragon Prayer Niche), the famous dictionary of 26k "vulgar characters" (suzi 俗字) compiled by the Khitan monk Xingjun and finished in 997:

    The "regular" character (昫) is well-attested since, e.g., the Huainan zi 淮南子 (2d c. bce ?), as is immediately clear from the entry it has in the Kangxi zidian, also quoted at the same site, and, indeed, in the Shuowen. It is a suitable character for a personal name, since it means "warm", "warm-blooded", "sunny" among many other things, according to the Kangxi zidian. In fact the combination 王昫 is so common, that googling it becomes uninteresting, since there are too many historical and contemporary people by that name.

    The passage from the Luoyang qielan ji which started your search makes it likely that the character must be the first name of an official. One would therefore naturally proceed by looking for Wang Xu3/4 王昫 in large text databases. This turns out to be a dead end, however, because even good databases such as ctext and only point to citations from various editions of the Luoyang qielan ji for this combination, i.e.

    - in three Qing and Republican period editions


    - in the usually very reliable Hanfenlou edition. This is also the variant which occurs in what is probably the best online edition of the text in the CBETA TripiTaka, but none of the commentators bother to comment on the character.

    (Aside: an alternative way to search an obscure character is to plug the textual environment into one of the databases, which would have immediately led you to this link, of course.)

    3 minutes to the pronunciation, but 42 minutes until here. It would require a library well-stocked in Tang historiography to take it any further from here, I'm afraid.

  Ben Zimmer said,

    March 31, 2017 @ 4:32 pm

    The automated Language Log feed on Twitter shows a summary of each post, cutting off after a certain number of characters. That sometimes leads to unexpected truncations — some of our followers are chuckling over this one.

    (h/t Kendall Willets, patterlinguist)

  flow said,

    March 31, 2017 @ 4:55 pm


    I couldn't make out any data on 㫬 in printed or online sources. I didn't specifically check those portions of Kangxi dedicated to characters without known meanings and / or readings, though. That said, I'd like to add that with texts that old, documents written in seemingly / factually simpler systems do have their difficulties, too. When a word is misspelled or has fallen out of use altogether, it can be difficult to ascertain the intended meaning. And maybe in this case it never had any meaning and reading (outside of a small circle of people) to start with. I can call myself 曌 any time and tell people it's a fancy way to write 照 zhao4 bright. In case my existence under that moniker should leave some written trace, but I don't happen to be a well-documented ruler—what will people of later times make out of this? They will have no clue, and to be candid, very few of my contemporaries ever knew, or cared. As for society at large, this was never anything but a dead parrot, an assembly of script elements. A reading and a meaning is assigned by convention; very few convene on 曌, and even far less on 㫬, it would appear.

  JK said,

    March 31, 2017 @ 4:56 pm

    Yes, I was aware that I was placing my trust in the zdic editors to correctly connect those 异体字, my point still stands that technology offers us so much more now right at our fingertips without having to trudge down to the graduate library, just as Prof. Mair's point still stands that certain parts of Sinology are still infuriating.

    As a somewhat related anecdote, apparently my wife was given the name qing2 䝼, which is an obscure character found in the Kangxi Zidian, and when it came time to enter her name into government registries they had to change it to 晴, because the original character could not be typed. Who knows, with Unicode today perhaps they now allow that name to be used.

  Rubrick said,

    March 31, 2017 @ 5:09 pm

    @Ben Zimmer: That made my day.

  David Spindler said,

    March 31, 2017 @ 5:30 pm

    Victor-I highly recommend the 《中华字海》. It lists 日+旬 as being the same as 煦, pronounced xu4

  Eidolon said,

    March 31, 2017 @ 6:05 pm

    "Different languages for the most part just didn't get written down … reinforcing an illusion of cultural and linguistic unity."

    I think the word here should be 'homogeneity'; not 'unity'. The reason being – Classical Chinese didn't merely create an *illusion* of cultural and linguistic unity … It contributed to the *actual* cultural and linguistic unity of elite Chinese, who considered literacy in that language as the basic requirement for group membership, transmitted and shared common rituals and beliefs through that language, and believed that these factors were ultimately more important for defining who they were than their mother tongues and local customs.

    To the scholar-officials who dominated the bureaucracy of China all the way up until the contemporary period, to be Chinese was to be literate in Classical Chinese, to have read the Classics, and to practice the rituals and beliefs contained in the Classics. We could compare this situation with that of Latin's in Europe, where a common concept of Western civilization rose through Latin literacy and the transmission of Greco-Roman and Christian rituals and beliefs through Latin. Indeed, the only difference might be that Europe was politically divided by the time mass literacy and nationalism came around, or else we could have seen the rise of a Latin-speaking nation-state of Europe, instead of a loosely bound European Union.

  AntC said,

    March 31, 2017 @ 6:26 pm

    Would the contemporary readers of Luòyáng Qiélánjì have understood this character? What was the intended readership in 547 AD? Would they have had access to references if they didn't already know the character? Or how would they have learnt the character?

    Was the whole thing intended to be merely mystical/suggestive/inscrutable?

  Chau said,

    March 31, 2017 @ 7:55 pm

    The text in Wikisource:

    uses 旬 to substitute for 㫬:

  Robert Ramsey said,

    March 31, 2017 @ 8:27 pm

    @Eidolon: Well, perhaps I'm being presumptuous to interpret Jerry Norman's point, but I think that very comparison to Latin you raise was at the heart of what he meant. In traditional East Asia, the role of Classical Chinese and Chinese characters was certainly much like that of Latin and Roman letters in Europe. But of course, in one very important respect writing in traditional East Asia was not like writing in Europe. That's because in Medieval Europe, a large number of written languages were based upon local vernaculars, while in China, written languages based upon a similarly broad variety of other languages and dialects never did develop. What Jerry said, I believe, was that (with a little exaggeration, but not really very much!) there have more or less only been TWO forms of written Chinese: Classical (文語) and the colloquial style known as 白話. In contrast, all over Europe local written vernaculars sprang up, in the Middle Ages certainly, but actually almost from the very beginning of each successive people's contact with writing. In Anglo-Saxon England, for example, FOUR principal dialects were written down: Kentish, West Saxon, Mercian, and Northumbrian. (Modern English is descended from the Mercian dialect.) By comparison, writing in China, and even in East Asia as a whole, was virtually monolithic. What happened to local literature in China?
    When I was at Columbia, C.T. Hsia told me an interesting story. Back in the 1930s, he said, Hu Shih had discovered a colloquial novel originally written in Shanghai in 1892 by Han Bangqing (韓邦慶) called The Flowers of Shanghai (海上花列傳). Upon discovery of the text, Hu had immediately become excited. That was because he had long been a literary fan of the writings of Thomas Hardy and the extensive dialogs in local English dialects represented in those writings, and he had despaired over the fact that in the vast literary traditions of China nothing similar could be found. But then here it was: A novel with long conversational passages representing the colloquial dialect used in the Shanghai demimonde! In this state of high agitation Hu Shih rushed to reissue Flowers of Shanghai with a new preface he had written praising what the novel now represented for local expression in the new China. But no. In spite of Hu Shih's campaign and all the fanfare over the novel's reissue, it fell flat. Almost no one bought or read it. And today the novel is so obscure, only specialists have ever heard of it, and even Shanghai natives cannot read it. The situation is completely different from that of the beloved novels of Thomas Hardy. –Or, in fact, of the status of many other local dialect writings in England, Europe, and even America. (Think Mark Twain.) Why? Why did China never develop vehicles for local expression the way the many various peoples of Europe seem to have done so easily?
    The reason, I believe, is the difference in the nature of the writing systems that were used. The Roman alphabet (or further east, the Greek alphabet) was surpassingly easy to adapt to a new language or dialect each time it passed into new hands. All you had to do was equate the alphabetic letters to phonological units in your own language and you were well on your way to sending along messages and stories to other speakers of your own language. Chinese characters, on the other hand, did not have that suppleness. Just look at the struggles Koreans, Vietnamese, Japanese–and yes, the Zhuang–went through to represent their own languages! Sure, as you say, Koreans could continue to write in Classical Chinese, and they did do so up until almost the 20th century. But they, as well as other East Asian peoples, also went to great lengths to represent the sounds of their own language. Why would they not? If not for formal writing, at least for such things as poems and songs and prayers. After all, one's native language is always the medium closest to one's heart, isn't it? But Koreans did not have first-hand access to the easy and flexible tool that an alphabet offers until the invention of Hangul. For their part, the Japanese also made a breakthrough, but only after they began to use the characters to transcribe sounds independent of the characters' associations with specific morphemes.

    In sum–and I believe this is what Jerry Norman was talking about–it was the very nature of the Chinese writing system that suppressed forms of local expression, while, on the flip side of that shortcoming, it also provided a kind of illusory linguistic unity that endured in East Asia for a very long time, and in China still does today.

  AntC said,

    March 31, 2017 @ 11:06 pm

    Thank you @Robert Ramsey for that exposition. What I don't get is why anybody in the sino-logo-sphere ever formed the illusion of linguistic unity (or @Eidolon homogeneity).

    With the Latin script, that it could be adapted to record many languages did not give rise to an illusion they were 'the same' or variants of something the same. (Although in Western Europe they were related back to PIE, that's no more nor less the same than Sinitic topolects.)

    Presumably literacy was restricted to a scribal class in all cultures until very recently.

    I can see that with a syllable-tone-based writing system, adapting "to represent the sounds of their own language" would be more awkward than a syllabary/abjad, which would be more difficult than an alphabet. Japanese got there eventually. Why did it take so long to realise (and why did no other cultures realise) you could "transcribe sounds independent of the characters' associations"?

  JB said,

    April 1, 2017 @ 12:41 am

    In given names, 鑫 is often used to supplement a deficiency of the metal element, according to divinatory readings of personal names.

  liuyao said,

    April 1, 2017 @ 8:35 am

    Being a work from 540 AD, why would one want to find what the 20th century dictionaries say about its Modern Mandarin pronunciation? I would assume it is something like xun2 and continue reading. After all it's clearly a person's name. Incidentally there was a person named 王珣, from the Langye Wang clan (hence a cousin of Wang Xizhi), best known for a piece of calligraphy that the Qianlong Emperor prized as one of the san xi (three rarities).

    Many of the dictionary pronunciations are questionable. A recent example that came up (in social media) is 尷尬, which the dictionary (and in everyday speech) uses a Wu sounding pronunciation (gan1 ga4). It was reported that Taiwan just gave an alternative pronunciation (jian1 jie4) that would have been more natural in Mandarin.

  David Moser said,

    April 1, 2017 @ 8:41 am

    Not to promote my own stuff, but see this article, which is sort of a "lite" version of Prof. Ramsey's excellent comment.

  Victor Mair said,

    April 1, 2017 @ 9:22 am

    "I would assume it is something like xun2 and continue reading."

    That's because you're not a Sinologist.

  Robert Davis said,

    April 1, 2017 @ 9:39 am

    Wow! And I thought it was a big deal when Real Academia Espanola ( no tilde on IPad) decided that Ch and Ll were no longer separate letters for the dictionary.

  cliff arroyo said,

    April 1, 2017 @ 10:08 am

    "decided that Ch and Ll were no longer separate letters for the dictionary"

    I don't think I will ever forgive them for that….

  liuyao said,

    April 1, 2017 @ 11:07 am

    I went to ctext and found a digital copy (of the sibu yecong edition), and in there it was clearly written/typeset as 㫬. In case people haven't heard, ctext has had a recent upgrade that significantly improved its OCR technology. I corrected a few mistakes on that page, including the 㫬 character (thanks to this post).

  Chris Godwin said,

    April 1, 2017 @ 11:12 am

    Re comments in the thread about Latin and Chinese. I recently came across a good discussion by Richard Kunst, titled "Literary Chinese viewed in the light of literary Latin". Not clear if it's been formally published, and I failed to note the URL. But the website is something like

  Chris Kern said,

    April 1, 2017 @ 1:29 pm

    On the Japanese side, what bothers me is looking up some kanji sequence and finding out it's just an alternate (and more obscure) way of representing a common word, or at least a word I already know. Just this week I was reading a chapter of a book that had 精しい, 障碍, and 稔り — all common words but I had never seen them written in those ways before. This is a book that was initially written in the 1940s and then updated in the 60s and finally in 1980, so I'm not surprised it has some old usages, but I still feel like it's a waste of time.

  KWillets said,

    April 1, 2017 @ 2:14 pm

    @Ben Zimmer It seems like a simple lookup table would have prevented this, but perhaps they failed to look for Britishisms. Splitting the last word doesn't seem like a good idea in the vast majority of cases anyways.

  Victor Mair said,

    April 1, 2017 @ 2:17 pm

    Incidentally, most people will instinctively read 伽藍 (in the title of the text we're concentrating on in this post) as something like jiālán or gālán. It's only the Sinologists who, knowing that it is a transcription of the middle parts of the Sanskrit word saṃghārāma, maintain that it should be read qiélán.

    The full, unabbreviated form is sēngqiélánmó 僧加藍摩, which is a transcription of Sanskrit saṃghārāma or saṅghārāma; Devanāgarī संघाराम or सङ्घाराम.

    Similarly, all Mandarin speakers that I knew more than thirty years ago pronounced 小乘 (so-called "Hinayana" — "lesser vehicle") as xiǎochéng and 大乘 ("Mahayana" — "greater vehicle") as dàchéng. I then pointed out that the -乘 part of these Sanskrit words is a noun ("vehicle"), not a verb ("mount"), and that it should therefore be rendered as a noun in Chinese (-shèng), not a verb (-chéng). Since I first published and lectured on this more than three decades ago, careful Sinologists and Buddhologists have caught on to the distinction. I suspect, however, that the majority of Mandarin speakers continue to misread these terms as xiǎochéng and dàchéng — and move on. But, for a Sinologist, such things make a big difference.

  flow said,

    April 1, 2017 @ 2:21 pm

    @Chris Godwin "Literary Chinese Viewed in the Light of Literary Latin" by Richard A. Kunst

  JK said,

    April 1, 2017 @ 4:22 pm

    That reminds me of how I was taught the poetry of Li Bo in college, but when I went to China nobody had any idea who I was talking about.

    As a somewhat related anecdote, apparently my wife was originally given the name 䝼 qing2, but when it came time to get her a hukou/ID etc, her parents were told they couldn't use that name due to the limitations of technology at the time, so they changed it to the much more common 晴 (different meaning, same pronunciation). Now that her original name is in Unicode, I wonder whether that name and others can be used by other parents who have consulted the Kangxi Zidian and other references for obscure characters to name their kids.

  flow said,

    April 1, 2017 @ 5:10 pm

    For what it's worth, I for one am not convinced that we should read 大乘 as dasheng instead of as dacheng. Incidentally, the ABC Dictionary of Chinese and English only has the latter, not the former; has dà chéng in the entry title, and dà shèng where it quotes the 國語辭典. I really have no very deep attachment to the one or the other pronunciation other than that I like to be understood. Knowing about dasheng will help me understand others.

    I'm also not convinced that a sinologist should feel compelled to read 僧(加/伽)藍摩 as sēngqiélánmó instead of, say, sēnggālánmó, or, indeed, sēnggālánmā, especially considering its origin is Skrt. saṅghārāma. The logic may seem obvious to a sinologist, but the rest of the audience will certainly marvel at the logic of the statement: that "because 伽藍 comes from Sanskrit saṃghārāma, it should be read qiélán, not jiālán or gālán". The strongest reason to stick to qiélán instead of jiālán or gālán would certainly be a relevant community where common usage says so. This remark comes from someone who tried for some time to bring some faithfulness in pronunciation (of Sino-Korean) to a faithful community (of speakers of European languages); I cannot claim to have achieved anything in this regard. My insistence has been no tastier to others than a worn-out chewing-gum clinging to my soles; at home, it is still with me, I'm afraid.

    I had to think of an old question of mine—what's with cybernetics? why is it spelled and read this way? In German, we happen to pronounce that (new word introduced by Wiener) with [kyb…], not [tsyb…]; in English it *could* conceivably have become [kib…] or [kaib…], as the case may be; it ended up as [saib…], however. The American Heritage Dictionary kindly supplied me with these guides to some difficult words in English:

    Kyrie (kîr'ē-ā′)

    kylix (kī'lĭks, kĭl'ĭks)

    Cythera (sĭ-thîr'ə, sĭth'ər-ə), also Kythira (kē'thē-rä′)

    cybernetics (sī′bər-nĕt'ĭks)

    This little evidence is certainly not conclusive, yet it does suggest e.g. that kylix could alternatively appear as cylix [sailiks] in any given text, and could not be proven *wrong* (although it's not in the AH). It also suggests that we as English users are fundamentally at a loss how to spell or say Κύθηρα [ˈciθiɾa]; is it Kythira / Cythera / Kythera / Kithira? is it [kɪˈθiːrə], [ˈkɪθᵻrə], [ˈkiːθiːra], [sɪθərə]…?

    Considering the paramount role of Greek for Western culture, and the fact that we're using a sound-writing script inherited from this very language and culture, this is truly astounding.

    On the other hand, I'm *still* curious to hear about the reason we should read 伽藍 as qiélán rather than, say, gālán.

  Jonathan Smith said,

    April 1, 2017 @ 6:44 pm

    I suppose the idea is that the first phonetic borrowings of the Indic words used a syllable like ga, which was written with the ad hoc transcriptional character 伽 (why ad hoc is itself an interesting question), and that Mandarin qie2 and not some other is the direct reflex of this (MC) ga and thus the most "correct" reading of the character in this context. You are right though that this is silly to the degree that it privileges a kind of exclusionary erudition in the script over spoken communication. Liuyao is right that it can be a waste of time to obsess about the "proper" Mandarin pronunciation of early words/characters: it's a useful scholarly exercise only to the extent that it clarifies linguistic matters. So insisting that it is "correct" to read zhi1 zhe3 yao4 shui3 知者樂水 and not le4 shui3 is pedantic, but depending on your interests, the difference between the earlier antecedents of le4 vs. (artificial) yao4 could be critical.

  Robert Ramsey said,

    April 1, 2017 @ 9:28 pm

    @David Moser: What a fun little essay you wrote about restrictions on wordplay using Chinese characters! Thanks for sharing. I'm definitely going to share!

  AntC said,

    April 1, 2017 @ 9:36 pm

    @flow you're looking for parallels where there are none. (Or if there used to be, they were long-ago erased by sound-change.)

    Considering the paramount role of Greek for Western culture, and the fact that we're using a sound-writing script inherited from this very language and culture, this is truly astounding.

    Not astounding. Compared to the difficulties with Chinese script discussed here, it registers barely a blip.

    Many Greek words came to English via Latin, so there was a double-approximation of pronunciation, and possibly some transliteration rather than transcription. I'm not sure about all the words you noted, but "Kyrie" I'd say has never really got to English. It appears only in the Latin mass — about which hardly anybody but music buffs would know these days — so only in the fixed phrase "Kyrie Eleison".

    You could contrast the pronunciation of the girl's name "Kylie". (One letter different, but a whole world away.)

    w.r.t. the "sound-writing script": the Latin script is a very poor fit to English sound patterns. English spelling is awful, and a drag on every learner. I think LLog would be complaining long and loud (as have so many down the ages), were it not that Chinese script is orders of magnitude worse.

    Go and read up the history of how English spelling got to be such a mess. (Arguably, Anglo-Saxon used to have an alphabet better adapted; but it got swept away by a bunch of invaders who brought a language and sound-writing system from a whole nother branch of IE.) It's no surprise that a foreign learner would be perplexed (especially coming from German's regular spelling); do not expect regularity.

  Fluxor said,

    April 1, 2017 @ 9:58 pm

    May I recommend Wiktionary for lookup of a character if all you're looking for is the Unicode. You can search by radical.

  Victor Mair said,

    April 1, 2017 @ 10:08 pm


    Thanks. It's helpful to that extent, but no sound and no meaning.

  Victor Mair said,

    April 1, 2017 @ 10:29 pm

    Cybele (/ˈsɪbᵻliː/; Phrygian: Matar Kubileya/Kubeleya "Kubeleyan Mother", perhaps "Mountain Mother"; Lydian Kuvava; Greek: Κυβέλη Kybele, Κυβήβη Kybebe, Κύβελις Kybelis), the Anatolian mother goddess (from Wikipedia) ≠ Σιβυλλα (Sibylla), "prophetess, sibyl"

    Cf. Cyrus, which is, according to Wikipedia:


    …the Latinized form of the Greek Κῦρος, Kȳros, from Old Persian Kūruš. According to the inscriptions the name is reflected in Elamite Kuraš, Babylonian Ku(r)-raš/-ra-áš and Imperial Aramaic kwrš. The modern Persian form of the name is Koorosh.

    The etymology of Cyrus has been and continues to be a topic of discussion amongst historians, linguists, and scholars of Iranology. The Old Persian name "kuruš" has been interpreted in various forms from "the sun", "like sun", "young", "hero" to "humiliator of the enemy in verbal contest" and the Elamite "kuraš" has been translated as one "who bestows care".

    The name has appeared on many monuments and inscriptions in Old Persian. There is also the record of a small inscription in Morghab (southwestern Iran) on which there is the sentence (adam kūruš xšāyaƟiya haxāmanišiya) in Old Persian meaning (I am Cyrus the Achaemenian King). After a questionable proposal by the German linguist F. H. Weissbach that Darius the Great was the first to inscribe in Persian, it had previously been concluded by some scholars that the inscription in Morghab refers to Cyrus the Younger. This proposal was the result of a false interpretation of a passage in paragraph 70 of Behistun inscription by Darius the Great. Based on many arguments, the accepted theory among modern scholars is that the inscription does belong to Cyrus the Great.

    There are interpretations of name of Cyrus by classical authors identifying with or referring to the Persian word for “sun”. The Historian Plutarch (46 – 120) states that "the sun, which, in the Persian language, is called Cyrus". Also the Physician Ctesias who served in the court of the Persian king Artaxerxes II of Persia writes in his book Persica as summarized by Photios that the name Cyrus is from Persian word "Khur" (the sun). These are, however, not accepted by modern scholars.

    Regarding the etymology of Old Persian kuruš, linguists have proposed various etymologies based on Iranian languages as well as non-Indo-European ones. According to Tavernier, the name kuraš, attested in Elamite texts, is likely "the original form" as there is no Elamite or Babylonian spelling ku-ru-uš in the transcriptions of Old Persian ku-u-r(u)-u-š. That is, according to Tavernier, kuraš is an Elamite name and means "to bestow care". Others, such as Schmitt, Hoffmann maintain that the Persian Kuruš, which according to Skalmowsky, may be connected to (or a borrowing from) the IE Kúru- from Old Indic can give an etymology of the Elamite kuraš. In this regard the Old Persian kuruš is considered with the following etymologies: One proposal is discussed by the linguist Janos Harmatta that refers to the common Iranian root "kur-" (be born) of many words in Old, middle, and new Iranian languages (e.g. Kurdish). Accordingly, the name Kūruš means "young, youth…". Other Iranian etymologies have been proposed. The Indian proposal of Skalmowsky goes down to "to do, accomplish". Another theory is the suggestion of Karl Hoffmann that kuruš goes down to a -ru derivation from the IE root *(s)kau meaning "to humiliate" and accordingly "kuruš" (hence "Cyrus") means "humiliator (of the enemy in verbal contest)".


  sima said,

    April 2, 2017 @ 1:19 am

    Hi, while not being a Buddhologist, I am interested in this saṃghârāma and its Chinese transliteration. From what I have read above, 伽藍 seems to be the phonetic correspondence of the 'ghârām' part, right?
    So why does it make more sense to want to pronounce 伽 as qié rather than gā (pinyin)?
    Or is there another ad hoc character usually used to deal with Sanskrit gā that could cause confusion? Maybe my question is stated in a strange manner, but the schema I gather from this is:
    Sanskrit gha = Chinese 伽 (pron. qié in Mandarin < MC = ??)
    Sanskrit ga = Chinese ?? (pron. ?? in Mandarin < MC = ??)

    Could you help me fill in the question marks? Thank you!

  John Carr said,

    April 2, 2017 @ 9:27 am

    Victor Mair wrote: "山+共 has a Unicode number (U+21DB5 [not supported by WordPress])"

    When you see a five digit Unicode number you know you're in trouble. Unicode was designed so that all the common characters used in all the modern written languages of the world would fit into 16 bits, representable in four hexadecimal digits.

    The characters that take five digits are very rare, ancient, or didn't exist when Unicode was designed about 30 years ago.

    Character U+21DB5 belongs to "CJK extension B". (See for PDF files with all the characters numbered and illustrated.) I showed some pages from CJK extension B to a well-educated native Mandarin speaker. The vast majority of characters were unfamiliar. On the page containing U+21DB5 (60 characters with a 山 radical) she had seen about a dozen, but without context only knew the meaning of two and the pronunciation of one.

    The Unicode in the post caught my eye because I work on a software product with a web interface. One of my minor tasks is testing support for characters that don't fit into 16 bits. Some popular programming languages including JavaScript limit characters to 16 bits. Larger characters have to be broken into two parts for processing and reassembled later. When that process doesn't work you get question marks or missing character boxes in place of your original text. (You also get that if you don't have the right font installed on your computer.)

    We have a lot of customers in Asia, but I have never seen a customer report of a bug in this process. Nobody is naming documents after a thousand year old variant of the word for turtle. For everyday language the original assumption of Unicode is correct. The written working vocabulary of the world fits in 10,000 characters or so.

    With one interesting exception. Emoji did not exist when Unicode was created. Some emoji live in this second-class-citizen district of Unicode. If a web site does support archaic Chinese characters, you might have the chipmunk U+1F43F to thank.
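    The "broken into two parts" step described above is UTF-16's surrogate-pair encoding. As a rough illustration only, and not the code from the product mentioned in this comment, here is a minimal Python sketch; the function names `to_surrogate_pair`, `from_surrogate_pair`, and `utf16_units` are made up for the example. It splits the two five-digit code points mentioned in this thread into their 16-bit halves and puts them back together.

```python
def to_surrogate_pair(code_point):
    """Split a code point above U+FFFF into UTF-16 (high, low) surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF need a surrogate pair")
    offset = code_point - 0x10000        # leaves a 20-bit value
    high = 0xD800 + (offset >> 10)       # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits -> low surrogate
    return high, low


def from_surrogate_pair(high, low):
    """Reassemble the original code point from its two surrogates."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def utf16_units(text):
    """Count the 16-bit units a string occupies when encoded as UTF-16."""
    return len(text.encode("utf-16-le")) // 2


# The 山+共 character (U+21DB5) and the chipmunk emoji (U+1F43F).
for cp in (0x21DB5, 0x1F43F):
    high, low = to_surrogate_pair(cp)
    print(f"U+{cp:X} -> {high:04X} {low:04X}")
# U+21DB5 -> D847 DDB5
# U+1F43F -> D83D DC3F
```

    A language that counts in 16-bit units sees each of these characters as two items, which is where the question marks come from if either half gets lost or the halves get separated.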

  Rodger C said,

    April 2, 2017 @ 11:44 am

    @AntC: I've been told, at least, that once upon a time, Anglicans discussing liturgy would refer to "the Kyrie" and rhyme it with "fiery."

  Rodger C said,

    April 2, 2017 @ 11:44 am

    You know, like the Venighty.

  Bathrobe said,

    April 2, 2017 @ 8:59 pm

    There appears to have been a great deal of regularisation of character readings on the Mainland. Interestingly enough, I remember a friend of mine once using a word containing 乘 (regrettably I forget the word, but it had a meaning something like 'magic vehicle'), in which she pronounced 乘 as shèng. I looked it up in the 现代汉语词典 and found it listed with the pronunciation chéng. She responded to this with a shrug and a look of helpless resignation. After all, who can argue with a dictionary?

    But it seems to me that, just as with 李白, 往, 法国, and many other terms, various busybodies have been going round 'fixing up' the language. This is not a speech-based process (working off how people — at least Beijing people — actually speak) but a character-based process (forcing a change in pronunciation in order to simplify the relationship between speech and writing).

    Interestingly, I've noticed what might be called 'spelling-based' pronunciations in the south of China, where people have gone one step further and regularised where the busybodies haven't yet interfered. One is the reading of 假期 jiàqī 'holidays, vacation' as jiǎqī. Another is the reading of the surname 任 rén as rèn.

  WSM said,

    April 3, 2017 @ 7:42 am

    Yeah OSX iPad handwriting recognition is very useful in these cases. In tricky situations like this the Kanseki Repository is very useful (dunno if that's the corpus your graduate student consulted) for gathering some basic evidence.

  liuyao said,

    April 3, 2017 @ 9:55 am

    I too only have heard 乘 in 大乘 and 小乘 pronounced as cheng2. The sheng4 reading exists in phrases like 千乘之國 (state of a thousand chariots, i.e. a big state).

    伽藍, on the other hand, is more well known to be pronounced as qie2 lan2. I can't explain how or where I learned this, not being a sinologist myself. Maybe from the many wuxia (martial art) films/TV series that had the "correct" pronunciation.

  Eidolon said,

    April 3, 2017 @ 4:58 pm

    "In sum–and I believe this is what Jerry Norman was talking about–it was the very nature of the Chinese writing system that suppressed forms of local expression, while, on the flip side of that shortcoming, it also provided a kind of illusory linguistic unity that endured in East Asia for a very long time, and in China still does today."

    I'm not taking issue with your first statement. My thrust is that it isn't an *illusory* unity – Classical Chinese *did* unite the literate elite of ancient China – and arguably that of Korea and Vietnam; in fact, this is the reason why, despite China's repeated exposure to alphabetic writing systems, such writing systems were never adopted for local expression – because the scholar officials, who monopolized the bureaucratic and educational systems, made Classical Chinese the lingua franca par excellence. Historically, literacy was a privilege of the elite, and as the elite were the gate keepers of privilege, so they could dictate the rules of membership – in this case, knowledge of Classical Chinese.

    Had Classical Chinese not been readable to speakers of different Sinitic varieties, as would be the nature of an alphabetic writing system, the aforementioned unity could never have developed. In Europe, Latin played the role of lingua franca, but the various competing written vernaculars eventually superseded the significance of Latin during the age of nationalism, as European intellectuals strove to define "nation" along linguistic borders. The absence of such vernaculars in China, by contrast, likely facilitated the continued unity of the Chinese state, even though the scholar officials, as a class, have been replaced by the Communist Party. That is why I do not think *illusory* unity is the proper word – the unity is a fact; the illusion is homogeneity.

  Eidolon said,

    April 3, 2017 @ 5:58 pm

    "What I don't get is why anybody in the sino-logo-sphere ever formed the illusion of linguistic unity (or @Eidolon homogeneity)."

    The Chinese were well aware of the fact that their oral languages were diverse and mutually unintelligible – for example, one could find passages from texts over 2,000 years ago describing the existence of incomprehensible regional speeches, and records of "heavy accents" and "unintelligible speech" among scholar officials are plentiful. But because the only widely accepted writing system in historical China, since the Qin-Han standardization, was Chinese characters, which were used to write a universally understood Literary Chinese 文語, the *written* linguistic unity of China was a fact throughout history. That is to say, any literate Chinese would've known how to read and write Literary Chinese 文語, and the absence of competing vernacular writing systems simply attests to the high degree of equivalence between "literacy" and "literacy in Literary Chinese," which in turn contributed to the unity of the literate class as a community.

    But in the 18th and 19th centuries, when European contact with China first became extensive, the state of linguistic understanding was such that many European visitors mistook this written linguistic unity as either a testament to the disembodied nature of Chinese writing, or proof of Chinese linguistic homogeneity. Many of the ideas regarding China having only one language – with the only differences being pronunciation of said language – developed during this time, likely facilitated by the Chinese elites' own historical belief that the written language was primary while the oral languages subordinate. As to how this latter historical belief came about, see my comments about prestige, above. Literacy in Literary Chinese has always served as a gate keeper to elite status in Chinese society, and as such was generally of more social significance than oral proficiency unless the individual traveled regularly to the capital, in which case proficiency in the capital variety of Chinese was also expected.

  Eidolon said,

    April 3, 2017 @ 7:07 pm

    I apologize in advance for the triple post, but I just had to comment on this fascinating treatise:

    "Some Things Chinese Characters Can’t Do-Be-Do-Be-Do, by David Moser"

    David is, of course, absolutely right in the general sense: a syllabary – any syllabary, by definition – cannot hope to express phoneme-level nuances in speech, and the Chinese writing system is a logographic syllabary, and therefore suffers from these restrictions.

    But a further observation about this phenomenon is that pinyin has a tonal register not available in the English alphabet. Thus, when we envision phoneme-level nuances and word play in *pinyin*, it is significant that the tonal registers of pinyin provide even more expressiveness for the nuances of human speech, which we cannot otherwise convey through use of the standard English alphabet, despite the almost certain existence of tonal nuances in everyday English speech. That is to say, even though we state that the English language is not a tonal language – as defined by the use of tone for semantic differentiation – tones *can be* and *are* used in English speech for purposes of accentuation, accent, etc.

    As an example from popular culture, compare the regular pronunciation of the word "Donut" with Homer Simpson's famous intonation: "Mmm… Donuts." There is no written difference between the two; and though you could, perhaps, use word play for the latter by stretching out the word – "Dooonnutsss…" – you still cannot capture the exact tone of speech Homer uses to convey his culinary longing.

    In fact, every writing system, I think, has certain restrictions on its level of expressiveness, and the degree is defined by the level of detail. Syllabaries like Chinese omit phoneme-level information. Alphabets like English omit tonality. But even a tonal alphabet like pinyin would omit information such as texture and cadence. Ultimately, only speech itself is a perfect rendition of speech, and all writing systems are separated from it by degrees of abstraction.

  AntC said,

    April 3, 2017 @ 7:28 pm

    Thanks @Eidolon.

    many European visitors mistook this written linguistic unity as either a testament to the disembodied nature of Chinese writing, or proof of Chinese linguistic homogeneity.

    Hmm? European visitors mostly encountered Cantonese, Hokkien, Shanghainese. (So for example most of the loan words into English are not from Mandarin.) Did the European visitors not notice the disconnect between those languages on-the-ground vs. the writing system?

    I can see that might contribute to the "disembodied nature" interpretation. I still don't get wherefrom the illusion of homogeneity.

    I can see the prestige of the Literate classes; as with Latin literacy in Europe. But nobody claimed European homogeneity — even though South-Western European languages do conspicuously derive from Latin.

  Eidolon said,

    April 3, 2017 @ 8:04 pm

    "Hmm? European visitors mostly encountered Cantonese, Hokkien, Shanghainese. (So for example most of the loan words into English are not from Mandarin.) Did the European visitors not notice the disconnect between those languages on-the-ground vs. the writing system?"

    They did notice the disconnect, and had it been the case that each of these languages had its own writing system, it is doubtful that European visitors would have thought they were the same language. But back in the late 19th century and early 20th century, linguistic understanding, especially of languages outside of Europe, was hardly scientific. Thus we have descriptions such as these:

    "China is an Empire of such vast extent that it may justly be called a continent rather than a kingdom. Naturally, there are many dialects spoken in different parts of the country and yet there is undeniably a Chinese language current throughout the whole of China. This is usually called the Mandarin or official language and is the spoken language-with local variations-of the great mass of the Chinese people. Of the 400,000,000 Chinese probably 350,000,000 speak Mandarin; the remaining 50,000,000 speak more than 20 different dialects all having a more or less distant resemblance to the official language.

    The provinces in which Mandarin is not spoken stretch southward from Shanghai along the coast to the frontiers of Tongking. They are Chehkiang, Fuhkien and Kwantung. … Each of these places has a totally different dialect from its neighbors and it is not possible here to do more than attempt to represent by western phonetics the sounds of the one language of China."

    "Chinese Self-taught By Natural Method: Phonetic Pronunciation Thimm's System" – John Darroch, 1914

    Or an even more explicit statement about the nature of the written language and how it affects the perception of Chinese:

    "The peculiar nature of the written language makes it necessary to explain the use of the word dialect, which has been objected to as not applicable to the various forms of local speech heard over this wide land. Some assert that they rise to the dignity of a language, like the Spanish, Italian, and other offshoots from the Latin; while others regard them as more like the patois heard in various parts of Spain itself, where each amidst its local expressions, retains the idioms and laws of the Castilian. The essential unlikeness between the variations heard in speaking those alphabetical languages, and the greater discrepancies between the sounds given to the ideographic characters, will explain the wider use of the term in Chinese, but certainly does not elevate them into the rank of separate languages.

    The fundamental fact, that no character has an inherent sound, has tended to make and perpetuate these dialects throughout the country; and the general ignorance of the written language by the people at large, has helped to multiply and modify them still further. It, however, entirely misleads to describe any one of these as ' no mere dialectic variety of some other language, but a distinct language'; for until a new sense be given to the word, such a description conveys a misconception of the relation between the spoken and written languages. … No one will dispute the remark that no two Chinese pronounce their words alike, even in any one dialect; but this does not weaken the remarkable power of their written language to maintain the solidarity of the people."

    – "A Syllabic Dictionary of the Chinese Language", Samuel Wells Williams, 1896

    As you can see, Europeans were caught up with the idea that the existence of one, unified Chinese writing system implies that the various Sinitic languages were, in fact, dialects of one and the same language of Chinese. The fact that these dialects were mutually unintelligible did not bother them, as they simply thought it was the nature of the writing system, which had no fixed pronunciations.

    This is not to say ALL European scholars bought into this idea. There were certainly linguists who viewed the Sinitic languages as actual languages. But descriptions such as those I gave above show how the illusion of linguistic homogeneity spread, and this was also facilitated by the rise of Chinese nationalism, which emphasized that the Chinese people had only one language – using, again, the written language as evidence – of which all the various regional varieties were merely dialects.

  Eidolon said,

    April 3, 2017 @ 8:23 pm

    I should slightly modify what I said above by indicating that "the same writing system" should instead be "the same written language"; for even during the late Qing, there still existed an official, unified written language of China that most Chinese and many Europeans believed represented "the Chinese language," of which the spoken varieties are only dialects.

  liuyao said,

    April 4, 2017 @ 8:50 am

    Back to the character 㫬. I searched for it in Zhonghua shuju's digital repository, and it did come up 18 times. Thankfully the digitization was recent enough that they had the most up-to-date character set (and no doubt in some cases they had to make new ones).

    Based on the results, the source of equivalence of 㫬 and 昫 might be the purported author of Jiu Tang shu (Old Tang History), who was a prime minister in 10th century (Five Dynasties) often given as 劉昫 Liu Xu in published editions. However, one of the annotations in 冊府元龜 (juan no. 145) says

    > [劉]昫,宋本、明本均作“㫬”,據《舊五代史 · 唐書 · 末帝紀》改

    Incidentally when I searched for 眴 (with eye radical), results containing 瞬 came up. That explains the 舜 reading that you referred to, but that would be a mistake.

  Chas Belov said,

    April 5, 2017 @ 12:37 am

    @Chau: It being Wikisource, presumably you (or anyone) could substitute in the proper character.

  Flow said,

    April 5, 2017 @ 1:00 am

    @liuyao are you still hurt you were called a non-Sinologist?

  Bathrobe said,

    April 5, 2017 @ 6:01 am


    the *actual* cultural and linguistic unity of elite Chinese, who considered literacy in that language as the basic requirement for group membership, transmitted and shared common rituals and beliefs through that language

    Classical Chinese the lingua franca par excellence

    Reading this, the first question that came to mind was: Was there no spoken correlate of this unified written language? Did the elite really only communicate by brush? And if not, how did they speak with each other?

  liuyao said,

    April 5, 2017 @ 12:40 pm

    @Flow, no I never thought of myself as a Sinologist. All respect for those who are.

  Victor Mair said,

    April 8, 2017 @ 8:20 am

    @Bathrobe, with replies by VHM interspersed:

    Reading this, the first question that came to mind was: Was there no spoken correlate of this unified written language?

    VHM: No. There was no direct spoken correlate of the "unified written language".

    Did the elite really only communicate by brush?

    VHM: No. That's a myth. It would have been unworkable to run an empire that way.

    And if not, how did they speak with each other?

    VHM: Through the koine. The great Jerry Norman, in his great book, Chinese, talks about this common spoken language as existing at least from the Tang period (618-907), and there probably had to be something like it from the earliest days of a unified polity in the EAH (East Asian Heartland), the boundaries of which polity shifted dramatically through time. This common language was called guānhuà 官话 (lit., "officials' talk / speech") from the Yuan period (1271-1368) on to the 20th century, after which time the name was reinterpreted as "public talk", and eventually was formalized as Guóyǔ 国语 ("national language", aka "Mandarin", which etymologically means exactly the same thing as the original meaning of guānhuà 官话, viz., "officials' talk / speech"). Normally this common language of the officials was based on the speech of the capital, which was usually the "prestige topolect".

    Victor H. Mair. "How to Forget Your Mother Tongue and Remember Your National Language".

    _____. "Buddhism and the Rise of the Written Vernacular in East Asia: The Making of National Languages." Journal of Asian Studies, 53.3 (August, 1994): 707-751. (readily available on JSTOR)

  Victor Mair said,

    April 8, 2017 @ 8:27 am

    From Hiroshi Kumamoto:

    You can look it up at:

    for the dictionaries that carry the character in question.

    The cbeta version that reproduces the 范祥雍 text omits the commentary which is as extensive as the text itself (but of no help in this case).
    Another standard edition of 周祖謨 has only to say that 二人史書無傳 [VHM: "there are no biographies for these two individuals in historical works"]


    VHM: That's a useful tool.

    In this case, it tells us that 㫬 is only included in one Chinese dictionary, the massive 中华字海. Of the 53 other works on this comprehensive list, 㫬 is only cited in a dictionary of old Zhuang characters, in a reference tool for the study of calligraphy, and in a collection of variant forms.

  Victor Mair said,

    April 8, 2017 @ 8:37 pm

    From Bob Ramsey:

    The word Guóyǔ 国语 was a borrowing from Japanese that Wu Rulun embraced after he went on an inspection tour of Japan in 1902. I know you well remember the story, but I'm wondering if the readers of LL are all aware of that piece of history.

  John Rohsenow said,

    April 9, 2017 @ 3:08 am

    When people ask me the origin of the term "Mandarin" language, I tell
    them a version of this story, and say that it was the lingua franca of these

