Language Log

Character crises

June 15, 2018 @ 7:23 am · Filed by Victor Mair under Lexicon and lexicography, Writing systems

From Bob Bauer:

You may have heard that the famous HK-based novelist by the name of 劉以鬯 recently passed away at the age of 99. [VHM: I have intentionally left his name without transcription for reasons that will soon become apparent.]

I did not know how to read/pronounce the third character in his name, so I tried to look it up in some dictionaries. But I first needed to decide what is this character's radical? Trying to find the character by its radical turned out to be a very time-consuming process, as different dictionaries do different things with it — at least one doesn't bother to assign it to a radical.

So long as the Kangxi radical system (from 1716, but originally introduced a century earlier under another name [Zihui] in 1615) functioned more or less as a standard set, people who took the trouble to memorize all 214 of them in order — few people other than very serious Chinese scholars and Western Sinologists did — would have known that 鬯 is itself a radical, namely, #192, pronounced chàng in MSM and meaning "sacrificial brew". In the Kangxi dictionary, out of 49,035 characters, there are only 8 to be found under this radical. It is easily confused with the rather more common radical #197 鹵, pronounced lǔ and meaning "salt". Of the 49,035 characters in the Kangxi dictionary, there are 44 to be found under this radical.

One way or another, by hook or by crook, we have determined that the name of the author who recently passed away, 劉以鬯 (12/7/1918-6/8/2018), is pronounced Liu Yichang in MSM and Lau Yee Cheung in Cantonese.

It was hard enough to find characters under the 214 Kangxi radicals — actually maddeningly difficult. I knew one famous scholar who tried for years to determine the radical of zhāng 章 ("chapter"), which is not at all an obscure character. I can spot 5 possible Kangxi radicals in 章. I happened to visit my friend one day and he told me of his plight. When I informed him what the radical of 章 actually was and helped him find it in a gigantic dictionary with nearly 50,000 characters, my friend actually broke down in tears.

Mind you, these are the frustrations faced by scholars of Chinese on a daily basis even with the Kangxi radical system. See "Sinological suffering" (3/31/17). Most people just can't be bothered looking up characters in a Chinese dictionary because it's simply too darn hard. If they know the pronunciation of the character and they have access to a dictionary arranged by sound or that has an index arranged by sound, they will surely go straight for that rather than trying to figure out the radical and then count the residual strokes. I have empirical proof of that, because dictionaries with alphabetical indices are always far more heavily soiled and worn on those pages than in the main part of the dictionary.

But now we enter the period of crisis and chaos in Chinese lexicography. About 20-30 years ago, dictionary makers in China decided to abandon the standard system of 214 Kangxi radicals. Instead, many of them — even well-known and important dictionaries — went about establishing their own systems with 188, 189, 200, 246, 252, and other total numbers of radicals. China is now faced with a tragic situation where such a great, large nation has no standard for ordering their 50,000 and more characters.

At the same time, as we have so often pointed out on Language Log, people are forgetting how to write the characters:

"Character Amnesia" (7/22/10)
"Character amnesia revisited" (12/13/12)
"Spelling bees and character amnesia" (8/7/13)
"Character amnesia and the emergence of digraphia" (9/25/13)
"Dumpling ingredients and character amnesia" (10/18/14)
"Character amnesia in 1793-1794" (4/24/14)
"Japanese survey on forgetting how to write kanji" (9/24/12)
"Character amnesia redux" (4/22/16)

I was reminded of this today when a friend wrote to me from China saying that, in a community education class of literate adults ages 28-40, 5 out of 6 people couldn't write the word for "hatch; brood; incubate" (fūluǎn 孵卵), and the person who sort of could felt the need to check a dictionary just to be sure. 孵 is #3256 on a frequency list of 9,933 characters, and 卵 is #2008.

Coupled with the ease of Romanized inputting, the universal teaching of English to all schoolchildren from elementary grades through college, the bifurcation between simplified and traditional forms of the characters, and increased familiarity with Western culture (despite government attempts to prevent it), the combined crises in lexicography and character amnesia mean that anything could happen with the Chinese writing system in the coming years.

60 Comments

SO said,

June 15, 2018 @ 8:00 am

So, what did the famous scholar do to waste several years finding the radical of 章? Even when relying on paper dictionaries only, he could simply have checked out each of the 5 possibilities you can make out when looking at the character (and presumably the scholar could spot these possibilities as well), which cannot possibly take longer than a few minutes at most. And even if his dictionaries lacked some sort of pronunciation-based index, quite a few character dictionaries also come with an index based on the number of strokes. Counting the strokes in 章 is pretty straightforward, so such an index should lead to the entry in question within minutes, at most.

In short: I wonder what exactly that famous scholar did with his dear time?
Alyssa said,

June 15, 2018 @ 8:02 am

I found the SKIP code system to work very well for Japanese characters. Is it not in use in China at all? I found 鬯 exactly where I expected it at 2-8-2. 章 was harder – it's at 2-2-9 rather than 2-5-6, which seems an odd choice.
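
For readers who have not met SKIP, here is a minimal Python sketch of how such a code can be read. It assumes the scheme as it is usually described: the first number names a pattern (1 left-right, 2 up-down, 3 enclosure, 4 solid), and for patterns 1 to 3 the other two numbers are the stroke counts of the first part and of the remainder. The names `parse_skip` and `total_strokes` are made up for illustration. The sketch shows that both candidate codes for 章 describe the same 11 strokes and differ only in where the character is split.

```python
from typing import NamedTuple

# Pattern numbers as commonly described for SKIP (an assumption of this sketch).
PATTERNS = {1: "left-right", 2: "up-down", 3: "enclosure", 4: "solid"}


class Skip(NamedTuple):
    pattern: int
    first: int
    second: int


def parse_skip(code: str) -> Skip:
    """Split a code such as '2-8-2' into its three numbers."""
    pattern, first, second = (int(part) for part in code.split("-"))
    if pattern not in PATTERNS:
        raise ValueError(f"unknown SKIP pattern: {pattern}")
    return Skip(pattern, first, second)


def total_strokes(skip: Skip) -> int:
    """Total stroke count implied by a SKIP code."""
    # Assumption: for the 'solid' pattern the second number is already the
    # total stroke count and the third number is a sub-pattern.
    if skip.pattern == 4:
        return skip.first
    return skip.first + skip.second


listed = parse_skip("2-2-9")    # where 章 is actually filed
expected = parse_skip("2-5-6")  # where the commenter first looked
print(total_strokes(listed), total_strokes(expected))
```

Because the total is the same either way, a reader who splits the character differently from the dictionary lands in the wrong place even with a correct stroke count.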
Victor Mair said,

June 15, 2018 @ 8:54 am

@Alyssa

I've never heard of SKIP being used in China.

Please give us a link that introduces us to how it works.

Sounds like there's considerable ambiguity in the system. Stroke-wise 章 is relatively straightforward, but you thought it should be under 2-5-6 ("an odd choice" in your mind) rather than 2-2-9, missing two of the three determinants.

Conversely, stroke-wise 鬯 is not at all obvious and straightforward, yet somehow you managed to find it with SKIP.

The system seems to have decided vagaries, such that I don't think I'd want to waste my time trying to learn it, though, out of curiosity, I'm willing to take a look. Mind you, there are hundreds and hundreds of Chinese character finding / inputting systems out there, and none of them are easy and foolproof. However, as I said, I'm always interested in becoming acquainted with yet one more ingenious method for looking up Chinese characters, even though I know by the nature of the game that it won't be a piece of cake.

There's no magic bullet for finding characters in Chinese dictionaries — unless you already know how they are pronounced, in which case you look them up by sound.
Victor Mair said,

June 15, 2018 @ 9:15 am

"I wonder what exactly that famous scholar did with his dear time?"

Mostly he agonized.

Do not belittle the grief of people who try to find characters in Chinese dictionaries. It is genuine. Their inability to find characters in massive dictionaries is not due to their stupidity; it's because the character system itself is not designed to be easy and efficient for searching.

In the case of my friend and his bootless search for zhāng 章 ("chapter") in his gigantic dictionary, there was a total stroke count index, but it was hundreds of pages long, and under each group of total strokes (11 for 章), the characters were arranged by radicals, so he was back at the same problem of not knowing what radical it was listed under.

As a matter of fact, my friend knew that 章 should have been listed under Kangxi radical #117 lì 立 ("stand") — any Sinologist worth his salt should be able to tell you that, but it just wasn't there. For some bizarre reason, the editors of the dictionary decided to put it under Kangxi radical #180 yīn 音 ("sound"). My friend didn't spot that as one of the five potential radicals for 章, and even if he had, he would never have dreamed that it made any semantic sense to place "chapter" under "sound". Because I had the same humongous dictionary in my study at home, after much agony of my own, I had discovered that the editors misfiled 章 under the wrong radical!

Be more sympathetic. Don't blame my friend for being stupid. He's not. He's actually extremely intelligent.
Alyssa said,

June 15, 2018 @ 9:51 am

Wikipedia has a good explanation of SKIP here:
https://en.wikipedia.org/wiki/Kodansha_Kanji_Learner%27s_Dictionary#SKIP

I'm not sure if it's used by native speakers, but as a learner I found it very intuitive and speedy.
SO said,

June 15, 2018 @ 1:20 pm

Thank you for your detailed reply, but please don't blame me for things I've never said. I didn't call anybody "stupid" and didn't explain anything as the result of "stupidity".

I simply couldn't (and still cannot) imagine how anybody could possibly spend several years trying to find out "the" (see below) radical of 章 — or any other common character for that matter. Even with paper dictionaries, even with the need for quite some trial-and-error, etc.

The "bizarre reason" for putting 章 under 音 (as in the Yupian 玉篇, the later Leibian 類編 etc.) is most likely the Shuowen jiezi 說文解字, which analyzes the character in question as 从音从十 and groups it together with various other characters containing 音. Doing so in a more recent dictionary might thus not be in accordance with Kangxi & Co., but depending on your point of view (and whether the editors of a given dictionary claim to follow Kangxi in each and every case or not) 章 was maybe not "misfiled […] under the wrong radical" after all. I don't know which dictionary this is all about, but maybe the editors are thus not to blame in the end.
Victor Mair said,

June 15, 2018 @ 2:08 pm

If the finding systems for Chinese characters cause so much trouble for highly trained and intelligent specialists, it's no wonder that average citizens seldom look things up in Chinese dictionaries. The whole point of this post has gotten derailed by homing in on an anecdote about a single character out of the tens of thousands that make up the Chinese writing system.

Arguments over Shuowen vs. Kangxi are immaterial and otiose when confronted with the reality that China does not have any standard set of radicals which its citizens can master and by means of which they can look up essential information in dictionaries and other types of reference works. I consider this to be a tragedy of enormous magnitude.
TM Jones said,

June 15, 2018 @ 3:32 pm

The handwriting recognition function on my phone gave me both 鬯 and 章 in under 5 seconds. When I first started looking up characters this way, I nearly wept thinking of all those hours I spent/wasted as a student flipping through various indexes.
Daniel Barkalow said,

June 15, 2018 @ 3:41 pm

Applying Google Translate to 劉以鬯's zh.wikipedia page comes up with Liu Yixi, and then also comes up with Liu Yixuan, and then says his pen name is Liu Yijun (same characters). Translating the titles of references also supplies Yiyi Liu, Liu Yifan, and simply Liu Yi.
Michael Watts said,

June 15, 2018 @ 4:05 pm

"Arguments over Shuowen vs. Kangxi are immaterial and otiose when confronted with the reality that China does not have any standard set of radicals which its citizens can master and by means of which they can look up essential information in dictionaries and other types of reference works. I consider this to be a tragedy of enormous magnitude."

This was certainly a problem in the past, but it's odd to describe it as "a tragedy of enormous magnitude" today when the problem no longer exists. It takes a couple of seconds to look up 鬯 on your phone or tablet through the magic of handwriting character input. If the goal is to learn the pronunciation, that's all you need. If you need to look it up in a dictionary, learning the pronunciation will let you index by pinyin. If you need a formal analysis of what the radical is, I get that as a foreigner by tabbing over to the component breakdown in Pleco, which notes it as "Radical Number 192, 鬯". If I were a real Chinese person, I'd probably look on baidu, where searching for 鬯字 suggests this baidu hanyu result, which notes that the character falls into the 部首 of "鬯".

Getting stuck on looking up a character is only an issue today if you're a thoroughly trained Sinologist who refuses to use modern methods as a matter of principle.
Michael Watts said,

June 15, 2018 @ 4:08 pm

"Falls into the 部首" should read "falls under the 部首".
Jim Breen said,

June 15, 2018 @ 6:01 pm

Jack Halpern's SKIP is a great system, and it may be used more widely now that Jack has made the process and codes available under a CC-SA licence.

Another very fast technique is using a multi-component decomposition. I helped get this going 20+ years ago, and now there are several WWW-based implementations. My original one is at: http://www.edrdg.org/cgi-bin/wwwjdic/wwwjdic?1R and there are good interactive implementations, such as Kim Ahlström's one at jisho.org: https://jisho.org/#radical

Back in 2004 I did a workshop paper at COLING on the various indexing methods for finding kanji (http://www.edrdg.org/~jwb/paperdir/kanjindx.pdf). I pulled some measurements from the WWWJDIC server on what people were using, and at that time the multi-component method was the most popular.
Victor Mair said,

June 15, 2018 @ 8:01 pm

It is disadvantageous for a supposedly literate population not to know the basic elements of their writing system, how many there are, and in what order they are sequenced.

As mentioned in the o.p., there was a de facto set of 214 radicals for several centuries until about the middle of the 20th century, then — under the PRC — the system started to break down, with various publishers striking out on their own and coming up with a variety of different sets and sizes of radicals. The two most important lexicographical projects undertaken by the PRC, Hànyǔ dà cídiǎn 漢語大詞典 (The Unabridged Dictionary of Sinitic) and the Hanyu da zidian 漢語大字典 (Unabridged Dictionary of Chinese Characters), compiled primarily during the 80s, both have 200 radicals, while the little Xīnhuá zìdiǎn 新华字典 (New China character dictionary), which has sold more than 400,000,000 copies in numerous editions, has 201 dictionary radicals and 100 "supplementary components".

Shuōwén Jiězì 說文解字 (Explaining Graphs and Analyzing Characters), compiled by Xu Shen during the first quarter of the 2nd c. AD and the first Chinese dictionary to employ a system of radicals, had 540 "section headers" for only 9,353 characters. Go figure.
Bathrobe said,

June 15, 2018 @ 8:43 pm

Isn't the radical for 章立? At least that's what Wiktionary says… Agree with SO. Maybe not easy if you're wading through a 50,000 character dictionary, but surely not impossible to find.
Victor Mair said,

June 15, 2018 @ 9:12 pm

I pointed that out in the comment at 9:15 a.m.

http://languagelog.ldc.upenn.edu/nll/?p=38784#comment-1551976
cliff arroyo said,

June 15, 2018 @ 11:46 pm

I think it's interesting that whenever Victor points out real world problems of characters (like most users not being able to sort them or write them or find out what they mean without using a completely separate non-character auxiliary system) so many super-literates are quick to point out that this isn't a problem because…. (mainly they've solved it for themselves or haven't been affected so much by it themselves).

It's also interesting that some of the reasons given here for the lack of a radical system not being a problem (despite making some types of meta use of characters impossible) rely on technology that will only deepen the gap between the writing system and its users.

Back before I gave up trying to learn characters (Japanese) I never thought the radical system made any sense because there was no guide as to where it might occur and too many characters assigned to particular radicals didn't actually seem to contain that radical… (the part called the radical wasn't a normal variant of the radical but something else that somebody had decided would be classified as the radical for that character).

Whenever I asked someone about this (usually a superliterate), they just said it had never been a problem for them (or words to that effect) which just made me more and more dead to the very idea of characters.
Philip Taylor said,

June 16, 2018 @ 6:23 am

By devising such a complex and fiendishly difficult writing system, and by requiring that candidates for official positions be able to demonstrate their complete mastery of the system, the Chinese of the Sui and Qing dynasties ensured that those who held such positions were of considerably above-average intelligence. If only the Western world had been able to recognise the benefits of such a system …
Alex said,

June 16, 2018 @ 11:35 am

" rely on technology that will only deepen the gap between the writing system and its users."

I was going to write something about how easy it is to find things via mobile phone apps these days.

I think that statement by Cliff is to the point. There is no more rhyme or reason: for students it's just memorize, writing-wise, or don't bother and just look it up with an app when needed. Unfortunately children aren't allowed to.
Alex said,

June 16, 2018 @ 11:36 am

Once again I can't imagine how much this held back innovation/GDP back in the day.
~flow said,

June 16, 2018 @ 12:12 pm

To me Jack Halpern's SKIP system and many other ingenious devices invented to catalog Chinese characters—e.g. the 1920s Four Corner method—suffer from the shortcoming that the resulting series of figures does not look very systematic, that the groups do not feel natural at all. The Kangxi system (and its variations) at least manages to put characters together in groups that *mostly* have one obvious component (often but not always the one written first) in common. This makes the one, ten or one hundred thousand characters you want to list in your dictionary look much less systematic than they really are.

It is a strange and interesting fact that the Chinese have never developed a definitive inventory of character components, much less a standard ordering for them. The frustrating thing about looking up a character in most dictionaries is that in case you didn't find it, you can most of the time not be 100% sure it is not hiding somewhere else. 章 looks deceptively simple, but it can reasonably be listed under any of 亠, 立, 音, 日, 曰, 十, or 早 (if you want to make the last one a radical). This problem can be and has been mitigated by systems that do not attempt to put (most) characters under their etymologically (excuse the use of the term) 'correct' part (something that is hard to impossible in many cases anyway for a number of reasons); instead, those systems adopt rules that allow one to determine the appropriate radical without prior knowledge. In such a system, 章 could be listed under 音 because it is the most inclusive figure that you find starting from the top; 亠 and 立 are excluded because they are both smaller than 音; 日 and 十 are excluded because they come only after other candidates. That's not perfect but works quite well in a great number of cases. There's something of a long tail of strange and infrequent character components, though, for which this system will never work very well.
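
The rule described for 章 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the candidate list, the stroke positions and the names `Candidate` and `pick_radical` are hand-made for this example and are not taken from any real dictionary's rule set. Each candidate records the stroke at which it begins and how many strokes it covers; the rule prefers the candidate that starts earliest and, among those, the one that covers the most strokes.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    glyph: str
    first_stroke: int  # 1-based position of the candidate's first stroke
    strokes: int       # number of strokes the candidate covers


def pick_radical(candidates: list[Candidate]) -> str:
    """Earliest-starting candidate wins; ties go to the largest one."""
    best = min(candidates, key=lambda c: (c.first_stroke, -c.strokes))
    return best.glyph


# Hand-entered candidates for 章 (11 strokes: 立 5 + 日 4 + 十 2).
ZHANG = [
    Candidate("亠", 1, 2),
    Candidate("立", 1, 5),
    Candidate("音", 1, 9),
    Candidate("日", 6, 4),
    Candidate("早", 6, 6),
    Candidate("十", 10, 2),
]

print(pick_radical(ZHANG))
```

With this data the rule settles on 音, the largest figure that starts at the top, without any knowledge of the character's history.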
~flow said,

June 16, 2018 @ 12:21 pm

BTW as for cliff arroyo's remark, "too many characters assigned to particular radicals didn't actually seem to contain that radical" is as such indisputable because there are indeed such cases and any one can be judged to be too much. What I do not think is that there are very many characters among the five or ten thousand in current use that do not rather obviously contain the radical that they are usually listed under. This number should be quite small. If anyone is interested, I could perhaps come up with some statistical data.
PeterL said,

June 16, 2018 @ 1:34 pm

I had no problem looking up 章 in my Nelson Japanese-English dictionary — its 12-step system for finding the radical led me to 立, which then pointed me to 5112 (under 音), which showed that 立 was the traditional classification after all. (It's seldom necessary to go through all 12 steps; usually the correct choice is evident after just one or two steps.)

When I tried JED on my phone, it gave me these components: 立、日、十、音.

So, the problem seems solvable — index by all the possible radicals (ideally using something like Nelson's 12-step system for choosing the most likely), with each "wrong" item pointing to the canonical entry. But Nelson has only 5,446 kanji; maybe its system wouldn't scale to 50,000.

[BTW, I don't understand how Nelson came up with its choice of radical for 来.]
Scott P. said,

June 16, 2018 @ 1:56 pm

What seems to be needed here is an extensive system of cross-referencing, so that you could look up a character under any likely radical and be guided to the correct entry. I assume that sheer size would make any such attempt pointless?
~flow said,

June 16, 2018 @ 3:48 pm

@PeterL, Scott P.—cross-referencing all CJK characters under each of their components is exactly what I'm doing.

In my scheme you can find each character under any of its components; additionally, there are cross-references from all components that contain other components (so 金 is cross-referenced from 人 and 王, 食 from 人, 良 and 艮 and so on). As for size, I currently have 275,789 entries for 88,597 characters, so on average each character appears 3.11 times (*almost* pi times…). For the 10,000 most frequent characters the ratio drops to 2.812 because frequent characters are on average a bit simpler; that trend continues to 2.399 for the top 1,000 and 2.24 for the top 100 characters.

In short, there is naturally a multiplication in size for a full componential index of CJK characters, but the factor is not unbearably large.
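
A full componential index of this kind, and the size factor quoted above, can be sketched as follows. The three-character decomposition table is a toy made up for illustration (it is not the commenter's data), and the function names are hypothetical; only the two totals in the last line come from the comment.

```python
from collections import defaultdict


def build_index(components_of: dict[str, list[str]]) -> dict[str, list[str]]:
    """List every character under each of its components."""
    index: dict[str, list[str]] = defaultdict(list)
    for char, components in components_of.items():
        for component in components:
            index[component].append(char)
    return dict(index)


def entries_per_character(index: dict[str, list[str]], n_chars: int) -> float:
    """Average number of index entries per character."""
    return sum(len(chars) for chars in index.values()) / n_chars


# Toy decompositions, hand-made for this example only.
TOY = {"章": ["立", "日", "十"], "音": ["立", "日"], "早": ["日", "十"]}

index = build_index(TOY)
print(index["日"])                            # every character containing 日
print(entries_per_character(index, len(TOY)))

# The figures quoted in the comment: 275,789 entries for 88,597 characters.
print(round(275_789 / 88_597, 2))
```

The last line reproduces the stated average of about 3.11 entries per character.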
Jim Breen said,

June 16, 2018 @ 8:05 pm

~flow wrote:
>> cross-referencing all CJK characters under each of their components is exactly what I'm doing.

As I mentioned earlier, such a task was done about 20 years ago and the data is freely available. See: http://www.edrdg.org/krad/kradinf.html

PeterL mentioned looking up 章 in the "JED" phone app. That app is one of many that use this data.
David Moser said,

June 16, 2018 @ 10:41 pm

Resending this passage from an earlier paper, relevant to this discussion:

I have occasionally taught English to Beijing schoolchildren, and one day several years ago I was helping a class of third graders review English words for body parts. One little boy wrote “knee” on the blackboard, and then, as he attempted to write the Chinese translation xigai 膝盖, realized he could not write the characters. I found this rather intriguing, and I began to quiz the class on common words for body parts and everyday objects, and within a few minutes we came up with a list of words like yaoshi 钥匙 “key”, niaochao 鸟巢 “bird’s nest”, lajiao 辣椒 “hot pepper”, huazhuang 化妆 “makeup”, gebo 胳膊 “arm”, jugong 鞠躬 “bow”, and so on, all of which they could write (or correctly guess) in English, but could not successfully render in Chinese script. Abilities varied greatly, of course, and a couple of the brighter kids could seemingly write almost any character, but for most of them, their written English lexicon had already made a few semantic inroads that were still inaccessible via the Chinese characters. After the class I mentioned this interesting (and to me, distressing) state of affairs to some of the parents who stayed on to chat with me. This gave rise to a lively discussion, during which we found that many of the parents, to their bemused chagrin, also stumbled over characters in common words like saozhou 扫帚 “broom”, gebozhou 胳膊肘 “elbow”, zhouwen 皱纹 “wrinkle”, aizheng 癌症 “cancer”, menkan 门槛 “threshold”, yulin 鱼鳞 “fish scales”, chiru 耻辱 “shame”, xidicao 洗涤槽 “kitchen sink”, Lundun 伦敦 “London”, and so on. Many of these adults held advanced degrees, and one was an editor at a Beijing newspaper. One of the parents sheepishly confided in me “I feel embarrassed when my child asks me how to write a character, because I often can’t remember, either. This has happened so often that I’ve totally lost face in this regard, and nowadays the joke in our house is ‘Look it up, you’ll remember it longer.’”
~flow said,

June 17, 2018 @ 5:01 am

@Jim Breen I'm well aware of the krad data under http://www.edrdg.org/krad/kradinf.html and looked into it shortly after starting my own similar project. The salient differences between the two projects are that

* I mainly use glyph decompositions from the Kanji Database project (http://kanji-database.sourceforge.net/, https://github.com/cjkvi/cjkvi-ids) as a point of departure; these are structural representations like 窥:⿱穴规 (using so-called Ideograph Description Sequences (IDS));

* these IDSs contain spatial information (in ⿱穴规, 穴 is above 规)

* and preserve writing order (when writing 窥, 穴 is written before 规);

* furthermore, all decompositions are minimal with regard to the number of elements used. 窥 can also be written as ⿱穴⿰夫见, ⿱⿱丶㓁⿰夫见 or ⿱⿱宀八⿰夫见 and so on, but since there exists a codepoint 规:⿰夫见, the shortest possible formula for 窥 is ⿱穴规. Quite a few arrangements do not allow for such a neat representation or have more than one of them.

* A standardized representation is then achieved by recursively substituting formula elements that are not in a catalog of basic figures (termed 'factors') by their respective formulas; in the case of 窥, this is done in a single step to give ⿱穴⿰夫见.

* An index of characters is then built by basically sorting all the factors by their stroke order codes and listing each character under each of its factors.

* Additional information is preserved and can be searched for, e.g. all characters that have a number of factors in common, or that have a certain sequence of factors, or ones that have two factors in a given spatial arrangement in common.

* Furthermore, the information that surfaces during the decomposition process is also preserved, so it's possible to search for appearances of compounds that show up neither in the short formulas nor in the factorial formulas.
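
The recursive substitution step in the list above can be sketched in Python. This is a minimal illustration, not the commenter's implementation: the two formulas are the ones quoted in the comment, while the factor catalog, the function names and the restriction to the twelve operators U+2FF0 to U+2FFB are assumptions made for the example.

```python
# Ideographic description characters, assumed here to be U+2FF0..U+2FFB.
IDC_FIRST, IDC_LAST = "\u2ff0", "\u2ffb"


def is_operator(ch: str) -> bool:
    return IDC_FIRST <= ch <= IDC_LAST


def expand(char: str, formulas: dict[str, str], factors: set[str]) -> str:
    """Replace every non-factor element by its own formula, recursively."""
    formula = formulas.get(char)
    # Stop at basic figures, at characters with no formula, and at
    # characters whose formula is just themselves.
    if char in factors or formula is None or formula == char:
        return char
    return "".join(
        ch if is_operator(ch) else expand(ch, formulas, factors)
        for ch in formula
    )


def factors_of(char: str, formulas: dict[str, str], factors: set[str]) -> list[str]:
    """The basic figures of a character, in writing order."""
    return [ch for ch in expand(char, formulas, factors) if not is_operator(ch)]


FORMULAS = {"窥": "⿱穴规", "规": "⿰夫见"}  # short formulas from the comment
FACTORS = {"穴", "夫", "见"}                 # hypothetical catalog of basic figures

print(expand("窥", FORMULAS, FACTORS))
print(factors_of("窥", FORMULAS, FACTORS))
```

An index in the sense of the list above would then file 窥 under each element returned by `factors_of`.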

There are numerous examples of projects that decompose CJK characters into all of their constituents and use that data to build paper-based or digital indexes; two examples are 漢字首尾二部排檢法 by 杜學知 (1962) and 汉字编码的理论与实践 by 陈爱文 et al. (1986). Of websites that allow such data to be searched on-line, http://www.edrdg.org/krad/kradinf.html may be the first one (at least I'm not aware of anything prior to that).
David Marjanović said,

June 17, 2018 @ 5:10 am

I would not be surprised to learn that the makers of the Kāngxī dictionary thought that their system was at least partially "real", tried to catalog as many characters as possible under their "true" radicals, ended up using sources like the Shuō wén, jiě zì, and consequently made lots of otherwise counterintuitive decisions on purpose.

Langenscheidt's German↔MSM dictionary lists some characters under several radicals; I don't know how far it goes, though.

"By devising such a complex and fiendishly difficult writing system, and by requiring that candidates for official positions be able to demonstrate their complete mastery of the system, the Chinese of the Sui and Qing dynasties ensured that those who held such positions were of considerably above-average intelligence. If only the Western world had been able to recognise the benefits of such a system …"

The trouble here is that intelligence is not a monolith. Being good at one intellectual task does not generally predict whether you'll be good at another. Memorization, for instance, seems to be completely independent of several kinds of problem-solving…
~flow said,

June 17, 2018 @ 5:38 am

@David Marjanović, for those who don't know it, Langenscheidt apparently put some original work into their 2008 Mandarin and German dictionary; they use the traditional 214 Kangxi radicals for the index, with the twist that under many headings there are listed variants and positional forms that do not appear in most radical lists; also, although the entire dictionary only uses 'simplified' characters, the 'traditional' forms are listed with the radicals, and only for the radicals.

It would be interesting to see some example of characters listed under more than one radical in the Langenscheidt; can you think of some?
Victor Mair said,

June 17, 2018 @ 6:04 am

From an anonymous correspondent, commenting on what cliff arroyo said above:

A very valid comment. I, too, find this kind of rationalizing in both Chinese and Chinese-learning foreigners, and it's quite annoying. Each time they find a workable solution to a problem posed by Chinese characters, they seem proud of their hard-won "solution", not realizing that the real problem still exists, and their "solution" is only a stop-gap measure that allows them to get things done, but never addresses the root of the problem. The "solutions" I've heard over the years include things like:

Oh, you can't figure out the radicals? You should do what I did, and just spend a week or so writing them out and memorizing them. That makes it easier to look up characters.
Oh, you can never remember how to write the character 赢 ying2 "to win"? Oh, it's simple, just think of the three characters below, 月，贝 and 凡 and make up a simple little poetic image involving the moon, a cowry shell, and something "ordinary", and you'll never forget it again.
Oh, you haven't yet bought the xxx dictionary? It has a handy table of hard-to-find characters, I photocopied it and hung it on my wall to easily consult. That solves most of my problems.

And so on. The "super literates" as he calls them are able to plow ahead in their daily work because… well, because they're super literates! They are proud of their little strategies, and seem to have lost all sympathy for the poor souls just beginning to struggle with all these obstacles for the first time.

Learning a foreign language and script is hard. Chinese is one of the few languages that requires the learner to first spend an inordinate amount of time "learning how to learn" before they can actually start learning.
Victor Mair said,

June 17, 2018 @ 9:48 am

From a colleague with half a century's experience learning Chinese:

My first stop was Mathews' index of characters with "hard to find radicals". No luck.
stephenl said,

June 17, 2018 @ 2:16 pm

One could also follow the very ancient and sensible practice of 杜甫 (Du Fu)

讀書難字過
(something like: reading books, I skip difficult characters)

( from https://avva.livejournal.com/3121411.html )
Michael Watts said,

June 17, 2018 @ 3:39 pm

"The trouble here is that intelligence is not a monolith. Being good at one intellectual task does not generally predict whether you'll be good at another."

This is the opposite of the psychometric result; the finding that good performance on any one intellectual task is in fact a predictor of good performance on others is known to psychometrics as the "positive manifold".
loonquawl said,

June 18, 2018 @ 2:01 am

I am unfamiliar with Chinese script, and can only guess at the problems stemming from the listing of characters according to some (obviously non-obvious?) 'radical' form.

But are paper-bound dictionaries a real problem anymore with the advent of character recognition via smartphone? Is there a character that would not reliably be recognised by either photographing it, or recreating it on screen? All that with extraneous info (usage, …) and neighbours in both geometry and meaning?
Ash said,

June 18, 2018 @ 9:02 am

@David Moser: I truly believe this is the result of blindly memorizing characters without understanding their functional components and how they function. The whole notion of "radical" basically does more to obfuscate functional components than to identify them. It is exactly understanding functional components that allows one to recall a character (in most cases) using the sound and meaning of the word you're trying to write (almost like a mental IME). For instance, when I was taking my PhD qualifying exams in my previous department (which were all in handwritten Chinese with no aids, electronic or otherwise), I only struggled twice during two 6-hour exams. I started to write 停頓 and once I finished writing 停, I realized I had forgotten how to write 頓. However, I happened to remember that it originally meant "bow down and put your head on the ground", so I started writing semantic components having to do with "head" in the margin of my exam: 首頁 and then components that can represent tun, dun: 屯享 (looks like xiǎngshòu de xiǎng, but is actually the left side of 敦). Once I saw 屯 and 頁, it triggered my memory (like using an IME on a computer) of 頓. If everyone learned characters by how they represent sound and meaning, they'd be able to do this (most of the time). I can actually remember how to write characters better than I remember English spelling (I judge this by the number of times I have to look stuff up). Anyway… thought I'd throw this in there.
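
The "mental IME" described here can be sketched as a lookup keyed on components rather than radicals. This is only a rough illustration: the `COMPONENTS` table and the `recall` function are hypothetical, the table is hand-built, and only the 頓 entry (頁 for meaning, 屯 for sound) comes from the comment itself.

```python
# Hypothetical mini-table: character -> (meaning component, sound component).
# Only the 頓 entry is taken from the comment; the others are illustrative.
COMPONENTS = {
    "頓": ("頁", "屯"),
    "頂": ("頁", "丁"),
    "純": ("糸", "屯"),
    "停": ("人", "亭"),
}


def recall(meaning_candidates, sound_candidates):
    """Return characters whose meaning AND sound components are both among the candidates."""
    return [
        char
        for char, (meaning, sound) in COMPONENTS.items()
        if meaning in meaning_candidates and sound in sound_candidates
    ]


# Candidates jotted in the margin: "head" components, then possible tun/dun phonetics.
print(recall({"首", "頁"}, {"屯", "享"}))
```

With this toy table, only 頓 satisfies both constraints, which mirrors how seeing 屯 and 頁 side by side was enough to trigger the memory.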
Victor Mair said,

June 18, 2018 @ 9:51 am

From John Renfroe:

You might be interested in an article I wrote for the LA Review of Books' "China Channel" blog. I talk about functional components (form, sound, meaning, and empty components as we term them in our dictionary), and about the futility of trying to understand characters in terms of radicals as many students do (i.e., radicals are for dictionary lookup, and may or may not be related to the character's actual structure). It's brief and for a decidedly non-specialist audience who may not be familiar with the writing system in the first place.

Radical Characters: https://chinachannel.org/2018/06/12/radical-characters/

I also covered similar material on our blog, albeit with more detail and perhaps less lucidity.

Getting Radical About Radicals: https://www.outlier-linguistics.com/outlierblog/2018/2/14/getting-radical-about-radicals
Bathrobe said,

June 19, 2018 @ 2:48 am

Very interesting articles at Outliers. These are all things that I've known for a long time in a very rough sort of fashion (often disorganised and inconsistent) but had never put together into a coherent picture. You have to be very much into the "etymology" of characters in order to put this information together. If you follow conventional dictionaries you never question what lies behind the system and end up bumbling along with a bit of nip and tuck here, a bit of rationalisation there.

The most valuable contributions I found are the clarification that:

* Radicals are not really 'roots' (nor do they necessarily represent meaning categories), they are just a method of indexation.

* Corruption of various kinds has destroyed the original 'etymologies' of many characters. The fact that a character has the 'water radical' does not necessarily mean that a character has anything to do with water — the water could be there through corruption.

The only problem I found was that even though explanations were simplified to be understandable to the non-specialist reader, they can at times be hard to understand unless you really know your characters!
SO said,

June 19, 2018 @ 7:29 am

You might also be interested in the slides to Wolfgang Behr's presentation "Radical misconceptions: On the background and consequences of European ideas about bushou 部首" which can be found here: http://www.academia.edu/36661867/
Chris Button said,

June 19, 2018 @ 10:09 am

Leaving the etymological justifications aside for the time being, a sample of the 求, 九 and 瓜 phonetic series in my "Derivational Dictionary of Chinese and Japanese Characters" gives the following:

求: 救; 裘; 狐 (use of 瓜 for 求); 究 (use of 九 for 求)
九: 尻; 肘 (corruption of 九 to 寸); 弧 (use of 瓜 for 九)
瓜: 孤; 球 (use of 求 for 瓜); 軌 (use of 九 for 瓜)

In an impossibly ideal world, all the characters in each of the three series would use their "correct" phonetic (e.g. 狐 would be written with 犭 and 求). Re-assigning all the characters to their "correct" phonetic components does not nullify the need for copious indices in the back to aid in character location, but does at least provide the logic behind the shape of each character.
Victor Mair said,

June 19, 2018 @ 12:58 pm

From a correspondent in China who has a PhD in late medieval Chinese literature:

"Character Crises", together with its commentaries, is very provocative. I checked the character chàng（鬯）in Xiàndài hànyǔ cídiǎn（现代汉语词典）and found its radical is bǐ（匕）instead of chàng（鬯）and there is actually no radical chàng（鬯）in 现代汉语词典. Then I used the handwriting recognition function on my cellphone, and it turned out the app Zìdiǎn cídiǎn (字典词典) and Baidu Hànyǔ（百度汉语）recognize chàng（鬯）as its radical (maybe they are based on the Kangxi dictionary?).

There is certainly a discrepancy, though 现代汉语词典 is supposed to be the most popular and authoritative dictionary, the one every student at school should follow. As a native speaker, it had never occurred to me before that there is no standard for ordering Chinese characters by radical. Maybe that is because I have always preferred alphabetical pinyin to radicals for looking up characters. I still remember how I disliked using the radicals and counting strokes when I had to check the pronunciation of a character in the dictionary. The frustration the famous scholar went through is all too familiar, but I was luckier: I did not have to spend several years, as there was always a correct answer in the textbook. It is indeed difficult for people who don’t have such experiences to imagine what they are like, or maybe only people with sympathy and imagination can well understand the past.
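
Whether a character is itself one of the 214 Kangxi radicals can be checked against the numbering encoded in Unicode. The following is a minimal sketch, assuming Python's standard `unicodedata` module and the Unicode "Kangxi Radicals" block (U+2F00 to U+2FD5), in which radical number n sits at U+2F00 + n − 1. The helper names `kangxi_radical` and `kangxi_number` are made up for illustration. It says nothing about how 现代汉语词典 or any phone app assigns radicals.

```python
import unicodedata

KANGXI_BASE = 0x2F00  # first code point of the Unicode "Kangxi Radicals" block


def kangxi_radical(number):
    """Return the ordinary character form of Kangxi radical `number` (1 to 214)."""
    if not 1 <= number <= 214:
        raise ValueError("Kangxi radicals are numbered 1 to 214")
    radical_symbol = chr(KANGXI_BASE + number - 1)
    # NFKC folds the dedicated radical symbol to its ordinary unified ideograph.
    return unicodedata.normalize("NFKC", radical_symbol)


def kangxi_number(char):
    """Return the Kangxi radical number if `char` is itself a radical, else None."""
    for number in range(1, 215):
        if kangxi_radical(number) == char:
            return number
    return None


for char in "鬯鹵章":
    print(char, kangxi_number(char))
```

A character that is merely filed under a radical, such as 章, gets `None` here: the sketch only answers "is this character a Kangxi radical?", not "which radical is it indexed under?", which is exactly the part that differs from dictionary to dictionary.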
Ash said,

June 19, 2018 @ 9:32 pm

@Chris Button What do you mean exactly? How are you determining what the "correct" phonetic should be? I get what you're saying with 肘, but the rest I don't understand. I've looked at the series you're talking about in Middle Chinese and Old Chinese and I still don't see it.
Bathrobe said,

June 20, 2018 @ 12:17 am

It is indeed difficult for people who don’t have such experiences to imagine what they are like

I sympathise! I spent many years using Nelson (Japanese), followed by other dictionaries, and now the radical lookup on my Mac. It is a time-wasting and at times frustrating method of looking up characters and accounts for part of my misspent youth.

The only system I find more frustrating is the lookup method for traditional Mongolian script, which is extremely unkind, time-consuming, and eye-glazing (the word can be there in plain sight and you still miss it).
Chris Button said,

June 20, 2018 @ 12:07 pm

@ Ash

That comes down to the etymological considerations I left aside earlier. For example, the word represented by 求 *gə̀w (< *gʷə̀ɣ) "seek" is clearly etymologically related to 究 *kə̀ws "research". However, the phonetic being used in 究 is 九 *kə̀wʔ (< *kʷə̀ɣʔ) rather than 求 since any such etymological considerations were clouded by basic phonological correspondences.
Ash said,

June 21, 2018 @ 2:02 am

@Chris Button
Are you saying that "seek" is the original meaning for 求 and that "research" is the original meaning for 究? (by "original meaning", I mean "the meaning of the spoken word that the character form was created to represent"). Or are you saying that 究 was created at some point after 求 took on the meaning "seek" and therefore 求 would have made a better phonetic? I've not seen any paleographers that assign "seek" as the original meaning of 求 or "research" to 究, which brings up a lot of questions. Or are you saying given the meanings that 求 and 究 took later, it would be more ideal if 求 was the phonetic?
Onelegkick said,

June 21, 2018 @ 4:34 am

I recognized it immediately as the "American coffee" component of this character:

https://www.nippon.com/en/nipponblog/m00088/

(Mnemonics are cool.)
Chris Button said,

June 21, 2018 @ 8:40 am

@ Ash

As you are undoubtedly aware, words shift in meaning over time. Through etymology we can study the core meaning that runs through families of words. The only reason I chose 求 and 究 above was that the etymological relationship between the glosses of "seek" and "research" is clearly apparent today, but we can of course go much deeper than that.

A more appropriate gloss for the concept underlying the word-family graphically represented at its head by 求 would be "forage". With this we can compare 裘 whose gloss of "fur" is etymologically related in English to the "for-" in "forage" (we can also incidentally note the furry 狐 "fox" here too with its use of 瓜 for 求). However, here is where the palaeographic record becomes interesting since the oracle-bone character behind the modern form of 求 rather surprisingly turns out to be the same as that behind phonologically unrelated 殺 (杀) "kill". Indeed, if you squint at the modern forms 求 and 杀 you can see how they could once have come from the same ancestral form. Further compounding the issue is that the original form of 裘 actually had no 求 component at all!

In fact, the earliest form of 裘 was a furry version of 衣 "clothing", to which 杀 was later added presumably due to its semantically appropriate depiction of an animal carcass (hence its modern gloss of "kill"). This combination of the furry form of 衣 with 杀 was later abbreviated in derivatives simply to 杀 which was already in the process of being deformed to modern 求 and hence no longer graphically associated with phonologically unrelated 杀.
Ash said,

June 21, 2018 @ 12:50 pm

@Chris Button Whose analyses are you looking at? I'm not familiar with the notion that 求 = 杀?
Eidolon said,

June 21, 2018 @ 5:37 pm

"…common words like saozhou 扫帚 “broom”, gebozhou 胳膊肘 “elbow”, zhouwen 皱纹 “wrinkle”, aizheng 癌症 “cancer”, menkan 门槛 “threshold”, yulin 鱼鳞 “fin”, chiru 耻辱 “shame”, xidicao 洗涤槽 “kitchen sink”, Lundun 伦敦 “London”, and so on. "

These words might be common in speech, but each of them features characters that are not common in writing. That seems to be the problem.

The issue with the Chinese writing system is that it is trying to be, simultaneously, phonetic and ideographic. Picking one or the other would make its system much more logical. An alphabet would naturally lend itself to organization along sound particles, and its lexicon would be decomposed as such; while a pure ideograph could be organized according to semantic categories, and you might then be able to look up the graph for "kitchen sink" – as an example – by associating the idea of kitchen with the idea of sink.

As it is, the Chinese writing system allows for neither approach without outside help. The characters themselves aren't phonetic enough to be decomposed into sound particles, yet they also aren't ideographic enough to be decomposed into ideas. The worst of both worlds, so to speak, and I can sympathize with those who were utterly frustrated at trying to create a sensible lookup dictionary for it.
Chris Button said,

June 21, 2018 @ 8:48 pm

@ Ash

If you look at 裘 in ctext.org, you can see the furry form of 衣 as the OB form and the form with plain 衣 and something in the middle as the bronze form. That something is the graphic ancestor of the unrelated words represented by 求 and 杀 (although you might want to take a look at Takashima's 2009 "Ji Sacrifice" article regarding the relationship between 杀 and 㣇 in that regard).
Jonathan Smith said,

June 21, 2018 @ 11:43 pm

As far as I know, though I can't speak to the referenced article by Takashima, Chris Button's idea about and is "original research" — just what belongs in an original reference work, of course, but probably not (yet) in a resource like Ash's. I don't know that there is much consensus about what thing the form represented or what word it was first intended to write. The more traditional approach to surely involves ? FWIW, one of the better online collections of "authoritative" opinion about character forms is probably CUHK's; here is 求.
Jonathan Smith said,

June 21, 2018 @ 11:44 pm

Argh… with quotation marks instead:
As far as I know, though I can't speak to the referenced article by Takashima, Chris Button's idea about "求" and "杀" is "original research" — precisely what belongs in an original reference work like his, of course, but probably not (yet) in a resource like Ash's. I don't know that there is any consensus about what the form "求" represented or what word it was intended to write. The more traditional approach to "杀" surely involves "术"? FWIW one of the better online collections of past opinion about character forms is CUHK's; here is 求.
Ash said,

June 22, 2018 @ 12:26 am

@Jonathan Smith: Yes, the view on the CUHK site regarding 求 is the one accepted by 裘錫圭、唐蘭、季旭昇、何琳儀、黃德寬 and I would imagine by 陳劍, since he hasn't written about it (that I know of). If those guys all agree, that looks like a consensus to me. I've not read Takashima's paper though, so I can't comment on it.
Chris Button said,

June 22, 2018 @ 9:46 am

@ Ash and Jonathan Smith

A palaeographic confusion between otherwise unrelated 求 and 㣇 / 杀 (also read as 祟 without any graphic association) is not an original idea at all and has been addressed by various scholars, stretching, as far as I am aware, from Chen Mengjia (1936) all the way through to Takashima (2010) in his Bingbian commentary. The Takashima (2009) article mentioned above includes a palaeographic analysis of the development of 㣇 and 杀 in the context of 祭 (depicting a hand shredding meat) representing a similar concept to 杀 (depicting a dismembered carcass) and hence the etymological relationship between the words they represent.
Ash said,

June 22, 2018 @ 10:00 am

@Chris Button
Not saying it is an original idea. I'm saying that top paleographers don't agree with it. That doesn't mean you have to follow them. I'm just throwing that out there.
Chris Button said,

June 22, 2018 @ 11:50 am

@ Ash

I'm afraid I'm not really following you now. I suppose "top" is a subjective term.

Returning to your original question regarding "correct" (in an impossibly ideal world) phonetics, I'm essentially creating something similar to Todo Akiyasu's "Kanji Gogen Jiten" but now, half a century on, employing more sophisticated Old Chinese reconstructions (largely Pulleyblank-inspired and crucially not exclusively reliant on surface rhyming tendencies) and better palaeographic analyses, along with comparative semantics to counter any charges of etymological speculation. Unfortunately, given the rate at which I am able to work on it, it won't be appearing any time soon… sigh…
Jonathan Smith said,

June 22, 2018 @ 1:02 pm

Just above Chris Button refers to a paleographical alternation 求~杀; this of course need not imply a common origin, which would constitute an original claim by Button so far as I know (?).

I chimed in because, @Chris Button, I perceived that your propensity for the "XYZ無疑"-style pronouncement (e.g., "[t]hat something is the graphic ancestor of the unrelated words represented by 求 and 杀") as opposed to explicitness regarding what is objective fact vs. past suggestion vs. novel proposal had led to a still-unresolved confusion. It would be awesome in my view if you are able to incline more towards the latter in your dictionary, as I for one am looking forward to trying to learn from it.
Chris Button said,

June 22, 2018 @ 2:17 pm

@ Jonathan Smith

There may very well have been two separate graphs for 求 and 㣇 / 杀 that merged since we do have evidence of such mergers in the script (e.g. 夷 with 人 in late OB inscriptions to give just one). However, until someone manages to disentangle 求 from 㣇 / 杀 in that manner, then we can only go with the evidence we have. The description in my dictionary (which is as much designed for lay use as academic so cannot possibly enter into too much discussion) currently begins accordingly at the top of the 求 series:

Fur [OBgraph] as clothing [OBgraph] (衣), later modified to 裘 by [OBgraph] (杀 deformed to 求) and abbreviated in derivatives to 求…

In that regard, what could be called novel in my approach is that I do not treat the palaeograph in question as exclusively 杀/㣇, or exclusively 求, or a polyphonic 杀/㣇 ~ 求 which have all been suggested over the years (among other things). Instead I am suggesting that the graph is the original ancestor of 杀/㣇 but became deformed to 求 only as an additional component of "furry 衣" (ultimately giving 裘) before "furry 衣" was dropped in later derived forms to leave just 求. Having said that, there are nonetheless cases in the inscriptions where "seek" does give a nice translation…

As for objective fact, unfortunately that doesn't play much of a role in translating OBI or reconstructing OC.
Ash said,

June 22, 2018 @ 3:33 pm

@Chris Button I wish you well in your research!
Jonathan Smith said,

June 22, 2018 @ 5:22 pm

@Chris Button Thanks for laying out your approach — press on. I agree at least that there is little to recommend the traditional understandings here… they're simply what's "on record" for the moment.
Chris Button said,

June 23, 2018 @ 8:42 am

@ Ash & Jonathan Smith

Thanks for the words of encouragement and the good discussion!

