We have recently encountered an "Epic Dictionary Fail". Today, I should like to consider what happens when a script fails.
The following signs are affixed to a dentist's office in Taiwan. The one in vertical orientation reads
Shìmín yáyī 世旻牙醫 (Shimin Dentist)
The sign in horizontal orientation reads
Shìmín yáyī zhěnsuǒ 世旻牙醫診所 (Shimin Dental Clinic)
(Underneath are listed the services provided by the clinic)
The person in charge of the clinic is Lǐ Pèiyǐng 李佩穎, and the office is located at Dà'ān District, Fùxīng South Road, Section 2, Alley 151, No. 7 in Taibei 台北市大安區復興南路二段151巷7號.
What attracted me to the signs on this clinic are the little marks next to the character mín 旻 ("sky; autumn"). These are informally called "bopomofo" after the first four sounds of this phonetic system (a semisyllabary plus tonal diacritics), which is somewhat more formally referred to as Zhùyīn fúhào 注音符號 (Phonetic Symbols). More complete and exact designations for the system in Chinese and English are given here.
While such symbols are by no means ubiquitous in Taiwan, one encounters them often enough that they begin to have an impact. One wonders why the Chinese writing system so often needs a separate script for phonetic annotation.
The reason seems to be that readers often have little or no idea how to pronounce particular characters. The owner of this business, for whatever reason, wants to use the character 旻 in the name of their clinic. At the same time, they are keenly aware that many, if not most, potential patients do not know how to pronounce the character 旻 (and probably don't know the meaning either). Hence, they have resorted to Zhùyīn fúhào (bopomofo) to make the pronunciation clear.
I find this situation to be extremely thought-provoking. After all, Chinese characters are supposed to be able to represent speech, so it seems odd that users need to resort to a separate writing system (bopomofo) to clarify how a word should be pronounced. I refer to this as the "furiganaization" of the Chinese script, and have written about it in "Pinyin as Furigana" (3/23/2011) and elsewhere. Furigana are the ruby symbols used in Japanese for the phonetic annotation of texts for learners, and also often for low frequency words, proper names, and unusual pronunciations of kanji (Chinese characters).
As a learner of Chinese in Taiwan four decades ago, I was deeply grateful for the existence of extensive reading materials at all levels that were phonetically annotated with bopomofo. That saved me endless hours of dictionary drudgery and frustration. There was (and still is) an excellent newspaper called Guóyǔ rìbào 國語日報 (Mandarin Daily) that had (and still has) all of its articles phonetically annotated with bopomofo. There are also the wonderful series of literary supplements (periodically collected into books) entitled Gǔjīn wénxuǎn 古今文選 (Anthology of Ancient and Modern Texts) and Shū hàn rén 書和人 (Books and People). Moreover, bookstores in Taiwan are stocked with hundreds of premodern texts, both classical and literary, that are not only annotated with bopomofo, but accompanied by translations into Mandarin and extensive commentaries and notes to assist the reader. My favorite series of this sort is published by Sānmín shūjú 三民書局 (with aquamarine covers and white lettering). Táiwān yìn shūguǎn 臺灣印書館 (the Jīn zhù jīn yì 今註今譯 [modern notes and translations] series, with dark blue pseudo-threadbound covers and white lettering) has excellent commentaries and translations, but unfortunately no phonetic annotations. These two series are of very high quality and are prepared by the top scholars in Taiwan. There are many other similar series in Taiwan, not a few with complete phonetic annotations, which shows how seriously people there take reading and understanding classical and literary texts. Presently, I do not know of anything comparable in the whole of China. There are some Mandarin translations of classical and literary texts on the mainland, but none that have phonetic annotations throughout.
Time and again, I have urged the educational and cultural authorities in China to use Pinyin for the same purposes, although they have been very reluctant to do so, partially because of technical difficulties of setting the ruby symbols in an orthographically correct and esthetically pleasing manner. Now that these technical problems are being solved by computer programmers and software experts, I expect that we will see more and more instances of Pinyin being used as phonetic annotation in the PRC. This is particularly the case because character amnesia is definitely on the rise.
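For readers curious about what the typesetting side involves on the web, here is a minimal sketch in Python. It assumes the target is HTML, whose standard `<ruby>`, `<rt>`, and `<rp>` elements exist for exactly this kind of annotation. The function names `ruby` and `annotate` are my own illustrative choices, and the character-by-character readings are supplied by hand rather than looked up in any dictionary.

```python
from html import escape


def ruby(base: str, reading: str) -> str:
    """Wrap one base character (or word) and its reading in HTML ruby markup."""
    # <rp> supplies fallback parentheses for renderers without ruby support.
    return (
        f"<ruby>{escape(base)}"
        f"<rp>(</rp><rt>{escape(reading)}</rt><rp>)</rp></ruby>"
    )


def annotate(pairs):
    """Join (character, reading) pairs into one annotated HTML string."""
    return "".join(ruby(ch, rd) for ch, rd in pairs)


# Hand-supplied Pinyin readings for the clinic name discussed above.
clinic_name = [("世", "shì"), ("旻", "mín"), ("牙", "yá"), ("醫", "yī")]
html_line = annotate(clinic_name)
print(html_line)
```

The same function would serve for bopomofo: pass "ㄇㄧㄣˊ" instead of "mín" as the reading. How well the result is positioned, especially in vertical text, depends on the browser and its style sheet, which this sketch does not address.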
In the thirty-five or so years that I have been teaching Mandarin and Literary Sinitic (Classical Chinese) to hundreds of native and non-native speakers, I have encountered (among others) the following categories of failure on the part of students to read and write the characters:
Reading:
a. has nary a clue as to the meaning and pronunciation of the character; draws a complete blank
b. has only a vague idea about the meaning and / or pronunciation of the character
c. can guess the rough, probable meaning of the character from context, but cannot pronounce it at all or can only make a stab at the pronunciation (often missing completely)
Writing:
a. knows more or less well the pronunciation and meaning of the character he / she wishes to write, but cannot put down even the first stroke for it
b. knows more or less well the pronunciation and meaning of the character he / she wishes to write, but can only sketch out parts of it without being able to complete the whole character in such a fashion that it would be recognized by others
c. knows more or less well the pronunciation and meaning of the character he / she wishes to write, and can almost write the entire character, but makes one or more errors, some of which are capable of causing the character to be misread or not / barely recognized by others
All of these types of errors are of a very different nature from those that are encountered in languages that use alphabets as their writing systems, since the reader of a text written with an alphabet can — with rare exceptions like the infamous made-up word "ghoti" — more or less accurately sound out the words with which they are confronted, and the writer who uses an alphabet to write a text can always approximate the sounds of the words he / she has in mind. Even though he or she may misspell some of the words more or less badly, the reader can usually make out what he / she intended.
The good news is that the phonetic annotation of Chinese texts, dictionaries, and online or software resources, whether relying on pinyin or bopomofo, can help substantially to alleviate all of these types of errors. Consequently, as someone who is committed to making Chinese languages better known throughout the world, I am a strong advocate of the widespread utilization of phonetic annotation of Chinese characters.
Those of you who are not interested in dry statistics can stop here. What follows (below the double rule) is merely my attempt to determine the frequency of mín 旻 among all characters in the Chinese writing system. I shall close this section by noting merely that structurally mín 旻 is actually a relatively simple character, with only 8 strokes in total. The average number of strokes for the top 10,000 characters is around 12. The frequency-weighted average number of strokes:
* For the most frequently used 2,965 characters: 9.10;
* For the most frequently used 1,253 characters: 8.91;
* For the most frequently used 733 characters: 8.65.
These statistics are drawn from Chih-Hao Tsai's [sic] "Frequency and Stroke Counts of Chinese Characters."
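To make clear what "frequency-weighted" means in the figures above, here is a minimal sketch in Python. It is not Tsai's method or data: the function name `weighted_mean_strokes` is my own, and the occurrence counts in `sample` are invented purely for illustration (only the stroke counts are real).

```python
def weighted_mean_strokes(entries):
    """Average stroke count, weighting each character by how often it occurs.

    entries: iterable of (character, stroke_count, occurrences) tuples.
    """
    entries = list(entries)
    total = sum(count for _, _, count in entries)
    if total == 0:
        raise ValueError("corpus contains no occurrences")
    return sum(strokes * count for _, strokes, count in entries) / total


# Hypothetical toy corpus: stroke counts are real, occurrence counts are made up.
sample = [
    ("的", 8, 1000),
    ("一", 1, 600),
    ("是", 9, 400),
    ("旻", 8, 1),
]

weighted = weighted_mean_strokes(sample)
unweighted = sum(strokes for _, strokes, _ in sample) / len(sample)
print(f"weighted: {weighted:.2f}, unweighted: {unweighted:.2f}")
```

Because very common characters tend to be simple, weighting by frequency pulls the average down, which is why the weighted figures quoted above sit well below the plain average of about 12 strokes.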
Mín 旻 consists of two simple components, rì 日 ("sun; day") and wén 文 ("pattern; character; script; writing; culture"), either one of which might serve as a semantic classifier ("radical") in various characters, but in mín 旻 it is rì 日 that is the semantic classifier and wén 文 that is the phonophore (N.B.: Cantonese man4 man6; Taiwanese bun [in a disyllabic word] or buun [by itself]).
Judging from the simple construction and medium frequency of mín 旻 among the top 20,000 or so characters, it is apparent that even literate Chinese do not know the pronunciation (much less the meaning) of the overwhelming majority of Chinese characters, which generally have more strokes and more components than mín 旻, and are much less frequent than mín 旻 as well. Indeed, I have had highly educated Chinese friends and colleagues tell me that they are uncertain of the pronunciation, meaning, and written form of many characters in Xīnhuá zìdiǎn 新華字典 (Xinhua Character Dictionary), the standard single character dictionary (see next paragraph below) for the PRC, which has sold well over 300,000,000 copies. In short, simple and innocent though it may appear, mín 旻, together with the three phonetic symbols (ㄇ ㄧ ㄣ) and the 2nd tone diacritic ´ next to the ㄧ alongside it, speaks volumes about the nature of the Chinese writing system and the challenges it poses for its users.
Mín 旻 does not occur among the 9,933 characters on this list, but it does occur in the Xīnhuá zìdiǎn 新華字典 (Xinhua Character Dictionary), my favorite portable character dictionary (zìdiǎn 字典), which has a little over 10,000 characters.
Mín 旻 occurs about 1/4 of the way down on this list of 13,060 characters, and is cited as occurring 860 times in a data base of 171,882,493 total characters, which is about 5 per million characters.
In the Linguistic Data Consortium's Chinese news corpus, it occurs 1,088 times in 605 documents, in a collection containing 1,368,064,442 characters in 3,087,084 documents, for a frequency of about 0.8 per million characters (where punctuation and some formatting characters are included in the total).
Finally, the Hànyǔ Shuǐpíng Kǎoshì 汉语水平考试 (HSK / Chinese Proficiency Test) Vocabulary Guideline (5th printing, Beijing, 2008), which contains 8,000 characters, has completely dropped mín 旻. The HSK is based on the Xīnhuá zìdiǎn (see above) and is a very important standard for character usage, especially for learners of Hànyǔ pǔtōnghuà 汉语普通话 (Modern Standard Mandarin), but it has phased out 2,000 characters that are considered unnecessary to learn.
In the data bases that I have consulted, some of the discrepancies may have to do with different ways of counting characters (e.g., does punctuation get counted or not?), and some are no doubt due to genre or subject matter. Mín 旻 is far more likely to occur in "bookish" corpora than in general or news corpora.
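The two per-million figures quoted above can be checked with a few lines of Python. This is only a sketch of the arithmetic: the counts are the ones cited in the text, and the helper name `per_million` is my own.

```python
def per_million(occurrences: int, corpus_size: int) -> float:
    """Convert a raw count into occurrences per million characters."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    return occurrences / corpus_size * 1_000_000


# Counts as cited in the text for mín 旻.
list_rate = per_million(860, 171_882_493)      # the 13,060-character list
ldc_rate = per_million(1_088, 1_368_064_442)   # LDC Chinese news corpus

print(f"character list: {list_rate:.1f} per million")  # 5.0
print(f"LDC news:       {ldc_rate:.1f} per million")   # 0.8
print(f"ratio:          {list_rate / ldc_rate:.1f}")
```

The roughly sixfold gap between the two rates is the kind of discrepancy that differences in counting conventions and in genre could plausibly produce.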
Some words in English that are roughly comparable in terms of frequency, with their frequencies (per million words) in different sections of COCA, are:
Conclusion: Among the top 8,000-10,000 characters (which cover the vast majority of instances in most contemporary texts), mín 旻 is of low frequency. Among the top 20,000 or so characters (which cover nearly all instances in most modern texts), mín 旻 is of medium frequency. Among the top 60,000-80,000 characters (which cover virtually all [but not quite all] instances in the totality of premodern and modern texts), mín 旻 is of moderately high frequency.
[Thanks are due to Peter Leimbigler and Mark Liberman for statistical data, to Melvin Lee and Sophie Wei for Taiwanese pronunciation, and to Neil Kubler for sending me the photographs]