How many more Chinese characters are needed?


I was stunned when I read this op-ed piece in the NYT yesterday (10/24/16):  "China's Digital Soft Power Play".  In it, the author, Jing Tsu (a professor of Chinese literature and culture at Yale), writes:

This month, the Chinese government plans to introduce codes for some 3,000 Chinese characters as part of a grand project, known as the China Font Bank, to digitize 500,000 characters previously unavailable in electronic form. Until now, only 80,388 characters have been encoded in the international computing standard, Unicode.

The project highlights 100,000 characters from the country's 56 ethnic minorities, and another 100,000 rare and ancient characters from China's written corpus. Deploying almost 30 companies, institutions and universities, it's the largest state-funded digitization project ever undertaken.

And here's her conclusion:

As the state's online monitoring apparatus has grown in recent years, netizens have found ways to take jabs at the government through wordplay, the use of mutated or ancient characters, and nonstandardized electronic scripts developed in places like Taiwan. The Font Bank project will standardize the language, and as the scripts for secret usage enter into an official database, subversive language will be more easily detected. The newly digitized characters will help China to better track people's movements, finances, and public and private speech.

But the project will do so much more. Putting the largest vocabulary online has been described as "sailing out on a borrowed ship" — a strategy that makes use of other countries' networks, infrastructure and resources to take China's agenda global. Adding a half million more characters may not be what the Jesuits prayed for, but it marks a new form of smart power for a nation still on the rise.

A colleague wrote to me, asking:

Does her conclusion seem odd to you? Increased ability to censor, right; increased global soft power, wha?

I wrote to Jing Tsu asking for clarification, saying:

It's hard for me to imagine where the Chinese authorities are going to dig up 500,000 total characters, let alone 100,000 more Chinese characters.  Have you seen their lists?

She replied:

Agreed — I think it's going to be a lot of re-standardization of how many uncommon characters are written so that the Chinese government can unify the digital lexicon and close the loop for local adaptations.

I also showed the op-ed to two specialists on the Chinese script, Matt Anderson, a paleographer, and David Moser, author of A Billion Voices: China's Search for a Common Language.

Matt Anderson:

There is no way that there are 100,000 rare hanzi that do not appear in unicode (though of course there are more than a few), not to mention whatever they mean by the 500,000 number (how could there be 400,000 more unencoded graphs, even including every script ever used by every people who ever lived within the borders of China, however defined?).

For one thing, the article emphasizes the troubles faced by Chinese citizens who have unusual characters in their names. These problems are real, but this project won't help them—maybe there are a few people whose names contain characters not in unicode, but the problem is generally that the characters are contained in unicode but don't render properly on government computers. So all that is needed (if they are to continue to use those names, and if people know how to enter them, which is an entirely separate problem) is for them to install a comprehensive unicode font.

The article gives the example of yǎn 龑 (invented for himself by a Chinese emperor over a thousand years ago, with the auspicious meaning of "dragon [flying in] heaven"), which isn't very common, but which is in the basic unicode block, not even one of the extensions. I think the problem with that one probably wasn't even font-related; it was probably just that the examiners didn't know how to read it or how to find its pronunciation to enter it (or maybe they wanted to enter 龙+天, but that character has never been officially simplified, so that problem might have to be solved by reconvening the Wenzi Gaige Weiyuanhui (Script Reform Committee) and having them officially endorse a simplified form).

And I think the huge numbers must come from a confusion of fonts and graphs (either that, or they must be creating separate "graphs" for characters that are written very slightly differently than the "standard form"). Either of those could easily result in such huge numbers of graphs, if each character that was ever written in a particular script style is encoded separately in each script style, and in many variants. If done well, this might be useful for palaeographers, but it could also easily turn into a nightmare. I don't see how it could be of use to anyone else, though.

That said, I'd be personally happy to see the addition of two or three thousand more rare ancient graphs, as that would make my work easier. But it would have little effect outside of palaeography. And the idea of 500,000 (or even 100,000) more characters is just ridiculous.
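
Anderson's remark that 龑 sits in the basic Unicode block, not in one of the extensions, can be checked mechanically. The following is a minimal sketch and not part of his comment: the function name `cjk_block` and the abbreviated range table are assumptions made for illustration, and the table lists only three of the CJK Unified Ideographs blocks.

```python
# Illustrative, partial table of CJK Unified Ideographs blocks
# (start, end, block name). Later extensions are deliberately omitted.
CJK_BLOCKS = [
    (0x4E00, 0x9FFF, "CJK Unified Ideographs"),
    (0x3400, 0x4DBF, "CJK Unified Ideographs Extension A"),
    (0x20000, 0x2A6DF, "CJK Unified Ideographs Extension B"),
]


def cjk_block(char):
    """Return the name of the listed CJK block containing char, or None."""
    code = ord(char)
    for start, end, name in CJK_BLOCKS:
        if start <= code <= end:
            return name
    return None


if __name__ == "__main__":
    ch = "龑"
    # Print the code point and the block it falls in.
    print(f"U+{ord(ch):04X}", cjk_block(ch))
```

A character that is already encoded but fails to display is a font problem, which is the distinction Anderson draws above.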

David Moser:

Two sentences in the article stuck out for me:

"The online expansion will give people in China and around the world more access to the script, thereby helping spread the Chinese language and culture."

How is adding 100,000 new characters going to help spread Chinese language and culture?  Only the tiniest group of experts will even be interested in these obscure characters.  And if 80,000 characters is not enough to get American students interested in learning Chinese language and culture, I can't imagine 100,000 more being much of a magnetic attraction for them.  It will be cool to have seal characters and jiaguwen (oracle bone inscriptional forms) digitized, sure, what fun.  But it's not going to increase China's soft power.

"Anything from scholarly papers to tweets will help extend the reach of Chinese through its sheer availability. As more of the language enters cyberspace, more people will use it, and its status will rise with its visibility."

"Extend the reach of Chinese through its sheer availability. As more of the language enters cyberspace…"  This is the old mistake of confusing language with script.  The way to "elevate the status of Chinese" is to make the language itself more important, more relevant, and easier to teach and acquire. Additional digitization of the historic character set will only make it easier for sinologists and experts to insert these into their scholarly articles. Which is quite cool. But this is a remote consideration for 99.999% of actual Chinese users.  Adding characters will not help in tweeting, emails, website design, e-commerce, or anything else.  If anything, it will just encourage people to name their kids and companies and websites with characters that no one has ever seen and nobody can write.  Just what we need: more character ambiguity and confusion.

Like saying: "If we add more rules to the tax code, it will make it easier for people to file their income tax."

What can I say?  After a century and more of efforts by script reformers to tame the unruly Chinese writing system, it now seems that some folks in the current Chinese government, for whatever misguided reason, have decided to open a veritable sinographic Pandora's box.  I cannot imagine that they will meet with much success, though I can believe that they are asking for troublous confusion.



52 Comments

  1. Jenny Chu said,

    October 25, 2016 @ 7:32 pm

    Just a thought… Is it possible that the number 500,000 was lost in translation? Maybe it is really 5,000? I encounter on a daily basis the difficulties of rapid translation of Chinese numbers. ("No, wait, I meant one thousand hundred. Wait, no!… ")

  2. Jim Breen said,

    October 25, 2016 @ 7:39 pm

    And to think that Unicode V1 had around 21,000 CJK characters, and we thought that was a lot.

    There are a few things about Jing Tsu's article that trouble me:

    >> The Font Bank project will standardize the language…

    It won't. This concerns orthography; not language. It will (possibly) standardize the coding of very rare hanzi, but nothing more than that. The vast majority of Chinese people will never have used them.

    >> Putting the largest vocabulary online..

    See above. These rare hanzi are not "vocabulary". They won't change the Chinese lexicon.

    >> as the scripts for secret usage enter into an official database, subversive language will be more easily detected.

    How? If people have been able to use "secret" and "subversive" writing without the rare hanzi being in Unicode, I can't see how adding them to Unicode makes a jot of difference. Surely they can go on doing what they did before (whatever that was.)

    I quite agree with the comments by Matt and David. I can understand moves to add rare hanzi to Unicode – it's been going on for years – but the spin and interpretation being put on it seems untenable.


  3. flow said,

    October 26, 2016 @ 2:56 am

    "This month, the Chinese government plans to introduce codes for some 3,000 Chinese characters as part of a grand project, known as the China Font Bank, to digitize 500,000 characters previously unavailable in electronic form. Until now, only 80,388 characters have been encoded in the international computing standard, Unicode.

    The project highlights 100,000 characters from the country's 56 ethnic minorities, and another 100,000 rare and ancient characters from China's written corpus."

    Any chance that that big number, 500,000, refers to corpus size, *not* individual characters? The Chinese CIP (impressum / imprint / colophon) always includes not only page count, but also a character count. For example, in the 10th ed. of the 新华字典 we find the (surprising!) entry "字数660千", "character count: 660 thousand", so that gives an idea of what half a million characters look like. 3,000 would then be the number of hitherto unencoded glyphs found in that corpus. How that squares with the remarks on the 'highlighting' of 100,000 minority characters and another 100,000 Han characters remains anybody's guess, of course.

    OTOH the writer does seem to suffer from a certain amount of confusion between writing and language. What's worse, not a single source is quoted, and the news is not of an event, but of plans for the future. One should certainly think that a (civil) project (for the promulgation of communication) by the Chinese government that involves "almost 30 companies, institutions and universities" and is heralded as "the largest state-funded digitization project ever undertaken" and that is, reportedly, scheduled to release its results "this month" would somehow cast its shadows, but at least googling for "China Font Bank" yields nothing but references for, you guessed it, fonts as used by financial institutions.

    That said, the purported English name of the project seems to be… strange, I mean, "font bank"? I could imagine that to have come from "字体数据库", roughly "glyph shape database".
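
The difference flow raises in this comment, between a running character count (字数) and the number of distinct characters, is easy to see on a toy example. This is only a sketch: the helper name `count_characters` and the sample string are invented here, and whitespace is ignored as a simplifying assumption.

```python
def count_characters(text):
    """Return (running character count, number of distinct characters).

    Whitespace is skipped; everything else counts as a character.
    """
    chars = [c for c in text if not c.isspace()]
    return len(chars), len(set(chars))


if __name__ == "__main__":
    sample = "新华字典 新华字典 字典"
    total, distinct = count_characters(sample)
    print("running count:", total, "distinct characters:", distinct)
```

A corpus figure in the hundreds of thousands is therefore compatible with a far smaller inventory of distinct characters, which is the reading flow suggests for the 500,000 figure.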

  4. Endymion Wilkinson said,

    October 26, 2016 @ 3:16 am

    Like you, Victor, I was astonished when I read the NYT op-ed piece. It sounded like a poor summary of an official handout with some weird afterthoughts about spreading Chinese influence added on. Baidu (http://baike.baidu.com/view/8326459.htm) has a much better description of what it is all about: the project (中华字库项目) was launched in its preparatory phase in 2006 and then in its full phase in 2011. It aims to lay down a more solid basis for the study of Chinese characters by encoding about 100,000 ancient Chinese script characters; 300,000 kaiti forms; and 100,000 minority script characters to give a grand total of 500,000. The project is divided into 23 packages. Judging from these, the project (like so many others in recent years) will bring together already existing results (e.g. in encoding oracle-bone script, bronze characters, grass script characters, characters found on manuscripts, early printed works, etc., plus current practice, e.g. unusual characters used in family and given names) and place all of these on a single platform. Just how the figures were arrived at is not explained. But if you look at (for example) a good dictionary of oracle-bone script it will list 4000 plus characters and often dozens of variant ways in which they were written. I suppose that if all the variants were listed, 100,000 for ancient scripts is not unreasonable. But with the advent of printing variation declined. So just how the figure of 300,000 was arrived at beats me.

  5. Bathrobe said,

    October 26, 2016 @ 4:27 am

    Well, the Zhuang script is built on the same principles as Chinese characters, so there must be a few thousand characters there. And if you include the script of the Jing minority, that's a few thousand more… oh, wait, that's called "Chu Nom". But despite borrowing someone else's boat (a bit of 'ancient wisdom' that sounds more sneaky than glorious), the Chinese love to seal their minorities off within the borders of China. Is it possible they will re-encode Chu Nom as something belonging to the "Zhonghua Minzu", totally separate from what the Vietnamese have? It sounds ridiculous, but it truly doesn't seem beyond the bounds of possibility.

    The problem of people whose names aren't encoded electronically, which the article plays up so lovingly, might be tragic for Han Chinese who love their rare characters, but members of minority ethnic groups who have names longer than the usual maximum of four or so characters for Han Chinese face their own problems with computer systems, usually resolved by lopping off the surplus characters no matter how ridiculous the result. This could be fixed a lot more easily than swamping Unicode with another 500,000 characters, but I don't see the Chinese government doing anything about it. After all, it's not national pride that's at stake.

    Incidentally, I'm just wondering if encoding every variant glyph will extend to characters like 真, which the Japanese have had to put up with in its Chinese graphic form since it was decided that variants (such as those used by the Japanese) should not be included in Unicode. Now that the issue is ancient national culture, all that can be discarded. Who cares about silly little Japanese variants when all kinds of obscure characters from China's 5,000 years of history need to be encoded?

    It's a pity the article doesn't refer to empire-building, because that's what it sounds like. Like Ah-Q, China wants to show the world how great it is by appealing to the weight of history.

  6. Jim Breen said,

    October 26, 2016 @ 5:16 am

    @Bathrobe – "… 真, which the Japanese have had to put up with in its Chinese graphic form since it was decided that variants (such as those used by the Japanese) should not be included in Unicode. Now that the issue is ancient national culture, all that can be discarded. Who cares about silly little Japanese".

    I really question that the Japanese "have had to put up with" the Chinese form of the glyphs for 真, 写, etc. because there are established Unicode fonts using Japanese glyphs. Unicode has never mandated a particular glyph for any character. The publication standard for Unicode is to use the Chinese glyph form where there is a difference, but areas using hanzi/kanji/hanja are expected to adopt the ones suitable for them. To quote the Unicode 5.0 document (the latest printed version I have to hand) "It is assumed that most Unicode implementations will provide users with the ability to select the font (or mixture of fonts) that is most appropriate for a given locale."

    In the early days of Unicode there was a degree of misunderstanding in Japan about the glyphs, with some people saying that Unicode was "wrong" because some of the kanji looked "Chinese". It was partly to combat this view that the Japanese equivalent of Unicode/ISO 10646 (JIS X 0221), first released in 1995, included side-by-side sample glyphs illustrating typical Chinese/Japanese/Korean forms of the characters.

    The 真 in your comment appears in the preferred Japanese form on my screen, because my system is set to a Japanese locale, and hence is using a font containing Japanese-style glyphs.
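
Breen's point in this comment is that Unicode encodes characters, not glyph shapes: the code point stays the same in every locale, and the font decides what is drawn. A minimal sketch of the first half of that claim, using Python's standard `unicodedata` module; the helper name `describe` is an assumption made for illustration.

```python
import unicodedata


def describe(char):
    """Return the code point (as 'U+XXXX') and Unicode name of one character."""
    return f"U+{ord(char):04X}", unicodedata.name(char, "<unnamed>")


if __name__ == "__main__":
    # The same code points are reported whatever locale or font is in use;
    # only the rendered shape can differ between systems.
    for ch in "真写":
        print(ch, *describe(ch))
```

Nothing in this output changes when the system locale changes, which is why two readers can see different shapes for the same text.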

  7. Bathrobe said,

    October 26, 2016 @ 5:48 am

    Thanks! This is obviously a fault (or something) with my system (Mac). Even after changing my locale to Japan I get the Chinese version. I really would like simplified Chinese characters in my simplified Chinese, traditional Chinese characters in my traditional Chinese, Japanese characters in my Japanese, etc., no matter what the locale.

    At any rate, I'm glad to be set right on the issue: Japanese are not being forced to see Chinese versions of characters. On the other hand, what is going to happen if they try to encode all those variant characters? Will the author decide what shape appears on the screen, will it default to the 'standard' (for that locale), or will the user be able to choose? Given that I can't even choose the Japanese form of 真 on my own computer, I don't think the issue can be unilaterally resolved through a Chinese state-run project — which is part of my point. As long as Chinese national projects are tied up with Chinese nationalism and Han chauvinism (which all too often seems to be the case), I will continue to be strongly sceptical.

  8. Victor Mair said,

    October 26, 2016 @ 8:56 am

    Even when I want to write just the phonophore (in this case the right side) of 们* (Unicode U+95E8), the high frequency PRC simplified form of mén 門 ("door; gate; entrance"), I almost always end up with 门, no matter how hard I try, no matter what convoluted means I use, and no matter where on planet earth I type it.

    [*Never you mind that -men 们 is the plural suffix for pronouns, some animate nouns, and certain personifications. For the purposes of this discussion, I'm only using it as a workaround, by referring to the part on the right side to show what the PRC simplified form of mén 門 ("door; gate; entrance") looks like.]

    As for what exactly 门 is, I'm not entirely certain. I used to assume that it was the official Japanese simplified form of the character, but now that I start looking into it a bit, I'm not so sure. In Wiktionary, 门 is referred to as a ryakuji 略字 (a colloquial, simplified form of a kanji); in Korean that would be yakja 약자. In Japanese, apparently 门 may also be referred to as a zokuji 俗字 ("popular character"), goji 誤字 ("erroneous / mistaken character"), or a daiyōji 代用字 ("substitute character").

    Incidentally, ryakuji 略字 can also be used in Japanese to refer to abbreviations and acronyms.

    To see what we're up against, 门 and the right side of 们 are also what is known in Japanese as hyōgaiji 表外字 or hyōgai kanji 表外漢字 ("characters outside the tables / charts"). According to Wikipedia, these are:

    =====

    Japanese kanji outside the two major lists of Jōyō, which are taught in primary and secondary school, and Jinmeiyō, which are additional kanji that officially are allowed for use in personal names.

    Because hyōgaiji is a catch-all category for "all unlisted kanji", there is no comprehensive list, nor is there a definitive count of the hyōgaiji. The highest level of the Kanji kentei (test of kanji aptitude) tests approximately 6,000 characters, of which 3,000 are hyōgaiji, while in principle any traditional Chinese character or newly coined variant may be used as hyōgaiji; the traditional dictionaries the Kangxi Dictionary and the 20th century Dai Kan-Wa jiten contain about 47,000 and 50,000 characters, respectively, of which over 40,000 would be classed as hyōgaiji or non-standard variants if used in Japanese.

    =====

    Judging from my own experience in Japan and the Japanese Wikipedia article on ryakuji 略字, even though they are "outside the official tables / charts", their use in Japan is very widespread.

    The mystery deepens, the plot thickens.

    门 is not even in jisho or tangorin, those two wonderful online Japanese dictionaries.

    Gosh! 门 doesn't seem to be "standard" in anybody's "system", but it's all I can get in Google, in my Apple operating system, in my internet browser, etc., etc. Maybe this is what Professor Tsu is referring to when she talks about "clos[ing] the loop for local adaptations". If that were to happen, where would it leave Japanese autonomy?

    ———–

    NOTE:

    The Zhuang are China's largest minority ethnic group. They have a writing system called Sawndip which uses the traditional form of 門 and there is also a simplified Sawndip that uses the PRC simplified form of the character (right side of 们) for the morpheme that in Romanized Zhuang is written as "mwngz".

  9. Bathrobe said,

    October 26, 2016 @ 9:31 am

    I have the same problem with 门. That is, browsers show a horrible form that just has a single short stroke in the middle, rather than the correct form (on the right side of 们) with the short stroke over at top left. 们 comes up fine; the problem is confined to 门. It must be an Apple thing. It is very frustrating not to have proper control over your own computer. The correct character form (downstroke on the left) comes up in Microsoft Word.

    门 is the Japanese abbreviated form which I first used when I came to China until it was pointed out to me that it was "wrong".

  10. J. M. Unger said,

    October 26, 2016 @ 10:00 am

    Seems to me that the first paragraph of the "conclusion" lets the cat out of the bag. Makes me proud of the First Amendment and dissenters everywhere who fearlessly use just an alphabet.

  11. January First-of-May said,

    October 26, 2016 @ 10:34 am

    I had to look at the Wiktionary article to figure out what's wrong with your depiction of the character 门 – to me (Windows 7, Chrome, Russia) it looks exactly like the (slightly expanded) right part of 们.

  12. Eric said,

    October 26, 2016 @ 11:58 am

    FWIW- my computer has always displayed Chinese 门s properly (the same as the right side of 们). I use a mac with Chrome, Safari, Firefox–all display just fine.

  13. Jonathan Smith said,

    October 26, 2016 @ 3:10 pm

    Truly bizarre. Chinese soft power? This article. Looking at you, NYT. Looking at you, Yale.

    Best case scenario, I am wrong and we are only dealing with someone rather obviously unqualified to write a book entitled "The Kingdom of Characters: Language Wars and China's Rise to Global Power." In which case, still looking at the above institutions, but less surprised.

  14. Linda G said,

    October 26, 2016 @ 3:38 pm

    From an anonymous correspondent in Taiwan:
    The article is galling.

    Such nonsense! And how are they supposed to decide what's standard, esp. when it comes to things like oracle bones?

    David's comment about the tax code is spot on.

    Also, did you notice that what has come out of Taiwan was declared to be "nonstandardized" but what will come out of China will of course be the proper way to do things — as if the land of jiantizi must be the authority? An odd position for someone who seems to be from a Taiwanese background. And this is supposed to *help* people?

    Someone must have already coined the term "stupid power." It's surely needed here to contrast with Jing Tsu's assertion of half a million more Hanzi being "smart power."

  15. AntC said,

    October 26, 2016 @ 4:45 pm

    @Linda G, I agree with "stupid power".

    As the state's online monitoring apparatus has grown in recent years, netizens have found ways to take jabs at the government through wordplay, the use of mutated or ancient characters, and nonstandardized electronic scripts developed in places like Taiwan. …, subversive language will be more easily detected.

    Netizens will continue to find ways to take jabs at government; they'll just find different oblique/nonstandardised ways to do it.

    Does it not occur to the authorities that suppressing or 'monitoring' dissent does nothing to address the causes for dissent — in fact it provides extra grounds for dissent?

  16. J.W. Brewer said,

    October 26, 2016 @ 6:48 pm

    I see that according to Prof. Tsu's website: 'Her second book, Sound and Script in Chinese Diaspora (Harvard University Press 2010), has been called "a truly groundbreaking work in Sinophone studies," "an unusual, complex, and remarkable book," "a captivating work of linguistic and literary scholarship," and a "must-read."'

    Perhaps the book has also been called less complimentary things by others whom its author understandably chose not to quote, but it does sound like it addresses topics that would be of interest to people who find the topics of Prof. Mair's posts interesting.

  17. Eidolon said,

    October 26, 2016 @ 6:48 pm

    "As long as Chinese national projects are tied up with Chinese nationalism and Han chauvinism (which all too often seems to be the case), I will continue to be strongly sceptical."

    I'm not sure what you found especially Chinese nationalistic or Han chauvinistic about the project, since most of your examples consist of pet peeves unrelated to the project itself. I would think that an effort to encode hundreds of thousands of ethnic minority characters, which up to now have no Unicode representation, would be considered more multi-culturalist than Han chauvinist, but perhaps I'm not looking at it with the same bias in mind. And while scripts such as the oracle bone script & the Zhou bronze script do not *need* to be encoded in Unicode, for scholars & specialists it sure would be nice to have them that way, so as to avoid having to rely on images whenever the need to write them arises. Besides which, a country wanting to carve out a digital space for its historical heritage is not fundamentally nationalist – a parallel example would be the recent effort to encode the Mayan script in Unicode, which no one has ever argued is "nationalist" or "Native American chauvinist."

    The article was terrible, I agree, but that's a fault of the writer more than that of the project. There is little chance that the Font Bank Project will actually make it easier to learn either the Mandarin language or the Chinese writing system, and it won't help with standardizing the vocabulary. But I can see it fostering more curiosity about the 500,000 characters China has just added, and making it easier for people who want to learn more about these ancient and/or minority scripts to write about them, and *in* them. It is indeed unfortunate that the article has decided to focus on the former rather than the latter, as the first thought that came into my mind was not "this will help people learn contemporary Chinese" but rather "this will help people learn and write in scripts *other* than contemporary Chinese."

  18. Christopher Henrich said,

    October 26, 2016 @ 6:52 pm

    Is it at all possible that this article originates in a jape? I am wondering if a satirical fancy, like an article in The Onion, has been picked up and taken at face value by somebody along the line. That has happened, more than once, with pieces from The Onion itself.

    As a project for aggrandizing China's "soft power," this one seems to me to be utterly crazy. I am persuaded, by the Language Log posts of Victor Mair (among other things), that the multitude of "Chinese characters" is a bothersome handicap for speakers of Chinese. I think that, in China before the revolution of 1912, it helped concentrate power in a small elite: only those few who could find the time to study the Chinese script could hope to acquire any sort of social power. I gather that some speakers of Chinese are beginning to regard the traditional writing system as "a bug, not a feature" of Chinese culture, and are sneakily using Pinyin or other alphabetical encodings. The last thing they can want is a huge heap of new characters.

    Today, China is still ruled by a small elite, whose position may be becoming a bit precarious. (At least, my own American sentiments encourage me to think so.) I can imagine a member of the Chinese establishment, worried about the future, feeling that this project would give China a "yuge" part of the Unicode codespace and thus magically restore the Good Old Days.

    But we can hope that the real effect will be to give more minor languages a place in Unicode; I think that would be a constructive action.

  19. J.W. Brewer said,

    October 26, 2016 @ 7:10 pm

    Re "suffer[ing] from a certain confusion between writing and language," one should perhaps have some sympathy for the lady. Why would one expect her not to be confused? I would imagine that her undergraduate teachers at UC-Berkeley, the members of her dissertation committee at Harvard, and the members of the committee that recommended tenure for her at Yale themselves mostly believed and believe various silly and false things about language that would have been dispelled had they ever taken a single decent intro linguistics course for undergrads and paid attention. But they mostly never took such a class at any point in their education, because why would they? Why should you need to know anything about linguistics to be a tenured academic in a field involving language?

  20. Eidolon said,

    October 26, 2016 @ 7:10 pm

    "I gather that some speakers of Chinese are beginning to regard the traditional writing system as 'a bug, not a feature' of Chinese culture, and are sneakily using Pinyin or other alphabetical encodings."

    There are two sides to every story. It should be remembered that for all its faults, the Chinese writing system also had important positives – for example its tolerance of variant phonetics, such that the same character can represent two vastly different pronunciations, without which it would have been impossible, or at least very difficult, for it to be used as a method of communication between speakers of completely different languages, such as Korean and Vietnamese, or even Cantonese and Mandarin. The Chinese state was, in some sense, held together by the flexibility of the Chinese writing system, as an early conversion to an alphabetic writing system would've quickly resulted in mutual incomprehension between the literate elite in different regions of China, and produced either the subsequent dissolution of the state, or the much more difficult task of language unification.

    Indeed it is only with the rise of mass communication & education that it has become feasible for all of China to be taught the same unified language – that is, Modern Standard Mandarin, the lingua franca of the Sinitic world today. So while modern speakers of this language might indeed take issue with the inefficiencies of the old writing system, they are taking for granted the fact that most of them can, now, speak and write the same language in pinyin form. Historically, that was anything but the case, so one should be careful of the idea that it was only ever a "bug" that hanzi was not alphabetic.

  21. JK said,

    October 26, 2016 @ 7:15 pm

    "Until now, only 80,388 characters have been encoded in the international computing standard, Unicode."

    The website zdic.net (汉典) says it has 75,983 Chinese characters, including the entire Kangxi Zidian and lots of variant characters (异体字), all of which appear to already be in unicode. I don't think I have ever come across a character I cannot find on that website, so I agree it is hard to imagine what those 500,000 characters could be.

    As for massive state-funded digitization projects, the complete MSM translation of the 12 dynastic histories was a massive project that was digitized, and digital versions of the Siku Quanshu and other collections are equally massive.

  22. J.W. Brewer said,

    October 26, 2016 @ 7:22 pm

    Re unusual personal names, it is interesting if true that the dictatorial Beijing regime permits at least in theory more room for onomastic idiosyncrasy and whimsy than the democratic-if-not-particularly-individualistic government of Japan, whose attitude is "here's the official list of kanji you can choose from in naming your kids — don't try to color outside the lines." https://en.wikipedia.org/wiki/Jinmeiy%C5%8D_kanji

  23. Jonathan Smith said,

    October 26, 2016 @ 7:23 pm

    It's remarkable that a comment like J.W. Brewer's most recent is even possible — should so totally be green font, and so totally isn't.

  24. Jonathan Smith said,

    October 26, 2016 @ 7:23 pm

    Sorry, now next most recent…

  25. Victor Mair said,

    October 26, 2016 @ 8:43 pm

    From a PRC graduate student who is usually very good at finding things on the Chinese internet (with notes by VHM at the bottom):

    The issue raised is rather curious. I somehow fail to find any Chinese coverage about the "Chinese character bank". But if there really is one, I would take a rather different view than Prof. Tsu's. Enlarging the Chinese corpus probably will not strengthen the soft power; it appears rather formidable. Also, I have the impression that netizens seldom use obsolete and complicated words to jab at the government. On the contrary, it is exactly those 俗 and 白话 usages that strike the chord. A quick example would be 河蟹 in substitution for 和谐.

    I remember that my mom always writes an odd character which combines 由 on the left and 攵 on the right. She says it is 数 as in 数学, as she was taught. She writes it in this way in public and at work, and there seems to be no problem. I guess there might be a category of characters that lies in the gray zone, still playing a mysterious part in present day life.

    —–

    NOTES:

    sú 俗 ("vulgar; popular")

    báihuà 白话 ("vernacular")

    héxiè 河蟹 ("river crab")

    héxié 和谐 ("harmonious")

    由+攵 [由攵]

    shù 数 ("number")

    shùxué 数学 ("mathematics")

  26. Jim Breen said,

    October 27, 2016 @ 12:05 am

    @Bathrobe:

    > 门 is the Japanese abbreviated form which I first used when I came to China..

    The hanzi I'm seeing looks very Chinese to me; in fact it doesn't exist in any Japanese kanji standard (http://www.unicode.org/cgi-bin/GetUnihanData.pl?codepoint=%E9%97%A8). Did you mean 門?

    Odd that you can't invoke a Japanese font on your Mac. Apple usually does these things very well (unlike Google who bungled the issue with Android so that unless you set the whole device to Japanese you get Chinese-flavoured glyphs. Due to be fixed, finally, in Android 7.)

  27. Jichang Lulu said,

    October 27, 2016 @ 2:12 am

    There is some information in Chinese about what Jing Tsu (or her unquoted source) mistranslates as 'Font Bank', the 中华字库工程. Anyone interested can just google that. Endymion Wilkinson's comment already summarises most of what I've seen online. Needless to say, Wilkinson's one paragraph and link to Baidu is much more informed and informative than the NYT piece. Based on her reply to Victor, I'd say Jing Tsu didn't do a lot of research on her topic before writing to the paper.

    What's not immediately clear to me is what motivates the timing of the op-ed. For all I know, the project is neither new nor is achieving any remarkable milestone at the moment. If, as she says, the gov't plan to "introduce codes for some 3,000 Chinese characters" this month as part of a project that reached cruise speed in '11, they still have years to go. Maybe it's the "is writing a book" part that explains the timing. Just sayin'.

    The 500k goal is mystifying. How do they know how many obscure glyphs are out there before they're done searching? If the project was just 'digitising an existing closed database', that would make sense, but then perhaps it wouldn't take so many years and institutions. The project is led by the euphoniously named SA(PP)RFT 广电总局. The 500k figure could be a reasonable estimate. Alternatively, perhaps some honcho up there came up with the 500k number and scholars will now have to live up to it (cf. the Xia-Shang chronology project).

    The "sail out on a borrowed ship" (借船出海) idiom Jing Tsu quotes has also been applied to a project not far from the Character Database (in the administrative metric), namely at China Radio International. Something I've called 'outsourcing soft power' and Reuters have called 'Beijing's covert radio network'.

  28. Bathrobe said,

    October 27, 2016 @ 3:28 am

    @ Jim Breen

    I wasn't referring to 門. I was referring to the abbreviated Japanese handwritten form that closely resembles the Chinese simplified form 门 (by which I mean the right part of 们). The Japanese handwritten abbreviation has a short stroke down the middle of the top bar; the Chinese simplified form has the stroke at the left side of the top bar. I don't suppose that the Japanese abbreviated form is encoded anywhere, but it certainly comes up in browsers (Safari, Chrome) on my computer. I am unable to summon up the standard Chinese simplified form. I have no idea why this is. Perhaps I'm looking at the wrong place to make the requisite adjustments. Perhaps I have something on the system that overrides the correct settings. But the fact remains that I am unable to get the correct form of the character 门 to display in browsers.

  29. Bathrobe said,

    October 27, 2016 @ 3:37 am

    As a further note, having switched my location to Japan and back away again, I now find that the character 直 on browsers renders in the Japanese form, even when I'm typing in Chinese! Obviously more tinkering and restarts are needed, but I'm definitely finding the whole thing mysterious.

  30. Jim Breen said,

    October 27, 2016 @ 3:58 am

    I think you mean this character: https://upload.wikimedia.org/wikipedia/ja/0/09/RYAKUJI_2-0000.gif Yes, it's the handwritten abbreviation of 門. The variant glyphs of characters, in this case a handwritten abbreviated form, are rarely separately coded in character sets. You wouldn't want each way of writing "f" for example (italic, copperplate, cursive, etc.) given a separate code-point. You can get 草書 font sets, but you'd never want them to be encoded separately.

  31. Bathrobe said,

    October 27, 2016 @ 4:21 am

    @Eidolon

    My own particular 'bias', as you call it, is against the Chinese regime's continuing attempts in the past few decades to replace ethnic minority languages with Han Chinese. Other particular 'biases' are against attempts at national aggrandisement based on simple aggressive nationalism, buttressed by the use of 'history' as a tool, and the co-opting of non-Han languages into this project of aggrandisement as fairly much an afterthought, despite the clear (although not publicly acknowledged) goal of reducing these languages to a state of irrelevance.

    For the record, I am not against the encoding of ethnic scripts per se; I am not opposed to Chinese characters per se, which I find fascinating; I am not opposed to the encoding of oracle-bone scripts; and I am not opposed to the language or culture of the Han Chinese.

    Now that I have stated my position (or 'bias'), please let me know yours. From your continued comments, you appear to be an unwavering supporter of the Chinese language, Chinese culture, Chinese civilisation, and the Chinese government against all comers. And your main 'peeve' seems to be with people who don't take such a charitable view of the country and culture you support. Perhaps I've judged you wrongly, but that is how you come across.

  32. Bathrobe said,

    October 27, 2016 @ 4:24 am

    @ Jim Breen

    Yes, that's the glyph! Perhaps you wouldn't want it encoded separately, but it's annoying when it makes its way into the representation of another language. Possibly it's in a font that I have installed on my computer and is being selected by the system over the correct form.

  33. Victor Mair said,

    October 27, 2016 @ 6:46 am

    @Jim Breen

    "The variant glyphs of characters, in this case a handwritten abbreviated form, are rarely separately coded in character sets. You wouldn't want each way of writing "f" for example (italic, copperplate, cursive, etc.) given a separate code-point. You can get 草書 font sets, but you'd never want them to be encoded separately."

    Precisely! That way madness lies.

    NOTE: J. sōsho / M. cǎoshū 草書 ("cursive")

  34. Rodger C said,

    October 27, 2016 @ 6:59 am

    the euphoniously named SA(PP)RFT

    Pronounced "saprophyte."

  35. Victor Mair said,

    October 27, 2016 @ 7:52 am

    @Rodger C

    "saprophyte" — brilliant!

    The Guójiā Guǎngbō Diànyǐng Diànshì Zǒngjú 国家广播电影电视总局 (State Administration of Press, Publication, Radio, Film and Television [SAPPRFT]), humorously referred to in Orwellian terms by China Digital Times as the Ministry of Truth ("Minitrue" for short) has often been referred to on Language Log, usually with regard to limiting the use of puns — e.g., "Punning banned in China" (11/29/14) and "It's not just puns that are being banned in China" (12/7/14) — but also regarding the limitation of English: "Clamp down on English" (7/7/16)

    See also this comment to an earlier post by Jichang Lulu.

  36. Victor Mair said,

    October 27, 2016 @ 8:02 am

    In one of the previous comments to this post, a PRC graduate student mentions the "river crab". Here's a recent China Digital Times (CDT) piece on the "river crab", whose mythical stature in the PRC rivals that of the "grass-mud horse" (for which see here, here, here, here, and here): "River Crabbed: Skirting 'Truth From Facts'" (10/20/16)

    For the "grass-mud horse" and "river crab" together, see here.

  37. Tom Gewecke said,

    October 27, 2016 @ 2:18 pm

    @Bathrobe I think that funny character is indeed an artifact of Apple's default Japanese font, Hiragino. Normally to make sure the OS prefers a Chinese font over a Japanese one for Han characters, you have to check the Preferred Languages list and make sure Chinese is higher than Japanese, or that it has Chinese but not Japanese on it. This note from a few years ago has some info

    http://m10lmac.blogspot.com/2011/09/odd-chinese-display-issue.html

  38. Jean-Michel said,

    October 27, 2016 @ 5:29 pm

    Regarding the graduate student's message in the post here:

    I remember that my mom always writes an odd character which combines 由 on the left and 攵 on the right. She says it is 数 as in 数学, as she was taught.

    My guess is the student's mother was taught this character during the brief and confusing era of the second-round simplifications, when 数 (itself already simplified from 數) was officially simplified to 由 + 攵.

  39. Bathrobe said,

    October 27, 2016 @ 7:15 pm

    Hi Tom,

    Thanks for the info. I'm relieved that this is a known problem.

    I've put 日本語 below both types of 中文 in my Language and Region settings, and I'm now back to that horrible Chinese version of 直 in my Japanese.

    On the plus side, 门 is now showing correctly!

    Chrome gives the option of choosing the font for the language, but it doesn't seem to have any effect.

  40. Bathrobe said,

    October 27, 2016 @ 7:34 pm

    @ Tom

    Of course, the language would have to be declared in the document, so I shouldn't expect it to have an effect in the comments section here.

    Even declaring the language (lang="xxx") makes no difference:

    Chinese (simplified): 直接,门
    Japanese: 直接、門
    Chinese (traditional): 直接,門
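
    As a rough illustration of why a language declaration (or the system's font preference) is the only thing that can change how 直 looks: under Han unification the Chinese and Japanese forms share a single code point, whereas 门 and 門 are encoded separately. The sketch below only inspects code points; the helper name `code_point` is made up for this example, and it says nothing about which font a given browser will actually pick.

```python
# Minimal sketch: compare the code points behind the sample characters above.
# `code_point` is a hypothetical helper name, not part of any library.

def code_point(ch: str) -> str:
    """Return the Unicode code point of a single character as 'U+XXXX'."""
    return f"U+{ord(ch):04X}"

# 直 is one unified character: the text itself carries no Chinese/Japanese
# distinction, so the glyph shape depends on the font that gets selected.
unified = code_point("直")

# 门 (simplified) and 門 (traditional / Japanese) are separate characters.
simplified_gate = code_point("门")
traditional_gate = code_point("門")

print("直:", unified)
print("门:", simplified_gate)
print("門:", traditional_gate)
```

    Because the stored text is identical in both languages, no amount of retyping 直 changes its appearance; only markup such as lang="ja" versus lang="zh-Hans", or the order of preferred languages, gives the renderer a reason to choose one font over another.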

  41. Chas Belov said,

    October 28, 2016 @ 12:16 am

    I wonder whether they will encode Xu Bing's fanciful characters. Even with my rudimentary Chinese, I remember being fascinated by the walls and walls of them when they were exhibited at the Asian Art Museum in San Francisco many years ago. (And looking up Xu Bing, surprised to learn his given name translates as "Ice.")

  42. Victor Mair said,

    October 28, 2016 @ 9:24 am

    And now see "Paleographers, riches await you!" (10/28/16).

  43. Eidolon said,

    October 28, 2016 @ 4:02 pm

    "My own particular 'bias', as you call it, is against the Chinese regime's continuing attempts in the past few decades to replace ethnic minority languages with Han Chinese. Other particular 'biases' are against attempts at national aggrandisement based on simple aggressive nationalism, buttressed by the use of 'history' as a tool, and the co-opting of non-Han languages into this project of aggrandisement as fairly much an afterthought, despite the clear (although not publicly acknowledged) goal of reducing these languages to a state of irrelevance."

    There is no evidence that this project promotes either Han chauvinism, aggressive Chinese nationalism, or language replacement. The amount of criticism directed at an effort to digitize obscure, ancient, and minority scripts is, frankly, ridiculous. As I said, if this project was conducted by any organization other than the Chinese government, it'd have been praised as a triumph of historical preservation, so the bias is hard to ignore.

    "Now that I have stated my position (or 'bias'), please let me know yours. From your continued comments, you appear to be an unwavering supporter of the Chinese language, Chinese culture, Chinese civilisation, and the Chinese government against all comers. And your main 'peeve' seems to be with people who don't take such a charitable view of the country and culture you support. Perhaps I've judged you wrongly, but that is how you come across."

    My main peeve is with people who do not even attempt to be objective in their assessment of issues related to China.

    Consequently, if it appears that I am exhibiting "unwavering support" of Chinese language, culture, civilization, and/or government, that is probably a hint that criticism of the above is getting out of hand, in which case my goal, as I see it, is to bring balance back into the discussion.

    In this case, it constitutes observing that your criticism of 中华字库工程 is based on assumptions of sinister intent, rather than facts. In actuality, as both Endymion Wilkinson and Jichang Lulu stated above, 中华字库工程 is neither new, nor designed to promote Chinese cultural imperialism on the international stage.

    You should afford more room for nuance. Just because you disagree with certain policies of the Chinese government, does not mean that every Chinese government effort must, therefore, be reprehensible. The world is not black and white.

  44. Bathrobe said,

    October 28, 2016 @ 7:23 pm

    "if this project was conducted by any organization other than the Chinese government, it'd have been praised as a triumph of historical preservation"

    If any organisation other than the Chinese government conducted this project, they would have done it differently. In particular:

    1) They would not have set a 'production goal' of 500,000 characters for their scholars. As you seem to be well versed in China you should know the game: the targets are set in stone by the leadership and everyone has to scramble to fulfil them. It happened in the Great Leap Forward, it's happening with economic statistics, and (as Jichang Lulu pointed out), it happened with the Chinese history project, where the political leadership set the goal not of researching China's early history but of backing up "5,000 years" of history. Setting up a grandiose, unsubstantiated goal for a research project on history is both political and nationalistic, and given that it is the government setting the goal, it is hard to escape the criticism that the figure of 500,000 is nothing more than aggrandisement.
    2) They would have included scope for international cooperation. (From the Baidu article it does not appear that this is included in the project, or is of any importance to it.) My criticism was that the Chinese government regard their minority peoples as a closed shop. These peoples share ethnicity and history with people in neighbouring countries – this is a fact, and ignoring this fact is sufficient proof of "sinister intent". If the Chinese government were serious about historical preservation, they would have included provision for cooperation with scholars in other countries. That would have obviated any suspicion that they are running a "national project" shutting out existing or potential contributions by non-Chinese scholars. That is the reason for my rather cynical suggestion that they are just as likely to redigitise Chu Nom in order to reach their goal since it would be completely in line with their closed shop approach.

    You will notice that the first point, in particular, is the subject of this posting, not the existence of the digitalisation project per se.

    The article by Jing Tsu, which you agree was "terrible", brought all of these issues into focus with its typical Han-centred cultural focus which many people seem to take for granted (Chinese = Han) and its obvious "pet peeves" (a phrase you introduced into the discussion) like lamenting that people can't use particular characters in their names — which other commenters here have demonstrated could be fixed more easily than with a 500,000 character national project.

    "my goal, as I see it, is to bring balance back into the discussion"

    Then you could have chosen less polarising ways of doing it. Your comments drip with antagonism and certainly don't contribute much to achieving your stated goal.

  45. Eidolon said,

    October 31, 2016 @ 8:16 pm

    "They would not have set a 'production goal' of 500,000 characters for their scholars."

    You are assuming that the 500,000 characters count was an a priori *goal*, as opposed to an *estimate*, of the scope of the project. But there's no evidence that the number was given first, and the effort then tailored to fit it. Whereas the number 5,000 years, drawn from Chinese mythology, was/is semantically and symbolically poignant, the number 500,000 characters is completely arbitrary and has no special significance beyond being a convenient rounding for journalistic purposes. The burden of proof is on you to show that the count was given by the government; the description given by Endymion above argues that it came from a rough estimate of ancient + kaiti + minority script characters.

    "They would have included scope for international cooperation."

    There is no obligation for any country to involve other countries in its cultural projects, regardless of whether its ethnic groups are related to groups in other countries. Your logic here is equivalent to saying that American Sinologists must involve PRC scholars in any government funded project related to Chinese history and/or culture. As far as I know, there is no such requirement, and there has never been an implicit *need* for international cooperation on the basis of shared ethnicity for *any* group. I therefore cannot sympathize at any level with your logic on this issue. It's like saying that in order to publish on Black History, we have to involve scholars from Africa, because otherwise, we are treating African Americans as a closed loop. Nonsense.

    "Then you could have chosen less polarising ways of doing it. Your comments drip with antagonism and certainly don't contribute much to achieving your stated goal."

    I hate to use this line of argument, but perhaps you should examine the tone of your own comments before pointing the finger, as it was quite difficult to *not* read it as dripping with hostility and sarcasm towards not only the project, but the country and its mainstream culture.

  46. Bathrobe said,

    November 1, 2016 @ 4:51 am

    "The burden of proof is on you to show that the count was given by the government"

    True, 500,000 is not a target per se, but why state a figure if it isn't regarded as achievable? The Baidu article explicitly mentions 汉字古文字约10万、楷书汉字约30万、各少数民族文字约10万. I'm not sure why this should be regarded as "a convenient rounding for journalistic purposes". Chinese websites usually quote figures like this from the original source.

    "There is no obligation for any country to involve other countries in its cultural projects"

    I don't see the relevance of your hypothetical examples. If the U.S. were proposing to digitise a script that certain U.S. blacks shared in common with a society in (say) West Africa, I should think it highly incumbent on the U.S. government to propose some kind of cooperation.

    "dripping with hostility and sarcasm towards not only the project, but the country and its mainstream culture"

    So hostility to Chinese government policies justifies your own personally directed antagonism?

  47. Eidolon said,

    November 2, 2016 @ 7:54 pm

    "True, 500,000 is not a target per se, but why state a figure if it isn't regarded as achievable? The Baidu article explicitly mentions 汉字古文字约10万、楷书汉字约30万、各少数民族文字约10万. I'm not sure why this should be regarded as "a convenient rounding for journalistic purposes". Chinese websites usually quote figures like this from the original source."

    The character 约 stands for "approximate." This is actually precisely the sort of journalistic rounding that you'd expect to see from a press release. More likely the count was given by Chinese orthographic experts who were asked to provide it. The Baidu article shows that the responsibility of codifying these characters rests with the various research departments of Chinese universities, not with government bureaucracy. While a certain degree of propaganda is associated with any government project, to say that this is like the effort to prove 5,000 years of Chinese history because Mao had said it in a boast, is a gross exaggeration. While the Chinese government might have reason to exaggerate the complexity of the writing system, I find it unlikely that they'd just come up with a number such as 500,000.

    "I don't see the relevance of your hypothetical examples. If the U.S. were proposing to digitise a script that certain U.S. blacks shared in common with a society in (say) West Africa, I should think it highly incumbent on the U.S. government to propose some kind of cooperation."

    The process of developing an encoding for a script is rarely cooperative and never obligated to involve all countries using it. For example, ASCII was developed by an American committee, not a committee of all countries using the Roman alphabet. The Japanese Industrial Standard, which includes encoding for kanji, was developed by Japan, not a committee of all countries using kanji. I do not expect China to involve other countries in the development of its new encoding; but I do expect that they will have to deal with international committees in getting their version accepted as an international standard, which is the normal way national standards become international standards.

    "So hostility to Chinese government policies justifies your own personally directed antagonism?"

    First, your hostility was not merely towards Chinese government policies, as a fair reading of your first few comments should show. Second, my first response to you was not nearly as personally antagonistic as you object. I criticized what I saw as an obvious bias, and questioned the logic behind reading so many ulterior motives in a digitization project. But this is a problem not only with your comments but also with Jing Tsu's article. The goal wasn't to attack you; it was to attack the way this project has been painted as a Chinese propaganda exercise without solid evidence. If you or Jing Tsu could've supported your interpretations with proof, then I wouldn't have had a problem with it.

  48. Jim Breen said,

    November 2, 2016 @ 8:13 pm

    Eidolon wrote: " I do not expect China to involve other countries in the development of its new encoding; but I do expect that they will have to deal with international committees in getting their version accepted as an international standard, which is the normal way national standards become international standards."

    This is true, although often the development of national standards is open to input from elsewhere. I contributed to the comments phase of the revision of JIS X 0208 a decade or so back, and was acknowledged in the appendices.

    It's certainly true that national bodies need to "deal with" international committees in establishing international standards; in fact, the norm is that bodies like ISO will only consider proposals coming from national standards bodies (ANSI, DIN, AFNOR, etc.). Character encoding is now a little different in that ISO and the Unicode Consortium work together and there are other pathways into the process.

  49. Bathrobe said,

    November 3, 2016 @ 6:27 pm

    @ Eidolon

    You seem intent on constructing some kind of fairyland where 1) this is not a government project, and 2) the figure of 500,000 was just fodder for the press.

    I do not know whether this project was initiated from the very top, but without strong backing at a relatively high level it would have been difficult to get so many institutes and universities to cooperate. One of the facts of life in China is that government departments (including universities) do not cooperate without some kind of strong direction. Getting that many bodies to work together bears all the hallmarks of a government project. According to Baidu, the 中华字库工程 is part of the 文化产业振兴规划 (variously translated as Plan on Reinvigoration of the Cultural Industry and Plan to Adjust and Reinvigorate Culture Industry), which had the imprimatur of an executive meeting of the State Council chaired by Wen Jiabao. It is indeed a national-level project.

    I agree that the 500,000 estimate is not of the same gravity as the 5,000 years of history boast and does not involve the same level of political commitment. But this does not mean that it was just casually included in a press release. That would be far too irresponsible for a government-backed project. This estimate has now been put out there and represents an explicit (if inexact) target. If the project can only come up with 200,000-300,000 characters, it will still be egg on the face of its backers.

    The question that this post is asking is a valid one: where does the figure of 500,000 characters, including 100,000 characters from ethnic minority scripts, come from?

    Quite honestly, though, I don't think that we'll ever reach the end of this fruitless debate because we are starting from diametrically opposed attitudes. I am not a proponent of Big States and Big Languages, and I am not disposed to heartily approve of the Chinese push to make China into a powerful monolingual state, with healthy doses of nationalism and ingrained cultural attitudes to help the process along. You, on the other hand, are essentially defending and excusing whatever China does. The twain are unlikely to meet.

  50. Bathrobe said,

    November 3, 2016 @ 6:43 pm

    For "Big States and Big Languages" read "Mega States and Mega Languages". Big States and Big Languages have been around for a long time.

  51. Eidolon said,

    November 4, 2016 @ 6:49 pm

    @Bathrobe

    If you have actually read my past comments, you will realize that a characterization such as "essentially defending and excusing whatever China does" is a poorly supported description. But I'll go further and label it precisely the sort of personally antagonistic statement that you accused me of making. If you did not want the tension to escalate, perhaps you should not have tried to paint me as a Chinese apologist. Since you have continued to do so, I have no qualms about what I said earlier.

    As far as your strawman goes, you should quote where I tried to describe the project as "not a government project" or where I said that "500,000 was just fodder for the press." I have clearly stated both that 1) it is a government project and 2) the 500,000 likely comes from an approximate given by Chinese linguists. Bureaucrats do not come up with numbers like 500,000. The idea of "5,000 years of Chinese history" came from 19th and 20th century Chinese historians who conceived of Chinese history as beginning with the legendary Yellow Emperor, who reigned, according to them, over 4,800 years ago. As far as I know, no Chinese official has ever made the claim that the Chinese script has an additional 500,000 characters, nor is it based on any Chinese legends. Therefore, the number must have been based on some estimate, as opposed to a political mandate.

    As far as "Big States" and "Big Languages" go, whether you approve of them is immaterial, as they are the reality of the contemporary world. English is spoken by more than a billion people worldwide; Modern Standard Mandarin by another 800 million. Gigantic states or state-like entities such as China, India, Russia, the US, and the European Union control the welfare of most of the world's population. How history reached this stage could be debated; whether it is a positive development could be debated; but stating that because you do not approve of it, you are allowed to make up facts and/or motives to criticize it, is inane. This is especially so when the project you are criticizing may, in fact, positively benefit the diversity of scripts and languages used in China. I find that highly ironic. And you're right – there can't be an end to this debate, because you essentially cannot admit that anything positive can come out of this project or anything the Chinese government is involved in. Hence my accusation of bias. There's nothing left to say.

  52. Bathrobe said,

    November 5, 2016 @ 1:52 am

    "essentially defending and excusing whatever China does" is a poorly supported description'

    Perhaps, but I haven't seen much here to suggest the contrary.

    the number must have been based on some estimate, as opposed to a political mandate

    Your point is that it was not an a priori figure, which I have not disputed.

    500,000 likely comes from an approximate given by Chinese linguists. Bureaucrats do not come up with numbers like 500,000

    The best you can say is that the source of the estimate was probably linguists. Whether the bureaucrats played such a passive role is another question. Of course it's always possible that some Chinese linguists got together of their own accord and came up with 500,000 to sell their project to the bureaucrats, who then sold it to the State Council…

    stating that because you do not approve of it, you are allowed to make up facts and/or motives to criticise it

    Stating a position has obviously left me open to dishonest statements like this one.

    you essentially cannot admit that anything positive can come out of this project or anything the Chinese government is involved in

    If something good comes of it I will be the first to applaud. I am not opposed to adding characters to Unicode (I have submitted characters for addition to the simplified character set) and I am not opposed to encoding either Han characters or non-Han characters. I am, however, sceptical of the mentalities involved in coming up with the figure of 500,000.

    At any rate, I think the entire list is tired of this exchange. I am happy to call it quits.
