Language Log

"Why I think the Chinese writing system is TERRIBLE"

November 21, 2022 @ 8:06 pm · Filed by Victor Mair under Writing systems

That's the title of this YouTube video (12:39; posted Nov 18, 2022; 4,572 views) by ABChinese (34K subscribers).

The presenter also adds the following verbal comment:

No, it really is. I have come to the conclusion that Chinese characters aren't just difficult for English speakers to learn. They're difficult to learn on an ABSOLUTE SCALE. For lovers of Chinese though, this doesn't have to change anything. We can still admire the beauty in Chinese characters while recognizing that it's truly an inefficient and illogical system of writing… or I may be wrong.

NOTE: I also forgot to mention in the video, but another argument for why alphabetic writing is superior is that ALL computer languages are alphabetical. I'm not a coder, but from what I understand, Chinese coders tend to code in English because Chinese writing and computer languages are incompatible.

After these observations, ABChinese appends a useful set of "Research videos on Chinese writing and reforms", plus interesting information about himself and the things he loves.

ABChinese has many other thoughtful, well-informed YouTube videos about Chinese language and script, for which see here.

Selected readings

"The Awful Chinese Writing System" (Geoffrey Pullum, Lingua Franca, 1/20/16)
"Which is worse?" (1/21/16)
"Pinyin for ABCs" (5/14/20): fabulous video (23:14) of three smart, hip, articulate ABCs ("American-born Chinese"), the Fung Bros; also "10 REASONS WHY CHINESE AMERICANS CAN'T SPEAK CHINESE!" and "10 REASONS WHY LEARNING CHINESE IS HARD FOR AMERICANS"
"Ted Chiang uninvents Chinese characters" (5/13/16)

[Thanks to David Moser]


38 Comments

Craig said,

November 21, 2022 @ 9:11 pm

The point about computer languages seems kind of chicken-and-egg to me. Computer programming is mostly done using English words because the stored-program computer and most of the early languages were invented in the US, which also led the development of many crucial early standards such as ASCII character encoding (the "A" in ASCII stands for "American"). Had these things been invented in China or Japan, they might have been quite different.

There have been programming languages based on natural languages other than English (there's even one based on Klingon, developed as a joke), but as far as I know, none have ever been successful internationally.
Chester Draws said,

November 21, 2022 @ 10:39 pm

The issue with computer languages is not that one can't write perfectly readable code in ideographs. Swapping out the English words in "FOR x = 1 TO 40 DO" for Mandarin is perfectly easy to do and yields identical code.
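A minimal sketch of that swap follows. It assumes a toy syntax with whitespace-separated words and an invented keyword table: 对于, 到 and 做 are illustrative choices, not the keywords of any real language. Once the lexer maps each spelling to one canonical token, nothing downstream can tell the two loop headers apart:

```python
# Hypothetical keyword table: the Chinese spellings are illustrative choices
# for this sketch, not part of any real programming language.
KEYWORDS = {
    "FOR": "FOR", "TO": "TO", "DO": "DO",  # English spellings
    "对于": "FOR", "到": "TO", "做": "DO",  # hypothetical Mandarin spellings
}

def tokenize(line):
    """Split a toy loop header on whitespace and map every keyword
    spelling to the same canonical token."""
    tokens = []
    for word in line.split():
        if word in KEYWORDS:
            tokens.append(("KEYWORD", KEYWORDS[word]))
        elif word.isdecimal():
            tokens.append(("NUMBER", int(word)))
        elif word == "=":
            tokens.append(("OP", word))
        else:
            tokens.append(("NAME", word))
    return tokens

english = "FOR x = 1 TO 40 DO"
mandarin = "对于 x = 1 到 40 做"
print(tokenize(english) == tokenize(mandarin))  # True
```

Real Chinese text does not put spaces between words, so a lexer for such a language would need another way to find word boundaries. The point here is only that the keyword mapping itself is trivial.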

The problem is inputting those words/symbols. Each non-alphabetic/syllabic programming language would require a different keyboard, with the correct ideographs for that language, and either dozens of keys or multiple options for each key.

Only an alphabetic/syllabic keyboard gives you a system that lets you type many dozens of code words and move flexibly between languages with a keyboard of only 30 or so letters.

When typing a word I don't use very often, say VLOOKUP, I don't have to hunt around the 400 key options trying to find it. A person with a Mandarin language system doesn't have that flexibility.
Beirne said,

November 21, 2022 @ 11:07 pm

I thought he was just talking about data entry in general, not computer languages. You have the same problems when typing into a word processor or a web browser.
Victor Mair said,

November 22, 2022 @ 12:06 am

He was talking about computer code, not data entry.
cliff arroyo said,

November 22, 2022 @ 5:11 am

I had the idea he was basically talking about all the contexts that the written form of a language needs to be able to function in – in which characters don't function well.

One small example: how do you alphabetize characters? I'm sure there are ways of doing that, but they seem to require relearning the characters in terms of some alphabetical order… just as shape-based input seems to require learning the characters again as key combinations or numbers…

I understand that many want to preserve characters for cultural and/or aesthetic reasons, but getting them to fit into modern technological/bureaucratic contexts seems to require a lot of Rube Goldberg workarounds that more sound-based systems like pinyin (or zhuyin fuhao) don't.
John Swindle said,

November 22, 2022 @ 5:41 am

It would be fun if pinyin and bopomofo got in bed together like hiragana and katakana.
Philip Taylor said,

November 22, 2022 @ 5:48 am

[E]ngineers in England built the first stored-program computer, the Manchester Mark I, shortly before the Americans built EDVAC, both operational in 1949. [Source: Encyclopædia Britannica]

But more relevantly, computers can be programmed in Chinese. The computer programming language Wenyan-lang was created to demonstrate just this premise, and uses both traditional Chinese characters and classical Chinese grammar.
Victor Mair said,

November 22, 2022 @ 6:01 am

"Classical Chinese computing" (12/19/19)

https://languagelog.ldc.upenn.edu/nll/?p=45466

"Biblical Hebrew Computing" (8/8/22)

https://languagelog.ldc.upenn.edu/nll/?p=55588

For some reason, neither of these have caught on.
/df said,

November 22, 2022 @ 8:38 am

"Had these things been invented in China or Japan [rather than the English-speaking world], they might have been quite different."

Isn't that the point, though? Why did the heirs of the inventors of gunpowder, printing, paper, woven silk, etc, etc, miss out on computers and programming? Was there no East Asian Babbage or Jacquard? Or just no electricity?
Victor Mair said,

November 22, 2022 @ 10:30 am

Or just no alphabet?
Jerry Packard said,

November 22, 2022 @ 11:32 am

The Chinese writing system is not terrible; it is ingenious. It is easily alphabetized, which makes it easy for general comprehension and usage. It is generally true that the more phonetic a writing system is, the easier it is to use and learn. Given that logic, the English writing system is terrible – it is due to be reformed because of the lack of fit between the written and spoken language.
JOHN S ROHSENOW said,

November 22, 2022 @ 1:52 pm

Btw: I looked at some of his other YouTube videos listed to the right when I opened this one.
He has some very nice ones introducing how to hand-write Chinese characters, which I think would be useful for beginning students.
cliff arroyo said,

November 22, 2022 @ 1:55 pm

"It is easily alphabetized"

How so?

Could you quickly alphabetize the following characters (picked at random from the front page of the Chinese Wikipedia) and explain on what basis they end up in alphabetical order?

理於代關平固膚族民少廣高
Jerry Packard said,

November 22, 2022 @ 3:12 pm

Not sure why you’d want or need to do that, but if you simply mark and then sort, many programs will sort them in PY alphabetical order.

It is easily alphabetized because all you need to do is select it and then many programs will give you the alphabetization in PY (incl tone).
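What those programs do can be sketched roughly as follows. The sketch assumes a small hand-made table of readings for the twelve characters cliff arroyo listed; real tools consult a full character-to-pinyin dictionary. The sort itself is ordinary Latin-alphabet ordering. All the character-specific knowledge lives in the table:

```python
# Hand-made reading table for the twelve characters in the question above:
# (toneless pinyin syllable, tone number). Real tools use a full dictionary
# and must choose among readings for polyphonic characters (e.g. 少 can be
# shǎo or shào, 於 can be yú or wū); only one reading is listed here.
READINGS = {
    "理": ("li", 3), "於": ("yu", 2), "代": ("dai", 4), "關": ("guan", 1),
    "平": ("ping", 2), "固": ("gu", 4), "膚": ("fu", 1), "族": ("zu", 2),
    "民": ("min", 2), "少": ("shao", 3), "廣": ("guang", 3), "高": ("gao", 1),
}

def pinyin_key(char):
    """Sort key: letters of the syllable first, tone number as a tie-breaker."""
    syllable, tone = READINGS[char]
    return (syllable, tone)

def sort_by_pinyin(chars):
    """Return the characters as one string, in pinyin alphabetical order."""
    return "".join(sorted(chars, key=pinyin_key))

print(sort_by_pinyin("理於代關平固膚族民少廣高"))  # 代膚高固關廣理民平少於族
```

A character missing from the table raises `KeyError` here. That is the programmatic version of not knowing a character's sound.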
klu9 said,

November 22, 2022 @ 4:15 pm

@Philip Taylor and Jerry Packard: Both wenyan-lang and "alphabetization" require an alphabet, i.e. a writing system other than Chinese characters.

Re Wenyan-lang: I don't know for sure how that language's programmers achieve characters in their code, but in this video about wenyan-lang, that sure looks like a common alphabetic keyboard in the background, not a Chinese character keyboard.
https://www.bilibili.com/video/BV1cJ411b7cp/?uid=425631634A34313162376370 Presumably the programmer types in an alphabetic (e.g. pinyin) input system which then offers a choice of characters.
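The workflow klu9 describes (type Latin letters, then pick a character from a list) can be sketched like this. The candidate table and its ordering are hypothetical. Real input methods rely on large dictionaries, multi-syllable word matching and usage frequencies:

```python
# Hypothetical candidate table: typed Latin letters -> characters offered,
# in an illustrative (not real-frequency) order.
CANDIDATES = {
    "wen": ["文", "问", "闻", "稳"],
    "yan": ["言", "严", "眼", "研"],
}

def candidates(typed):
    """Return the candidate characters for one typed syllable."""
    return CANDIDATES.get(typed, [])

def choose(typed, index):
    """Simulate pressing a number key (1-based) to pick a candidate."""
    options = candidates(typed)
    if not 1 <= index <= len(options):
        raise ValueError(f"no candidate {index} for {typed!r}")
    return options[index - 1]

# Typing "wen" then 1, "yan" then 1 yields the characters of 文言.
print(choose("wen", 1) + choose("yan", 1))  # 文言
```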

Re "alphabetization", in "alphabetization in PY" presumably PY stands for pinyin. I.e. Latin alphabet. So again dependent on the existence of a writing system other than Chinese characters.

So both "counter" examples in fact demonstrate the kind of point the original video was making.
David Deden said,

November 22, 2022 @ 6:59 pm

"Was there no East Asian Babbage or Jacquard? Or just no electricity?"

I thought it might be a lack of large-scale private capital to support not just technical invention but also widespread promotion and investment. Was there anything similar to the East India companies, or many large private ventures? I think traditional and powerful Confucian scholars would always have rejected the implementation of labor-saving, nontraditional machines except as imperial novelties.
Jerry Packard said,

November 22, 2022 @ 7:38 pm

@klu9 – Yes it is true that in order to alphabetize you do need an alphabet.
Jonathan Smith said,

November 22, 2022 @ 8:51 pm

The notion of a deterministic relationship linking alphabets to computing seems to account poorly for the marginality of local-language input/coding/etc. tools worldwide, with many such cases involving languages written alphabetically (i.e., what Craig said…)

One might also wish to pontificate on say the impracticality of moveable-type printing for Chinese in contrast to English, though here there is the annoying obstacle of actual historical developments… it almost seems like human ingenuity + material conditions are the real factors at play…

Re: how computing might have developed in a Chinese-centered world, there are worse places to look than the movable type associated in the first instance with Bi Sheng: it seems that it is possible to devise ordering and retrieval systems based on, say, ad hoc ordinal-ized subsets of the characters… go figure :D
Chester Draws said,

November 22, 2022 @ 10:23 pm

‘Re "alphabetization", in "alphabetization in PY" presumably PY stands for pinyin. I.e. Latin alphabet. So again dependent on the existence of a writing system other than Chinese characters.’

But in fact much worse than that.

If you meet a word for the first time in English, you can look its meaning and pronunciation up very easily.

But if you don't know a Chinese word's approximate sound, how do you know how it was "alphabetised" in Pinyin?
Jerry Packard said,

November 23, 2022 @ 4:03 am

‘If you don't know a Chinese word's approximate sound, how do you know how it was "alphabetised" in Pinyin?’

That is true – the chance of knowing or being able to infer a character’s sound is much less than with an English word. But the chance of knowing or being able to infer an English word’s sound is also pretty bad – ask any English learner.
Terry K. said,

November 23, 2022 @ 11:01 am

But you don't need to know how an English word sounds to look it up in a dictionary. In fact, you can look it up and the dictionary tells you how it sounds. Looking up a Chinese character when you can't cut and paste it (or use digital character recognition) is nothing like looking up an English word.
Antonio said,

November 23, 2022 @ 11:37 am

Regarding programming languages: languages using symbols instead of words exist, for example, APL: https://en.wikipedia.org/wiki/APL_syntax_and_symbols — although they're not very popular

A better argument in my opinion is text encoding. CJK characters are much more complicated to encode than Latin letters (or Greek, Cyrillic, etc.). An obvious counterargument to this is that, since computers were mostly developed in the West in general and Anglophone countries in particular, little attention was paid to how the world's writing systems would be encoded; it was an afterthought.
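To put a rough number on "harder to encode", here is a small sketch that measures UTF-8, the encoding most software uses today. It does not model the encodings that 1980s systems actually used. Per character the byte cost is modest. The bigger difference is the size of the repertoire:

```python
def utf8_length(ch):
    """Number of bytes UTF-8 uses for a single character."""
    return len(ch.encode("utf-8"))

# "\u00e9" is é; the CJK characters are taken from the thread.
for ch in ["A", "\u00e9", "Ж", "理", "關"]:
    print(ch, f"U+{ord(ch):04X}", utf8_length(ch), "byte(s)")

# Size of the main CJK Unified Ideographs block (U+4E00..U+9FFF),
# compared with the 95 printable ASCII characters.
cjk_block = 0x9FFF - 0x4E00 + 1
print(cjk_block)  # 20992
```

This counts only the main CJK Unified Ideographs block. Unicode adds further extension blocks on top of it.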

But in the 1980s, Japan was a leader in computer development, and they still struggled. For example, games on Nintendo's Famicom console had Japanese text written using kana: https://legendsoflocalization.com/video-games-and-japans-three-main-writing-systems/

I think this clearly demonstrates that CJK characters are simply harder to encode.
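The following is a back-of-the-envelope sketch of why kana would have been attractive on cartridge-era hardware. The figures are assumptions for illustration:

- uncompressed 1-bit bitmaps;
- a 96-glyph set at 8×8 pixels, standing in for kana plus punctuation;
- the 2,136 Jōyō kanji at 16×16 pixels, a common bitmap size for kanji.

These figures do not describe the Famicom's actual graphics format:

```python
def font_bytes(glyph_count, width, height, bits_per_pixel=1):
    """Bytes needed to store an uncompressed bitmap font (rounded up)."""
    bits = glyph_count * width * height * bits_per_pixel
    return (bits + 7) // 8

# Assumed figures, for illustration only.
kana_font = font_bytes(glyph_count=96, width=8, height=8)
kanji_font = font_bytes(glyph_count=2136, width=16, height=16)
print(kana_font, kanji_font)  # 768 68352
```

Under these assumptions the kanji font is roughly 89 times larger than the kana one, before any code or other graphics are stored.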
Jerry Packard said,

November 23, 2022 @ 12:04 pm

And you can look up a Chinese word without knowing how it sounds. As with English, you can look it up and the dictionary tells you how it sounds.
Jonathan Smith said,

November 23, 2022 @ 12:34 pm

Obviously alphabetic writing allows for easy sorting of words… but lexicographical tools in China had long used fundamentally similar sound-based systems which would have been further refined in a hypothetical China-centered digital world — so e.g. tone>rime>onset is a simple attested mechanism which could be made more granular in various ways. The alphabet per se is incidentally not a very natural-feeling fit for languages like Chinese; symbols for onsets and rimes, with tweaks for "medial" segments, would seem to better match native intuitions.
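The tone>rime>onset ordering can be sketched as a three-part sort key. This is a minimal illustration under three assumptions:

- It uses modern Mandarin categories for the twelve characters from earlier in the thread. Historical rhyme books used the categories of their own time.
- It compares rimes as strings only for convenience.
- A real rhyme book fixes its own sequence of rime groups, so in principle no alphabet is needed.

```python
# Hypothetical analysis of each syllable into (onset, rime, tone) using
# modern Mandarin categories; "" marks a zero onset.
SYLLABLES = {
    "理": ("l", "i", 3), "於": ("", "ü", 2), "代": ("d", "ai", 4),
    "關": ("g", "uan", 1), "平": ("p", "ing", 2), "固": ("g", "u", 4),
    "膚": ("f", "u", 1), "族": ("z", "u", 2), "民": ("m", "in", 2),
    "少": ("sh", "ao", 3), "廣": ("g", "uang", 3), "高": ("g", "ao", 1),
}

def rhyme_book_key(char):
    """Order by tone first, then rime, then onset."""
    onset, rime, tone = SYLLABLES[char]
    return (tone, rime, onset)

print("".join(sorted(SYLLABLES, key=rhyme_book_key)))
# 高膚關民平族於少理廣代固
```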
Victor Mair said,

November 23, 2022 @ 12:59 pm

See The Fifth Generation Fallacy: Why Japan is Betting Its Future on Artificial Intelligence, by J. Marshall Unger (Oxford University Press, 1987).
Jerry Packard said,

November 23, 2022 @ 4:06 pm

@Jonathan Smith – Yes. Tim Light’s 1976 Cornell dissertation makes this point nicely.
Terry K. said,

November 23, 2022 @ 6:32 pm

@Jerry Packard. How? In a print dictionary, how do you look up a Chinese character you don't know? (And, yes, there's a difference between looking up characters and words.) My recollection from what I've read is there is no good system. Yeah, it's easy with a computer. (Although, I suspect it's not so simple and easy if you're using OCR on handwritten characters.) But when you can't use a computer to help, it's nothing like looking up an English word in the dictionary.
Jerry Packard said,

November 23, 2022 @ 8:54 pm

@Terry K.

Many ways. The most common is radical plus residual strokes. Another is total number of strokes, but that takes longer.
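Radical-plus-residual-strokes lookup amounts to sorting by a pair of numbers. Below is a minimal sketch. It assumes the 214-radical Kangxi system (some modern dictionaries use different radical lists) and covers only eight of the characters from cliff arroyo's list. The hard part, deciding which component counts as the radical, is hidden inside the table:

```python
# (Kangxi radical number, residual stroke count) for a few characters.
# A print dictionary's radical index is essentially this table.
RADICAL_INDEX = {
    "代": (9, 3),    # 人 + 3
    "少": (42, 1),   # 小 + 1
    "平": (51, 2),   # 干 + 2
    "於": (70, 4),   # 方 + 4
    "族": (70, 7),   # 方 + 7
    "民": (83, 1),   # 氏 + 1
    "理": (96, 7),   # 玉 (written 王) + 7
    "高": (189, 0),  # 高 is itself a radical
}

def dictionary_order(chars):
    """Order characters by (radical number, residual strokes)."""
    return "".join(sorted(chars, key=lambda c: RADICAL_INDEX[c]))

print(dictionary_order("理於代平族民少高"))  # 代少平於族民理高
```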
maidhc said,

November 23, 2022 @ 10:55 pm

In

for (int i=0; i<=99; i++)g[i]+=1;

how much of the meaning comes from English?

But using an alphabet is rather convenient. Antonio mentioned APL, which had so many symbols that it needed a special keyboard, and still that wasn't enough. But I think many people found that more of a disadvantage than an advantage, which is why other later languages didn't follow the example.

Converting character strings to tokens is something that happens very early in processing a program, and it is a fairly trivial step as long as a few simple guidelines are followed in designing the language.
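The following sketch gives a rough idea of how small that step is. It is a regex lexer for exactly the line above, modeled on the tokenizer recipe in the Python `re` documentation, and it supports only the tokens that line needs. Of its 21 tokens, only two (`for` and `int`) are English words:

```python
import re

# Patterns are tried in order; multi-character operators come first so that
# "<=" is not split into "<" and "=".
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("NAME",   r"[A-Za-z_]\w*"),
    ("OP",     r"<=|\+\+|\+=|[()\[\];=<+]"),
    ("SKIP",   r"\s+"),
]
C_KEYWORDS = {"for", "int"}  # only the keywords this example needs
LEXER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(source):
    """Turn a C-like string into (kind, text) pairs, dropping whitespace."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = LEXER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        kind, text = match.lastgroup, match.group()
        if kind == "NAME" and text in C_KEYWORDS:
            kind = "KEYWORD"
        if kind != "SKIP":
            tokens.append((kind, text))
        pos = match.end()
    return tokens

tokens = tokenize("for (int i=0; i<=99; i++)g[i]+=1;")
print([text for kind, text in tokens if kind == "KEYWORD"])  # ['for', 'int']
print(len(tokens))  # 21
```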
Andreas Johansson said,

November 24, 2022 @ 2:56 am

Pointing out that English orthography is flawed seems beside the point. Very few people would contest that alphabetic writing can be done better than it's done in English.
cliff arroyo said,

November 24, 2022 @ 3:41 am

"English orthography is flawed"

Yeah, no one's claiming English spelling is some kind of model to be followed. Lots of languages make better use of their alphabets than does English.

Italian, Norwegian, Indonesian are examples of somewhat problematic systems that are still far more efficient than English.

"radical plus residual strokes"

Is it always easy to figure out what the radical is? I was under the impression that that's kind of arbitrary at times….
Taylor, Philip said,

November 24, 2022 @ 4:29 am

/df — "Why did the heirs of the inventors of gunpowder, printing, paper, woven silk, etc, etc, miss out on computers and programming? Was there no East Asian Babbage or Jacquard? Or just no electricity?"

Almost certainly because, at that time, the Chinese nation was somewhat pre-occupied with rather more pressing matters —

On October 1, 1949, Chinese Communist leader Mao Zedong declared the creation of the People’s Republic of China (PRC). The announcement ended the costly full-scale civil war between the Chinese Communist Party (CCP) and the Nationalist Party, or Kuomintang (KMT), which broke out immediately following World War II and had been preceded by on and off conflict between the two sides since the 1920’s. [Source: History.State.Gov]
Victor Mair said,

November 24, 2022 @ 9:54 am

The history of the development of computational linguistics and the processing of language in computers goes back long before the invention of computers per se. That is to say, computational linguistics and the processing of language in computers have a prehistory that led up to the practical application of computers for the analysis and processing of language. These are not things that could be initiated or delayed solely by recent political events.
Terry K. said,

November 24, 2022 @ 10:53 am

@Jerry Packard

"Many ways". You don't see the problem with that?

I believe someone much more educated in characters than I could go into the difficulties of those individual methods, and why they aren't as good as using alphabetical order. But I think I can safely say "many ways" isn't as good as one standard method.
Jerry Packard said,

November 24, 2022 @ 11:51 am

No one is arguing that alphabetical, phonetic lookup is not better than a character-based lookup system. We need to accept characters on their own terms, that’s all.

I’ve never been a proponent of the view: ‘Gee, this is hard – why don’t they do it like we do?’
Victor Mair said,

November 24, 2022 @ 5:54 pm

"I’ve never been a proponent of the view: ‘Gee, this is hard – why don’t they do it like we do?’"

ABChinese is speaking from quite a different perspective.
Chas Belov said,

November 25, 2022 @ 12:20 am

With regard to programming, there are two separate matters. The first is the language syntax, which consists of reserved words and symbols. The second is variable names, which can use any combination of permitted characters that does not constitute a reserved word. I once read of companies that outsourced programming overseas, and the programs would come back with variable names in the languages of those countries. This effectively meant that the programs would have to be maintained by speakers of those languages, who could understand the meaning of those variables.

Giving a field a good, reasonable to type and re-type yet unambiguous variable name in English – and ensuring the data contained therein is correctly described by that name – is hard enough.
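The split between reserved words and identifiers is visible in Python 3. Since PEP 3131, Python accepts identifiers in many scripts, Chinese included, while its keywords stay English. Python serves here only as an illustration; other languages have their own rules:

```python
import keyword

# isidentifier() checks the spelling rules; iskeyword() checks the reserved list.
for name in ["price", "价格", "for", "循环"]:
    print(name, name.isidentifier(), keyword.iskeyword(name))

价格 = 40        # "price"
数量 = 3         # "quantity"
总价 = 价格 * 数量  # "total price"
print(总价)  # 120
```

Code like this runs fine. As the comment says, though, only readers of the identifiers' language can comfortably maintain it.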
Raul said,

November 25, 2022 @ 4:27 am

All the complaints about the difficulty of finding Chinese characters in a dictionary, when it would supposedly be so easy to find English words, seem to stem from a lack of user experience. I've never had much trouble finding cuneiform signs in dictionaries (and no, I'm not a native Sumerian speaker), even though cuneiform signs have a notable number of different phonetic values and there was originally no cuneiform 'alphabet'. On the other hand, I've seen many people struggle with alphabetic dictionaries. Sometimes they have never used paper dictionaries without a search function (even I, a professional translator, don't use them every day anymore). Sometimes the barrier between different alphabets seems too high, although Latin/Cyrillic/Greek/Hebrew/Arabic all developed from the same source and are quite similar. The order of characters in the English alphabet is not God-given or innately comprehensible to all humans; you have to learn it, and if you haven't, tough luck. (E.g., Z is a variation of S, so it makes much more sense to keep all your sibilants together as S Š Z Ž, like Estonians do.) The ordering of characters in the Middle Eastern and European family of alphabets follows no underlying principle. Most hieroglyphic or cuneiform systems, by contrast, are actually rational.
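The point that alphabetical order is a learned convention shows up directly in code: sorting needs an explicit order. Below is a minimal sketch. It assumes a simplified order in the spirit of Estonian, where Z and Ž come right after S and Š. Python's default comparison, by contrast, uses raw Unicode code points, which puts š after z:

```python
# Simplified, Estonian-style letter order (an assumption for illustration):
# z and ž follow s and š, before t. Unlisted characters sort after all
# listed letters, by code point.
ESTONIAN_STYLE = "abcdefghijklmnopqrsšzžtuvwõäöüxy"
RANK = {letter: i for i, letter in enumerate(ESTONIAN_STYLE)}

def collation_key(word):
    """Map each letter to its position in the custom order."""
    return [RANK.get(ch, len(RANK) + ord(ch)) for ch in word.lower()]

words = ["tee", "zoo", "sai", "šokk"]
print(sorted(words))                     # ['sai', 'tee', 'zoo', 'šokk']
print(sorted(words, key=collation_key))  # ['sai', 'šokk', 'zoo', 'tee']
```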

Besides, English really makes a lousy example of an "easy" system; it's mostly promoted by monolingual native speakers. It can be quite hard to work out the pronunciation from the spelling, or vice versa. If you want to promote alphabets, take Finnish. Simple, easy, and you don't have to learn French first.

Btw, a much more interesting question about computer languages is this: just like Western logic, the syntax of our computer languages is largely based on the Greek/Latin grammatical tradition, which is quite strongly tied to Indo-European languages and does not fit other language families very well. Suppose there were a grammatical natural-language basis without, say, the verb/nomen contrast (see, e.g., Selkup). It might yield quite different computational structures, and it is hard to say whether those might be more efficient for some task or approach than those we've got now.
