Differential retention of sinographs across East Asia
December 12, 2025
[This is a guest post by J. Marshall Unger]
Well, first of all, the difficulty of learning a language can only be measured relative to the language(s) the learner already knows. Japanese is easier for Koreans than for Americans; I would guess Chinese is easier for English speakers than, say, Arabic speakers. Second, language isn't writing. Learning to write Japanese or Chinese is hardly a snap even for native speakers.
As for Julesy's comments, I would just add that, as DeFrancis pointed out, the thing that really made romanization universal in Vietnam was the determination of Ho Chi Minh and his allies to educate the peasantry so they could be mobilized to drive out the colonialists. In Korea, I suspect that bad memories of the Japanese occupation gave hankul a boost it might not have enjoyed had Korea remained independent. As for why Japan still uses kanji, it started as a class thing. Lots of printed material before 1945 was produced with furigana on practically every character. Ironically, progressives who wanted to limit the number of kanji in general use and the readings kanji could take opposed such furigana use, believing that getting rid of them would force publishers to show restraint. After 1945, in theory at least, the old class structure was eliminated, and the compromise was a limited number of kanji with more or less sensible readings taught to all kids (except the blind) alike regardless of family background. That compromise will probably continue in one form or another until the aging of the population, the need for foreign workers, a severe downturn in the economy, or some other catastrophe rouses the government from its accustomed lethargy. Meantime, in computer environments, most Japanese type in romaji: they throw away the romaji input once (they think) they have the graphic output they need, but they're using romaji passively all the same.
As DeFrancis emphasized, abolishing characters isn't a realistic goal. Rather, China and Japan ought to aim for digraphia: one national standard romanization for putonghua and one for modern standard Japanese; teach that romanization without apologies in the schools alongside traditional writing; and pass laws to make sure that anyone who wants to use that romanization for any everyday purpose is not penalized for doing so.
Robert Ramsey said,
December 12, 2025 @ 12:21 pm
This is a great little essay, Victor–but of course I would have expected nothing less from Jim Unger.
And yes, the Korean aversion toward sinographs does owe much to their association with Japan–in North Korea, at least, Kim Il Sung was pretty explicit about that! He hated everything associated with Japan, after all.
Alex Fui said,
December 12, 2025 @ 12:49 pm
The digraphia proposal is compelling, especially framed as “no penalties for everyday use” rather than abolition. The Vietnam and Korea examples also underline how literacy policy and identity politics can outweigh pure “efficiency” arguments. On a much smaller scale, I’ve been playing with orthography-focused tools such as an anagram solver to help learners notice patterns across romanization and spelling. Thanks for the post.
Logoplanetarian said,
December 12, 2025 @ 4:13 pm
Anagram solver – these were available in the early days of the internet. And the problem is that, with any reasonably long word set, the possibilities become so wild that it's just boring. Selection is the problem, and it needs a human – AI, maybe, let's see – to do that. We played the anagrams game 45 years ago. Isambard Kingdom Brunel? Man making rubber dildos. Wolfgang Amadeus Mozart? Fart gas glow on a mud maze. Best of luck with non-native English speakers.
Chris Button said,
December 12, 2025 @ 6:13 pm
Can't educated people read a lot more kanji than just the Jōyō kanji though? I've always suspected that the Jinmeiyō kanji are as much for use in names as for simply extending the list of kanji that are helpful to know.
For example, despite it being a Jinmeiyō kanji, presumably many Japanese people would know that 苺 has a kun-yomi of ichigo to mean "strawberry"?
But Japan does have kana.
A standard romanization, namely Hepburn, is of course needed for international transliteration purposes. But isn't that need the same for countless scripts in use around the world?
Peter Cyrus said,
December 13, 2025 @ 6:24 am
Julesy claims that Chinese can be typed faster than English. But in addition to a quibble about comparing characters/minute with words/minute – there are apparently 1.85 characters/syllables in the average Chinese word – the Chinese was accelerated by software predicting the most likely matches for the pinyin or zhuyin typed, thus speeding up or even eliminating the second step of the input process wherein the desired character is selected. Predictive text is widely available in English, too, but I don't believe it was used in the measure of typing speed.
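To make the unit quibble concrete, here is a minimal sketch that converts a characters-per-minute figure into words per minute using the 1.85 characters-per-word average cited above. The sample rate of 100 characters per minute is a hypothetical value chosen only for illustration, not a measured typing speed.

```python
# Average characters (syllables) per Chinese word, as cited in the comment.
CHARS_PER_WORD = 1.85


def chars_to_words_per_minute(chars_per_minute: float,
                              chars_per_word: float = CHARS_PER_WORD) -> float:
    """Convert a typing rate in characters/minute to words/minute."""
    if chars_per_word <= 0:
        raise ValueError("chars_per_word must be positive")
    return chars_per_minute / chars_per_word


# Hypothetical rate, for illustration only.
rate = chars_to_words_per_minute(100)
print(f"{rate:.1f} words per minute")  # 54.1 words per minute
```

On this arithmetic, a characters-per-minute figure has to be roughly halved before it can be set beside an English words-per-minute figure.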
Another accelerator I've run across is the ability to continue typing the second (or later) syllable of a Chinese word before resolving the first. There are many fewer homophones of complete words than of syllables. In fact, there are IMEs that accept only the initials of multi-syllable words, e.g. they would parse "pyou" or even "py" as 朋友 péngyǒu. I've never run across such a shortcut in English, except maybe in handwritten shorthand.
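The initials shortcut can be sketched roughly as follows. This is a toy model, not the algorithm of any actual IME: the three-word lexicon, its ordering (standing in for frequency ranking), and the function names `matches` and `candidates` are all assumptions made for illustration. Typed input is accepted when it can be split into one non-empty prefix per syllable of a word.

```python
# Toy lexicon: (word, toneless pinyin syllables). The order stands in for a
# frequency ranking and is an assumption, not real usage data.
LEXICON = [
    ("朋友", ["peng", "you"]),
    ("便宜", ["pian", "yi"]),
    ("苹果", ["ping", "guo"]),
]


def matches(typed: str, syllables: list[str]) -> bool:
    """True if `typed` splits into one non-empty prefix per syllable."""
    if not syllables:
        return typed == ""
    head, rest = syllables[0], syllables[1:]
    # Try every non-empty prefix of the first syllable.
    for n in range(1, min(len(head), len(typed)) + 1):
        if typed[:n] == head[:n] and matches(typed[n:], rest):
            return True
    return False


def candidates(typed: str) -> list[str]:
    """Return lexicon words compatible with the abbreviated input."""
    return [word for word, syllables in LEXICON if matches(typed, syllables)]


print(candidates("py"))    # ['朋友', '便宜']
print(candidates("pyou"))  # ['朋友']
```

Even in this tiny lexicon "py" is ambiguous, which is why a real system would still need ranking or a selection step; the longer "pyou" already narrows the list to one word.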
For Musa, we're working on an IME that retains the phonetic (or looks it up if a shortcut was used) and a font that displays it as furigana (ruby). If such tools became widespread, one can imagine a slippery slope towards omitting the character unless needed, a one-and-a-half script, a "poltoragraphia". :)
Victor Mair said,
December 13, 2025 @ 10:11 am
@Peter Cyrus
Thank you very much for all three valuable paragraphs of your comment. Because I have to critique somebody's dissertation today, I must needs be brief for now.
I have been closely tracking Chinese IMEs for four decades. The fact that there are hundreds of them right away tells you something about the existential dilemma they all face, something that Lin Yutang should have known from the moment he set about designing his ill-fated MingKwai typewriter.
All the inventors of Chinese IMEs claim that they can type as fast or faster than English (alphabetical) typing, but that is only for short, thoroughly prepared texts. Whenever I observe them closely for extemporaneous texts, they are much slower than alphabetical typing. I could tell you some very funny stories about how they tried to hoodwink me into believing that their systems were fast and easy. Not.
Here's a summary of my findings about Chinese IMEs as of eight years ago:
"Easy versus exact" (10/14/17) — with extensive bibliography
That should disabuse you about a lot of misconceptions concerning the supposed suitability of the Chinese writing system for computers.
Try shape-based inputting vs. pinyin for lài 癞 / 癩 ("scabies") and pēntì 喷嚏 / 噴嚏 ("sneeze").
"The impact of phonetic inputting on Chinese languages" (12/9/19)
Includes an amazing demonstration of how convenient it is to communicate with toneless, properly parsed pinyin; with a long bibliography.
Please tell us more about Musa. There are dozens of possible meanings. Maybe write a brief post about it. I love the idea of "poltoragraphia".
Peter Cyrus said,
December 14, 2025 @ 7:06 am
Thank you for your curiosity :) I'll let someone else write that post (about Musa). Meanwhile, the site is http://www.musa.bet, or http://www.upa.bet for transcription.
Mike Ryan said,
December 14, 2025 @ 9:27 pm
J. Marshall Unger wrote: Meantime, in computer environments, most Japanese type in romaji:
On a laptop, yes, but on a smartphone I assure you the vast majority of Japanese use hiragana to type. On an iPad I am sure it is the same.
wgj said,
December 15, 2025 @ 12:41 pm
One aspect of Chinese I feel we haven't talked (enough) about in these discussions is how much the written language and the spoken language differ. My sense is that they differ significantly more than European languages, at least today – historically they may have all been very different, but in modern times the written and spoken languages have converged more for European languages than they have for Chinese.
This is relevant because any alphabetization of an ideographic writing system means loss of information. The amount of information lost depends – chiefly – on the number of homophones the language has. This means a high amount of information lost in Chinese and an even higher amount in Japanese. The exact same loss of information happens in the spoken language, but every natural language has evolved to deal with this kind of problem, using a variety of means. If the spoken language and the written language are very distinct, however, the written language will not feature all of the adaptations the spoken language has evolved, because the written language doesn't have the same problem of information loss. As a result, a straightforward alphabetization of the written language will face the information loss without the means to compensate for it. In time, the written language will have to evolve, adapting some of the same (or similar) compensation techniques from the spoken language, and inventing some additional techniques.
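The homophone point can be illustrated with a small sketch that groups characters by their romanized spelling, with and without tone. The seven characters below are a hand-picked toy sample, so the result shows the mechanism only and says nothing about how much information is lost in real text, where words are mostly longer than one syllable. The tone-number notation (`shi4`) and the name `group_by_spelling` are choices made here for convenience.

```python
from collections import defaultdict

# Toy sample: (character, pinyin with tone number). Hand-picked for
# illustration; not a representative slice of the lexicon.
READINGS = [
    ("是", "shi4"), ("事", "shi4"), ("市", "shi4"),
    ("十", "shi2"), ("石", "shi2"),
    ("马", "ma3"), ("妈", "ma1"),
]


def group_by_spelling(readings, keep_tone=True):
    """Group characters that share the same romanized spelling."""
    groups = defaultdict(list)
    for char, pinyin in readings:
        # Dropping the trailing tone digit merges more characters together.
        key = pinyin if keep_tone else pinyin.rstrip("012345")
        groups[key].append(char)
    return dict(groups)


toned = group_by_spelling(READINGS, keep_tone=True)
toneless = group_by_spelling(READINGS, keep_tone=False)
print(len(toned), "distinct spellings with tones")       # 4
print(len(toneless), "distinct spellings without tones")  # 2
```

Seven distinct characters collapse to four spellings with tones and to two without them, which is the kind of ambiguity that context has to resolve.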
I assume someone must have studied this process for Korean (both North and South, which presumably have taken similar but somewhat different paths of written language evolution) and Vietnamese (was there a North-South divide as well?).
wgj said,
December 15, 2025 @ 1:14 pm
Have you encountered the Chinese IME named RIME? It's an open source, multi-method IME system from Taiwan. Multi-method means it allows you to use (and switch between) different input systems, including Pinyin, Bopomofo, Cangjie, Wubi etc. The truly revolutionary, new input system it introduces, however, is the concept of "cording" – pressing several keys at the same time with both hands to input one character.
This is actually how modern Chinese stenography machines work. If you have ever seen a stenographer typing on her keyboard (it's almost always a woman), you'll be astonished how slowly she is typing – it doesn't seem like she's pressing nearly enough keys to record all those words. The trick is that every time she presses, she's pressing several keys at once with each hand, and that combo produces a unique sequence (like a cord on a musical instrument, thus the term cording).
RIME is the first IME that allows you to do cording on a computer – and to define your own cord-to-character mapping. It requires a good keyboard – cheap keyboards, especially wireless ones, often cannot deal with several keys being pressed down at once (or have an upper limit of three or four simultaneous keys, which is insufficient). Gaming keyboards usually can, though, because pressing several keys simultaneously is what gamers need to do all the time.
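The idea of mapping a set of simultaneously pressed keys to a single output can be sketched in a few lines. This is a generic illustration, not RIME's configuration format or the actual Combo-Pinyin layout: the table `CHORD_MAP`, its key pairings, and the function names are all hypothetical. The one property it is meant to show is that the order of the keys within one press does not matter.

```python
# Hypothetical table: each entry maps a set of keys pressed together to one
# output. These pairings are invented and do not reflect any real layout.
CHORD_MAP = {
    frozenset({"p", "n"}): "peng",
    frozenset({"y", "u"}): "you",
}


def decode_chord(pressed_keys):
    """Return the output for one simultaneous key press, or None if unknown."""
    return CHORD_MAP.get(frozenset(pressed_keys))


def decode_strokes(strokes):
    """Decode a sequence of key presses; unknown combinations become '?'."""
    return [decode_chord(stroke) or "?" for stroke in strokes]


# Two presses of two keys each yield two syllables.
print(decode_strokes([{"n", "p"}, {"u", "y"}]))  # ['peng', 'you']
```

Because each press is treated as an unordered set, the keyboard has to report all of the keys held down at once, which is where the hardware limit on simultaneous keys comes in.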
Here's RIME's documentation for Combo-Pinyin, a cording pinyin system that the author admits was inspired by stenography systems:
https://github.com/rime/home/wiki/ComboPinyin
Philip Taylor said,
December 15, 2025 @ 4:39 pm
Two comments, WGJ — (1) I believe that "chording" is also the technique used by courtroom stenographers who operate a device termed a stenotype [*]; and (2) is "cording" the conventional spelling for this technique in your part of the world ? I ask the latter question because, to me at least, "chording" reflects the musical technique whereby one plucks multiple strings, or strikes multiple keys (etc.) simultaneously, whereas "cording" would be something one does with cord (thick string).
[*] A courtroom stenotype is a specialized keyboard used by court reporters to create a verbatim record of spoken words, using phonetic shorthand and simultaneous key presses (chording) to achieve speeds over 225 words per minute, far exceeding standard typing, with modern machines linking to software for real-time translation into text. It has fewer keys than a normal keyboard, organized into starting consonants (left hand), vowels (thumbs), and ending consonants (right hand), plus a number bar and asterisk for deletion, allowing a single stroke to represent entire syllables or words.
Jarek Weckwerth said,
December 16, 2025 @ 4:34 am
@ Philip Taylor: Interestingly, of course, in languages with an alphabetic script, this preserves the advantages of an alphabetic script. The stenographer can write down stuff they have heard for the first time in their life with a much greater chance of success than in, erm, other systems.
Seong of Baekje said,
December 18, 2025 @ 10:41 am
If Japanese would just adopt the sensible spacing conventions of Korean writing (use spaces between words but attach particles to the preceding word), I think it would be well on its way to (largely) discarding kanji.