Yet another sinographic stumbling block for Chinese modernization
After coming face to face with the unavoidable debacles inherent in mechanical Chinese typewriters (not to mention many other pitfalls of the writing system), Language Log readers will not be surprised to learn that sinographs were not well suited for telegraphy:
The only slight blemish in Julesy's latest exposure of the truths about the Chinese script is that she titled it "Why Morse Code Didn’t Work for Chinese — and the Genius Fix". When it came to scientific, efficient telegraphy, not even a genius could fix the hyper complicated, archaic characters.
As a Boy Scout, I learned Morse Code, and I think I even got a merit badge in telegraphy. I'll never forget the meaning of | ... --- ... |.
When I first started going to the Chinese mainland in the 80s, I was intrigued by how the Chinese telegraph worked. Because I had to travel around by train quite a bit and even had to send a few telegrams myself, I had ample opportunity to observe how the operators worked.
I was aware that each "common" Chinese character had an arbitrarily assigned unique four-digit code, ranging from 0000 to 9999. It didn't take long for me to realize that it was humanly impossible to memorize such a system, so I observed carefully how the telegraph operators dealt with such an incredibly refractory method. Because they routinely had to send so many telegrams, they soon memorized the four-digit codes for the most frequent characters. Beyond the top fifty or so characters, however, they had to look up the codes in a well-thumbed, dog-eared manual, and you could hear them shouting to each other questions like "What's the code number for yíng 贏 ('win')?"
That reminds me of the time when I went to observe the working habits of the Chinese typists at the United Nations. They too had to look up less frequent characters in their handbooks. Their main fonts only had about two thousand lead characters, but they still had to consult with colleagues for the location of less frequent characters. For least frequent characters, they would have to get into supplementary type cases, and that often took a lot of hunting and pecking. For characters that were not in their main font and supplementary type cases, they would have to write in the missing characters by hand, and that happened fairly frequently.
The situation was a little bit like memorizing PLU (price look-up) codes / PLU numbers / PLUs / produce codes nowadays. The most capable cashiers store the PLU codes in their brains, but changing products mean they have to keep getting used to new codes. The big difference, however, is that it's easy for them to look up the PLUs electronically via alphabetical and visual indices. It's so simple that even customers without any training can do it easily.
Premier Zhou Enlai (1898-1976) used to travel a lot in the performance of his diplomatic duties. I have heard repeatedly from numerous sources that Chinese telegraphy was the most costly part of his travel expenses.
Incidentally, if you're curious about the meaning of the two characters on the cover page of Julesy's current video, they are juémì 绝密 ("top secret") — that's for people who cannot yet see the core defects of the Sinographic writing system.
Selected readings
- "Morse Code for China", by Al Williams, Hackaday (11/13/25)
- "The many myths about the Chinese typewriter" (9/7/25)
- "Flag codes: another type of Hong Kong resistance writing" (10/5/20)
[Thanks to Leslie Katz]
JMGN said,
November 13, 2025 @ 7:11 pm
https://en.wiktionary.org/wiki/…—…#Translingual
Victor Mair said,
November 13, 2025 @ 7:39 pm
@JMGN
Please explain what you want us to do with this.
JMGN said,
November 13, 2025 @ 8:27 pm
Us?
Replaceable with https://en.wikipedia.org/wiki/SOS
Victor Mair said,
November 13, 2025 @ 8:42 pm
Language Log readers.
Jonathan Smith said,
November 13, 2025 @ 9:59 pm
link was supposed to contain period period period hyphen hyphen hyphen period period period not ellipsis m-dash ellipsis… speaking of which why no Unicode Morse code, maybe CJK has a couple spots to spare?
Marc said,
November 14, 2025 @ 12:49 am
What about her comment at the end about the unsuitability of Pinyin for Morse? Do you agree the homonyms make it impractical?
Philip Taylor said,
November 14, 2025 @ 5:22 am
I've not watched the video but the immediately preceding comment prompts me to ask : "Is Morse Code used by the Vietnamese, and if so, is there a de facto (or even formal) extension to Morse that allows the tone of a word to be signalled ?".
wgj said,
November 14, 2025 @ 9:34 am
Instead of labelling sinographs as not well suited for telegraphy, I think it's much fairer to say it's telegraphy (in its original form) that was not well suited for sinographs.
After all, sinographs had already existed for millennia when telegraphy was invented. And we usually expect newly invented things to fit what already exists, and not the other way around.
Philip Taylor said,
November 14, 2025 @ 10:20 am
I think that that is a very fair observation, but as one who has both used Morse code in real life (an emergency situation on Mull) and who has studied Mandarin Chinese for three years, I find it difficult to think how one might be able to transmit Sinographs telegraphically using the technology that existed when Morse code was invented. If you can suggest how this might have been achieved, I would be very interested.
Victor Mair said,
November 14, 2025 @ 10:51 am
From an anonymous reader, who knows intimately whereof he speaks:
This Language Log post really brought back memories.
In the 1960s (and probably before and after), the Chinese Navy still used a lot of Morse, even for tactical comms. Morse chatter between operators was done through a few dozen standardized bi- and trigraphs, e.g., receiver's callsign "de" sender's callsign, "cq" (reaching out to anyone on the net), "msg" (message), "bt" (message follows), "pls chk" (please check), and even "zgzs" (最高指示) followed by numbers that referred to page and paragraph in Mao's Little Red Book, sent to demonstrate political correctness–or to scold the other party if they had fouled something up. Most of the limited vocabulary they used, however, was English-based, e.g., "xmtr trub hr" (transmitter trouble here). I once heard an annoyed operator tap out to the guy on the other end, in English, "use other foot." The only case I saw of pinyin words transmitted in Morse was one very angry operator spelling out the National Curse and then "Ni shi bi."
That aside, they maintained pretty good discipline. Other than chatter, all messages–short, long, real, drill–were sent in "cut numbers". Normally, Morse numbers 1 to 0 are each five units: one dot and four dashes for 1, two dots and three dashes for 2 and so on. But since the entire message is in numbers, there's no need to send all five dots/dashes for each number, so they reduced them to the minimum needed for a number to be distinguishable within the set. This sped things up considerably.
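As a rough illustration of the saving described above, here is a minimal Python sketch. The standard digit patterns follow the rule given in the comment (always five dots/dashes). The `CUT` table is an assumption: it is one commonly cited amateur-radio set of cut numbers, not necessarily the set those operators used.

```python
def standard_digit(d: int) -> str:
    """Standard Morse for one digit: always five dots/dashes."""
    if d == 0:
        return "-----"
    if d <= 5:
        return "." * d + "-" * (5 - d)       # 1 = .----, 2 = ..---, ...
    return "-" * (d - 5) + "." * (10 - d)    # 6 = -...., 7 = --..., ...


STANDARD = {str(d): standard_digit(d) for d in range(10)}

# ASSUMPTION: a commonly cited amateur-radio cut-number set, used here only
# for illustration. The set actually used on those nets is not given above.
CUT = {
    "1": ".-", "2": "..-", "3": "...-", "4": "....-", "5": ".",
    "6": "-....", "7": "-...", "8": "-..", "9": "-.", "0": "-",
}


def to_morse(groups: str, table: dict) -> str:
    """Encode the digits in `groups`, ignoring spaces between blocks."""
    return " ".join(table[ch] for ch in groups if ch.isdigit())


def element_count(groups: str, table: dict) -> int:
    """Total number of dots and dashes needed to send `groups`."""
    return sum(len(table[ch]) for ch in groups if ch.isdigit())


if __name__ == "__main__":
    sample = "2053 5307"
    print(to_morse(sample, STANDARD), element_count(sample, STANDARD))
    print(to_morse(sample, CUT), element_count(sample, CUT))
```

The shortening only works because the traffic is all digits: several of the cut forms coincide with Morse letters, so they would be ambiguous in mixed text.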
All operational messages were sent in three or four digit blocks, depending on the type of traffic. 4-digit messages presumably were enciphered STC (Standard Telegraphic Code) tetranomes, commercially available codebooks with 4-digit numbers assigned to each Chinese character, e.g., 2053 5307 or 5312 or 5324 0140 (我船 or 艇 or 舰在) followed by lat/long coordinates. There are rumors of people who could, if high on beer, hold limited verbal conversations in STC.
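The codebook step can be sketched as a plain two-way lookup table. This is only an illustration: the five entries are copied from the example in the paragraph above and have not been checked against a real codebook, and the names `CODEBOOK`, `encode` and `decode` are made up for the sketch. A real book would hold thousands of entries.

```python
# Entries copied from the example above (not verified against a codebook).
CODEBOOK = {
    "我": "2053",
    "船": "5307",
    "艇": "5312",
    "舰": "5324",
    "在": "0140",
}

# Reverse table for the receiving end: four-digit group -> character.
REVERSE = {code: char for char, code in CODEBOOK.items()}


def encode(text: str) -> list:
    """Turn each character into its four-digit group.

    A character missing from the table raises KeyError, which is the
    software equivalent of reaching for the dog-eared manual.
    """
    return [CODEBOOK[char] for char in text]


def decode(groups: list) -> str:
    """Turn four-digit groups back into characters."""
    return "".join(REVERSE[group] for group in groups)


if __name__ == "__main__":
    groups = encode("我舰在")
    print(" ".join(groups))
    print(decode(groups))
```

The lookup itself is trivial for a machine; the cost described in the post comes from a human having to do it from memory or from a printed book.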
The system was less cumbersome than one would think. They were not, after all, writing literary pieces. End-to-end encrypted fax and RT (radio telephone) eventually took over most of the communications. But there was a time when Morse reigned and the Chinese coped.
wgj said,
November 14, 2025 @ 10:59 am
https://en.wikipedia.org/wiki/Four-corner_method
From the article:
The four-corner method was invented in the 1920s by Wang Yunwu, the editor in chief at Commercial Press Ltd., China. Its original purpose was to aid telegraphers in looking up Chinese telegraph code numbers in use at that time from long lists of characters.
Later, we had wubi, or five-stroke – an equally intuitive but much more efficient encoding of Chinese characters. Even though wubi was invented in and for the personal computer era, it did not require advanced technology and could have plausibly been invented much earlier – as soon as the Western typewriter was known.
Victor Mair said,
November 14, 2025 @ 5:21 pm
During the last two decades, there have been hundreds of Language Log posts that touch upon the difficulties of recalling, reading, writing, looking up, and inputting sinographs.
Start by perusing these posts carefully and thoughtfully:
"Character Amnesia" (7/22/10)
"Dumpling ingredients and character amnesia" (10/18/14)
"Writing characters and writing letters" (11/7/18)
"Chinese character inputting" (10/17/15)
We have examined the two-corner system of Otto Julius Rosenberg (1888-1919), the four-corner system of Wang Yunwu (1888-1979), and scores of other methods for looking up and inputting sinographs by shape, stroke order, and so forth. Chinese and foreign scholars have tried every conceivable method for making it easy, fast, and efficient to handle the tens of thousands of characters in the archaic morphosyllabic / logographic Chinese writing system. It hasn't happened yet. In my opinion, it never will.
JMGN said,
November 14, 2025 @ 6:06 pm
"In my opinion, it never will."
I'm hopeful in factoring in AI technology in the new decennium…
Victor Mair said,
November 14, 2025 @ 6:49 pm
AI can change a leopard's genetically determined spots?
Philip Taylor said,
November 14, 2025 @ 6:16 pm
To answer my own question ( "Is Morse Code used by the Vietnamese, and if so, is there a de facto (or even formal) extension to Morse that allows the tone of a word to be signalled ?"), I now learn (after 25 years !) that my (Vietnamese) wife was once a regular user of Morse code and assesses herself as having been quite skilled at it. And she tells me that, as far as she can recall, Vietnamese was signalled without tone markers and without diacritics, the assumption being that any native speaker of Vietnamese would be able to infer which words were intended (where there could be ambiguity) simply from the context.
Victor Mair said,
November 14, 2025 @ 6:48 pm
@Philip Taylor
I've been hearing the same thing from Vietnamese speakers for the last quarter century, and apparently they do it without taking advantage of word spacing / parsing, which I have strongly advocated for Mandarin written in pinyin and which is becoming increasingly popular in fenci lianxie (word segmentation and linkage) without tones.
Joyce Melton said,
November 14, 2025 @ 9:59 pm
When I worked for the Army Security Agency in the 70s, we used Telex to communicate in Vietnamese. Circumflex was indicated by doubling the letter on a, e and o. Double d was d-bar. W was used for saucer and hook add-ons to vowels: a, o and u.
Tones were indicated by f-falling, s-high rising, r-falling rising, x-broken, and j-low falling. The tones came after the vowels or after the whole word, so s,r, and x did not cause confusion because they never come at the end of a word or after vowels.
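The conventions just listed are regular enough to decode mechanically. Below is a minimal Python sketch under two loudly stated assumptions: the tone letter is taken to come at the end of the word (the comment says it could also follow the vowels), and the tone mark is placed with a crude heuristic (on the last vowel carrying a circumflex, saucer or hook, otherwise on the first vowel), which is not a full statement of Vietnamese orthography. The function name `decode_word` is made up for the sketch.

```python
import unicodedata

# Tone letters mapped to combining marks, per the list above:
# f falling, s high rising, r falling-rising, x broken, j low falling.
TONES = {
    "f": "\u0300",  # grave
    "s": "\u0301",  # acute
    "r": "\u0309",  # hook above
    "x": "\u0303",  # tilde
    "j": "\u0323",  # dot below
}

# Letter conventions: dd = d-bar, doubled a/e/o = circumflex,
# w after a = saucer (breve), w after o/u = hook (horn).
LETTERS = [
    ("dd", "\u0111"), ("aa", "\u00e2"), ("ee", "\u00ea"), ("oo", "\u00f4"),
    ("aw", "\u0103"), ("ow", "\u01a1"), ("uw", "\u01b0"),
]
MODIFIED = "".join(dst for src, dst in LETTERS if src != "dd")
VOWELS = "aeiouy" + MODIFIED


def decode_word(word: str) -> str:
    """Decode one lower-case Telex-style word into Vietnamese spelling."""
    tone = ""
    # ASSUMPTION: the tone letter, if present, is the last letter.
    if len(word) > 1 and word[-1] in TONES:
        tone = TONES[word[-1]]
        word = word[:-1]
    for src, dst in LETTERS:
        word = word.replace(src, dst)
    if tone:
        marked = [i for i, ch in enumerate(word) if ch in MODIFIED]
        plain = [i for i, ch in enumerate(word) if ch in VOWELS]
        # Crude placement heuristic (an assumption, see the lead-in).
        target = marked[-1] if marked else (plain[0] if plain else None)
        if target is not None:
            word = word[: target + 1] + tone + word[target + 1:]
    # Compose base letters and combining marks into single characters.
    return unicodedata.normalize("NFC", word)


if __name__ == "__main__":
    print(" ".join(decode_word(w) for w in "tieengs vieetj".split()))
```

Because f and j are not Vietnamese letters, and s, r and x never end a Vietnamese word, stripping a final tone letter is unambiguous, which is the point made in the comment.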
wgj said,
November 14, 2025 @ 10:35 pm
You wrote yourself, "no one except professional, full-time typists", and the telegraphers are just that – professional, full-time typists.
And don't forget, the popularity of pinyin input presumes the widespread adoption of Standard Mandarin pronunciation – which did not exist at all back then when telegraphy was invented. There are still many older people in China today who don't use pinyin to write on their smartphones (mostly choosing handwriting instead) because they make too many mistakes in pinyin.
Furthermore, the whole discussion is about how to compress Chinese writing down to a semi-binary encoding efficiently in terms of information density and ultimately, cost. Anytime you prioritize cost efficiency, you cannot prioritize convenience also. To serve this objective, wubi and co. don't need to be as convenient as pinyin, they just need to be good enough to be humanly learnable – which they definitely are.
Finally, you remarked yourself how much pinyin input systems have improved over the years in their accuracy of predicting what the user wants to write. That improvement is thanks to computers (statistical analysis), the internet (collecting online corpora to create various subject dictionaries), big data (collecting and processing the writing behavior patterns of users), and lately, predictive AI (natural language processing). None of these existed when telegraphy came about.
Philip Taylor said,
November 15, 2025 @ 5:50 am
Thank you for your observations, Joyce. I served my (GPO) apprenticeship working on and with Telex (this is also where I learned Morse code), but had no use for (or knowledge of) Vietnamese at the time. Telex, to the best of my recollection, could achieve speeds of up to 66 wpm, whilst Morse code (using a conventional key, not a side-swiper) was typically only half as fast. So adding tone-markers and diacritics to Telex, using the conventions you reported, would have less overhead (in absolute terms) than using the same conventions with Morse, although the relative overheads would have been almost identical.
Regarding "The tones came after the vowels or after the whole word", my wife tells me that in handwriting one typically adopts the same procedure — one writes letter and diacritic before moving on to the next letter, but adds the tone marker only once the word is complete.
Victor Mair said,
November 15, 2025 @ 7:10 am
@wgj
Precisely!
Science will continue to march forward, and sinography will continue to catch up.
wgj said,
November 15, 2025 @ 7:46 am
"Catching up" is only required when crucial technology is invented by others – which has been the case for the last quarter millennium. Going forward, if and when civilization-advancing tech comes out of China *again*, it will likely be suitable for sinographs out of the box. I mean, can you imagine instead of paper, China invents some material that is good for writing cuneiforms but bad for writing sinographs? This is precisely what I've been trying to say with my first comment.
Jarek Weckwerth said,
November 15, 2025 @ 8:08 am
@wgj: Perhaps the question is not "was telegraphy well suited to sinography" but "was sinography well suited to representing language". The answer to that latter question, as advocated by Prof. Mair in all those linked posts is, "not very well".
Apart from all the other countless considerations, the fact that alphabetic writing enabled cost-effective and easily learnable indexing, telegraphy, text entry on computers etc. etc. is sufficient evidence of the fact that it suits human language better in a very real evolutionary/competitive sense. This greatly helped the development of modern technology which allowed the alphabetic cultures to conquer the world.
wgj said,
November 15, 2025 @ 1:18 pm
I am not convinced of that argument – although I'm also far from convinced that it's wrong.
Your statement that "alphabetic writing enabled" X, Y and Z is anything but a "fact" – it's a claim that requires evidence that hasn't been provided. My counterclaim is that those things aren't enabled by alphabetic writing, but instead by Western inventors, who happened to use alphabetic writing, *for* their alphabetic writing. And had those inventors been Chinese instead, those inventions would've been suited to sinographs, and some other person in the parallel universe would be arguing the opposite point – namely, that sinographs "enabled" those inventions.
Causality is always incredibly hard to prove. You shouldn't casually assume any.
Jim Unger said,
November 15, 2025 @ 2:21 pm
I agree with Jarek Weckwerth: the fact that alphabets enable cost-effective and easy-to-learn techniques such as touch typing, indexing, sorting and retrieval of data, and so on is sufficient for explaining why they have been so widely adopted, in one way or another, for so many different languages. This raises the question of why China has largely, though not entirely, been resistant to alphabetic writing. One obvious hypothesis is that, as long as skill in literacy is a scarce human resource, those (few) who possess it see the difficulty of acquiring it as a kind of insurance against their losing the privileges, wealth, and social mobility it affords them. They are therefore naturally loath to see anything replace characters as the default form of writing. This seems to have been a reason for Mao's backtracking on pinyin and retreating to character simplification (he needed literate bureaucrats, though later he turned on them in the Cultural Revolution). I wonder whether there are any Chinese authors who have ventured this hypothesis explicitly.
Jarek Weckwerth said,
November 16, 2025 @ 12:46 pm
@wgj: Western inventors, who happened to use alphabetic writing, *for* their alphabetic writing. And had those inventors been Chinese instead, those inventions would've been suited to sinographs — Well, the question could then be "Why were those things not invented by Chinese inventors first". Of course we can't test it in a parallel universe experiment. But we do know that evolution has a strong aspect of competition, and if you achieve the same results with fewer resources, or more effective results with the same resources, you win. So there is a point to be made along these lines. For example, if telegraphy is more efficient than other methods of communication, and alphabetic writing allows more efficient telegraphy, then alphabetic writing wins. Etc. etc.
Philip Anderson said,
November 18, 2025 @ 8:18 am
I don’t doubt that if the electric telegraph had been invented in China, they would have adopted a code that worked with Chinese words/characters. I am equally sure that the invention would soon have been adopted in the West, but with a much simpler (and quicker) code to work with an alphabet.
According to Wikipedia, something similar to that actually happened:
“In his earliest design for a code, Morse had planned to transmit only numerals, and to use a codebook to look up each word according to the number which had been sent. However, the code was soon expanded by Alfred Vail in 1840 to include letters and special characters, so it could be used more generally.”
https://en.wikipedia.org/wiki/Morse_code
Philip Taylor said,
November 18, 2025 @ 1:14 pm
"I don’t doubt that if the electric telegraph had been invented in China, they would have adopted a code that worked with Chinese words/characters" — well, maybe not "adopted" but rather "created" or "invented". However, let us (attempt to) put ourselves in their shoes, but with the added benefit of almost 200 additional years' experience in code creation. How might such a code work ? I ask this not because I think I know the answer, but for exactly the opposite reason — I cannot imagine how such a code might work. Perhaps others might be able to demonstrate feasible solutions to the problem — if they can, I will be extremely interested to read their replies.
Philip Taylor said,
November 19, 2025 @ 6:32 am
Well, I lay in bed and pondered this problem, and the following is my proposed solution — let us imagine that the Chinese telegraphist is requested to send 敵軍從左翼推進增派援軍。The telegraphist mentally transcribes this into BoPoMoFo, yielding ㄉㄧˊ ㄐㄩㄣ ㄘㄨㄥˊ ㄗㄨㄛˇ ㄧˋ ㄊㄨㄟ ㄐㄧㄣˋ ㄗㄥ ㄆㄞˋ ㄩㄢˊ ㄐㄩㄣ and sends this using the yet-to-be-proposed-and-agreed ITU Morse-like encoding for the 42 possible BoPoMoFo symbols. More than feasible, it seems to me. This would also seem to be even more efficient than standard Morse, since the BoPoMoFo consists of only 34 characters whilst the English equivalent consists of 48 ("Enemy advancing on the left flank — send reinforcements").
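A quick feasibility check on the proposal above, as a minimal Python sketch. It counts the symbols in the BoPoMoFo transcription quoted in the comment and works out how long a Morse-like dot/dash code would have to be to cover 42 symbols. The code assignment is purely hypothetical (shortest sequences first, in no particular frequency order); no such standard exists, as the comment itself says.

```python
from itertools import product

# Transcription copied from the comment above.
MESSAGE = "ㄉㄧˊ ㄐㄩㄣ ㄘㄨㄥˊ ㄗㄨㄛˇ ㄧˋ ㄊㄨㄟ ㄐㄧㄣˋ ㄗㄥ ㄆㄞˋ ㄩㄢˊ ㄐㄩㄣ"


def symbol_count(text: str) -> int:
    """Number of symbols to send, ignoring the spaces between syllables."""
    return sum(1 for ch in text if not ch.isspace())


def codes_up_to(max_len: int) -> int:
    """How many distinct dot/dash sequences exist with 1..max_len elements."""
    return sum(2 ** n for n in range(1, max_len + 1))


def shortest_sufficient_length(symbols: int) -> int:
    """Smallest maximum code length that gives every symbol its own code."""
    n = 1
    while codes_up_to(n) < symbols:
        n += 1
    return n


def morse_like_codes(count: int) -> list:
    """HYPOTHETICAL assignment: the first `count` sequences, shortest first."""
    codes = []
    length = 1
    while len(codes) < count:
        for combo in product(".-", repeat=length):
            codes.append("".join(combo))
            if len(codes) == count:
                break
        length += 1
    return codes


if __name__ == "__main__":
    print(symbol_count(MESSAGE))
    print(shortest_sufficient_length(42))
    print(morse_like_codes(42)[-1])
```

Sequences of up to four elements give only 30 codes, while five elements give 62, so 42 symbols fit in codes no longer than a standard Morse digit. A practical table would presumably give the shortest codes to the most frequent symbols, as Morse does for English.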
Jarek Weckwerth said,
November 21, 2025 @ 5:43 am
@ Philip Taylor: That just shows that you would need to replace sinographs with another system, doesn't it.
Philip Taylor said,
November 21, 2025 @ 6:02 am
I'm not sure that it does, Jarek — when I send (e.g.,) --. ...-- - --. --.-, I am not sending (qua sending) "G3TGQ", I am sending its representation in Morse code. In just the same way, if I send 敵軍從左翼推進增派援軍 as a series of dots, dashes and spaces, I am not sending 敵軍從左翼推進增派援軍, I am sending its encoding in BPFMM ("BoPoMoFo Morse"). Yes, the latter involves a 2-level encoding rather than a single level, but I am not replacing the sinographs, I am encoding them on the basis of their pronunciation. Would you not agree ?
Rodger C said,
November 21, 2025 @ 10:45 am
Um, how is a phonemic representation of a word the "encoding" of a grapheme that doesn't reflect the phonemes significantly?
Philip Taylor said,
November 22, 2025 @ 5:09 am
But the grapheme does "reflect the phonemes significantly", otherwise sinographs simply could not work (and they demonstrably do work — even my wife, whose first language is Vietnamese, usually has little difficulty in telling me how a given sinograph should be pronounced). It is only we barbarians in the West who regard sinographs as opaque, whence their frequent mis-classification by the non-cognoscenti as "ideographs". When we used to gather at the home of one of my wife's family members for karaoke sessions, I was absolutely staggered at the ease with which the older members of her family could sing with only the sinographs as a cue to the words that they should be singing, but for them it was the most easy and natural thing in the world.
Philip Taylor said,
November 22, 2025 @ 5:55 am
After all (sorry, this thought came to be only after I had submitted the previous comment), there is absolutely nothing in the set of ideographs "d o g" that suggests that when concatenated they should be pronounced /dɒɡ/ — this is simply a convention with which we grew up and which we therefore regard as "normal". The Chinese feel exactly the same about Hanzi.
Philip Taylor said,
November 22, 2025 @ 5:56 am
"… came to me …".
Jarek Weckwerth said,
November 22, 2025 @ 10:30 am
You mean your wife is able to guess the pronunciation of previously unfamiliar sinographs? I think that's quite a feat.
As I say in a parallel thread, the shapes of the letters D O G do not allow us to "predict" the pronunciation (well, O is in fact a bit iconic, but let's leave that aside). But the letters were intended to be, ideally, in a one-to-one relationship with the pronunciation such that a person knowing the system would be able to pronounce the word without prior familiarity. In other words, the system "encodes" pronunciation. And works the other way, too (i.e. a person knowing the pronunciation should in principle be capable of writing it down). With a small number of symbols. As far as I'm aware, sinographs don't allow either.
And your bopomofo example does exactly that. It encodes the pronunciation. A person who does not already know the sinograph for a particular "word" will not be able to write it down.
Philip Taylor said,
November 22, 2025 @ 1:15 pm
"You mean your wife is able to guess the pronunciation of previously unfamiliar sinographs ?" — No. Only those with which she is already familiar, although she can often offer an informed guess when it comes to sinographs new to her but similar to one or more with which she is already familiar. But with your final sentence ("A person who does not already know the sinograph for a particular "word" will not be able to write it down"), I completely agree.
Bob said,
December 8, 2025 @ 11:41 pm
Thought-provoking post. It’s eye-opening to read about how the structure of Chinese characters made telegraphy and typewriting so difficult. It makes me appreciate how much writing systems shape the technologies and tools we use.