Language Log

Chinese Telegraph Code (CTC)

May 24, 2015 @ 8:36 pm · Filed by Victor Mair under Changing times, Information technology, Language and computers, Writing systems


Michael Rank has an interesting article on Scribd entitled "Chinese telegram, 1978" (5/22/2015).

It's about a 1978 telegram that he bought on eBay. Here's a photograph:

A preliminary note before providing the transcription and translation of the text: Chinese telegrams are sent and received purely as four-digit codes. The sender has to convert a character text to numbers, and the recipient has to convert the numbers back to characters in order to be able to read the message. I will describe the process in greater detail below.

The characters in blue on the telegram were written by the person who decoded the numbers.

Note that they consistently wrote chǎng 厂 / 廠 as what looks like a "T".

Here's what the telegram says (it's a typical business message; personal messages tended to be much shorter):

Yíshuǐ zhì gé chǎng gōngxiāo kē
wǒ chǎng xiàn yǒu ruǎnpí báiyóu èr dūn
duō zhǔnbèi fāchē yùn guì chǎng jīn lái
diàn xiàng guì chǎng qiúyuán zhū dài gé shǒu-
tào gé guì chǎng shìfǒu cún yǒu huò
yǐbiàn wǒ chǎng bèi kuǎn qǐng sù diàngào

沂水制革厂供销科
我厂现有软皮白油弍吨
多准备发车运贵厂今来
电向贵厂求援猪带革手
套革贵厂是否存有货
以便我厂备款请速电告

Notes: 字 (third character from the right in the next-to-last line) is an error for 存 (字 is CTC 1316 while 存 is 1317). And ruǎnpí báiyóu 软皮白油 is a kind of softening oil for leather.
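Because 1316 and 1317 differ by one, the mistake looks like a one-off slip somewhere between encoding and decoding. A minimal Python sketch of how a reader might check for that kind of slip follows. The table `KNOWN_CODES` holds only the two pairs cited above, and the function name `neighbours` is invented for this illustration; it is not part of any real telegraph tooling.

```python
# Only the two code/character pairs mentioned in the note above.
# A real code book would hold thousands of entries.
KNOWN_CODES = {"1316": "字", "1317": "存"}

def neighbours(code, distance=1):
    """Return (code, character) pairs whose number lies within
    `distance` of `code`, excluding `code` itself."""
    n = int(code)
    result = []
    for delta in range(-distance, distance + 1):
        candidate_number = n + delta
        # Skip the code itself and anything outside 0000-9999.
        if delta == 0 or not 0 <= candidate_number <= 9999:
            continue
        candidate = f"{candidate_number:04d}"
        if candidate in KNOWN_CODES:
            result.append((candidate, KNOWN_CODES[candidate]))
    return result

# The decoder wrote 字 (1316); which adjacent codes might have been meant?
print(neighbours("1316"))  # [('1317', '存')]
```

With a full code book, the same check would simply list more candidates for the reader to weigh against the context of the message.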

Michal L. Wright translates the telegram as follows (with some very minor changes):

Yishui Leather Factory Sales and Marketing Division

Our factory currently has over two tonnes of leather softening white oil just about ready to be sent to your factory by truck.
Today we are sending (this) telegram to your factory seeking help (regarding) pig(skin) belt leather and glove leather.
Does your factory have the goods? In order that my factory may prepare funds, please send a telegram right away to inform us.

The Chinese telegraph code consists of 10,000 four-digit numbers from 0000 to 9999. Some telegraph operators could memorize hundreds and, in exceptional cases, a thousand or so of the numbers, but all the others had to be looked up, and that took a lot of time. Looking up a number at the receiving end is relatively easy, but at the sending end the operator has to analyze the shape of each character, because the characters are arranged according to radical and residual strokes, by the four-corner system (N.B.: this is a totally different four-digit identifier than that of the telegraph code; I learned it, but exceedingly few non-professionals ever did), or by some other shape-based system.
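To make the round trip concrete, here is a minimal Python sketch of the two conversions. It is only an illustration: `CODEBOOK` contains just the four character/code pairs cited in this post and its comments, and the names `encode` and `decode` are my own. Note that the dictionary hides exactly the hard part described above, since a human sender had to find the character by its shape before reading off the number.

```python
# Tiny illustrative code book: only the pairs cited in this post and its
# comments (字 1316, 存 1317, 弍 1708, 贰 6310).
CODEBOOK = {
    "字": "1316",
    "存": "1317",
    "弍": "1708",
    "贰": "6310",
}

# Reverse table for the receiving end: number -> character.
DECODEBOOK = {code: char for char, code in CODEBOOK.items()}

def encode(text):
    """Sending end: turn each character into its four-digit code."""
    return [CODEBOOK[ch] for ch in text]

def decode(codes):
    """Receiving end: turn four-digit codes back into characters."""
    return "".join(DECODEBOOK[code] for code in codes)

print(encode("存字"))             # ['1317', '1316']
print(decode(["1708", "1317"]))   # 弍存
```

A character missing from the table raises `KeyError`, which is roughly the software analogue of an operator reaching for the manual.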

I should mention that, in the century and more since the Chinese telegraphic code came into use (the first iterations were created by a Danish astronomer and a French customs officer in the early 1870s), there have been many different refinements and revisions, with a variety of arrangements and orderings.

When I first went to mainland China in 1981, every post office had a telegraphy section. I was utterly fascinated by how the operators worked, and I would spend hours observing them. I was astonished by how often they had to look up characters in their dog-eared manuals, and how frequently they had difficulty because they were unable to analyze the shape of the character correctly. Sometimes it would take several minutes or more to find a refractory character, and they often had to huddle with someone else and ask for help. Since many of the smaller post offices only had a single operator on duty at a time, this meant that they would be stymied until someone who could look up the number of the character joined them.

After several years of watching telegraph operators in China, I never ceased to marvel at how monumentally inefficient a system it was. My old colleagues in Chinese language and script reform told me several times that, when Premier Zhou Enlai travelled, his biggest expense was telegraphy. I don't know if that is true or if it was an exaggeration, but I heard it from men like Zhou Youguang and Yin Binyong who were reliable sources of information about such matters pertaining to Chinese writing.

About twenty-five years ago, I was approached by international banking officials and law enforcement agencies who were forced to rely on the telegraph code to identify the characters of Chinese personal names. Individuals scattered across the globe from different topolectal backgrounds would romanize their names in the wildest possible assortment of completely nonstandard, ad hoc ways, but those in banking and law enforcement who were charged with an exact identification of the individuals with whom they were dealing told me they needed to know which characters were used to write the names, regardless of the romanizations. They asked me if there were any other alternatives to this method of using the telegraph code, because it was obviously giving them a heap of trouble. I advised them to hire people who were proficient in pinyin and arrange the telegraph code according to the sounds of the characters in pinyin because that would be the fastest and easiest way for them to look up the numbers. I don't know if they followed my advice or not.
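The rearrangement I had in mind can be sketched in a few lines of Python. This is a hedged illustration only: the entries are the four code pairs cited elsewhere in this post and its comments, the tone-number spelling (`cun2`, `er4`, `zi4`) is just one convenient convention, and `build_pinyin_index` is a name made up for the example.

```python
from collections import defaultdict

# (pinyin with tone number, character, telegraph code)
ENTRIES = [
    ("zi4", "字", "1316"),
    ("cun2", "存", "1317"),
    ("er4", "弍", "1708"),
    ("er4", "贰", "6310"),
]

def build_pinyin_index(entries):
    """Group characters by syllable and sort the syllables alphabetically,
    so that a code can be found by sound rather than by shape."""
    index = defaultdict(list)
    for pinyin, char, code in entries:
        index[pinyin].append((char, code))
    return dict(sorted(index.items()))

PINYIN_INDEX = build_pinyin_index(ENTRIES)

for syllable, candidates in PINYIN_INDEX.items():
    print(syllable, candidates)
```

Homophones such as the two forms under `er4` still have to be told apart by eye, but the search is narrowed to a handful of candidates instead of a whole radical section.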

Wm. C. Hannas, in Asia's Orthographic Dilemma, p. 313 recounts:

I once knew a man who because of his unusual profession had learned enough Standard Telegraphic Code to speak simple Chinese sentences in numbers. If you asked him, "Nǐ hǎo ma?" (how are you?), he would reply, "2053 1771 1170" or "0008 1170," depending on how he felt.

Similarly, I knew a distinguished Buddhist scholar, Edward Conze, whose language specialty was Pali, who would regularly refer to Chinese characters by their Mathews' Chinese-English Dictionary number. Conze probably had mastered several hundred characters in this fashion, and he always had a twinkle in his eye when he rattled off the numbers. I also knew a couple of Sogdian Buddhist specialists who employed the same method for referring to Chinese characters. I suspect that, among serious Buddhist scholars who didn't know Chinese, this was a common method for referring to specific characters when Mathews' dictionary was pretty much the universal standard for Anglophone sinology. Now that pinyin is widespread and it is easy to use it to look up characters in various electronic devices, I don't think anyone is memorizing Mathews' numbers any longer.

"The future of Chinese language learning is now" (4/5/14)

Chinese characters aren't as scary as they used to be before pinyin and computers, but they're still "damn hard", in the words of a well-known sage of Chinese language and script studies.


30 Comments

David Moser said,

May 25, 2015 @ 12:31 am

Fascinating, Victor! I remember the use of the Chinese telegraph code very well, and as I recall I even had to make use of it in submitting certain documents for my wife and I to get married in Beijing in 1994.

And Hannas' anecdote is amazing as well. Reminds me of the old joke about prisoners becoming so familiar with each other's jokes that they eventually just "told" them by shouting out a number. Or Christians referring to scriptures: "Matthew 5:19! Luke 3:21!"

It seems to me that computers have saved the life of Chinese characters twice now: once in the 1990s, when Chinese word processing became possible, and again now in the 2000s, as new speech-to-text technology has made the writing-by-hand process increasingly unessential. I think massive digraphia and resorting to English or pinyin would have eventually resulted if not for these technological advancements.

We'll see what happens in the next decade…
Ralph Hickok said,

May 25, 2015 @ 7:34 am

Of course, the punchline to the joke about the prisoners is that one of them doesn't laugh at a "number" and, when he's asked why, he replies that he heard it before.
shubert said,

May 25, 2015 @ 7:37 am

Old Chinese typewriters required the same. In telephony, one operator is reported to have memorized the phone numbers of maybe a whole city!
shubert said,

May 25, 2015 @ 7:47 am

@David: using Pinyin is double-edged; it makes things both easier and harder.
Jim said,

May 25, 2015 @ 7:58 am

Ralph Hickok,

I've heard the prisoner joke with a different ending. The new prisoner learns the numbers, then tries them as he's seen veterans do, and gets no laughs. He asks and they say, "Some people can tell jokes, some people can't."
Eric P Smith said,

May 25, 2015 @ 8:19 am

Prisoners, jokes and numbers: an alternative punch line is that none of the prisoners laugh, and they explain to the teller of the joke that it wasn't funny because of the way he told it.
Lars said,

May 25, 2015 @ 10:20 am

The Danish version of the joke number joke:

A man is invited to join the local dinner society (this is Denmark, we have societies for everything), and told that they use numbers for jokes, as above. After listening for a while, hearing polite laughter when the established members call out well-known numbers, he takes a gamble, gets up and calls out "417!"

Upon which, the whole room breaks out in belly-bursting laughter, several members falling off their chairs. When after several minutes order is restored, he asks the person beside him what happened.

"Oh, we'd never heard that one before!"
Eric P Smith said,

May 25, 2015 @ 10:38 am

Sorry Jim, I can't fathom the timing of these pages: I swear your comment wasn't displayed on my computer when I posted mine.
K. Chang said,

May 25, 2015 @ 11:43 am

For trekkers, "Darmok and Jalad at Tanagra" would be the rough equivalent of the numbers joke.

https://en.wikipedia.org/wiki/Darmok

I remember when I first got my TwinBridge Chinese entry system for my Win 98, it had many different IMEs, and one of them was telegraph code. It was quite fascinating, but the "stroke deconstruction" method is still popular today in the many traditional Chinese based IMEs like Cangjie, and allegedly experts can achieve more than 100 cps with Cangjie. A similar method, Wubi (five penstrokes), also gained some prominence. And a recent development called Boshiamy looks interesting: it is also stroke / radical based, and simpler and faster than Cangjie, but the owners charge for it, so it hasn't become very popular.
David Marjanović said,

May 25, 2015 @ 1:51 pm

more than 100 cps

100 characters per what? Surely not second?
Victor Mair said,

May 25, 2015 @ 2:21 pm

@David Marjanović

Indeed! And surely not even per minute.

K. Chang says "allegedly", but I've been testing these claims for shape-based input systems over the years (since the early 80s), and they are all hyped, even with prepared texts. With previously unseen, unprepared, unpracticed texts, they don't get anywhere near 100 cpm.
K. Chang said,

May 25, 2015 @ 3:17 pm

Obviously it wouldn't be sustained speed, but it's conceivable if they enter commonly used phrases with some proper predictions or shortcuts burst speed can reach 100 characters per second, so say Wikipedia (haha)

That new IME boshiamy claimed burst speed of over 200 cost, but it has shortcuts (read predefined macros) too. My written Chinese is too poor to try the radical /stroke systems though.
K. Chang said,

May 25, 2015 @ 3:18 pm

Obviously it wouldn't be sustained speed, but it's conceivable if they enter commonly used phrases with some proper predictions or shortcuts burst speed can reach 100 characters per minute.? so say Wikipedia (haha) could be my reading comprehension

That new IME boshiamy claimed burst speed of over 200 cost, but it has shortcuts (read predefined macros) too. My written Chinese is too poor to try the radical /stroke systems though.
K. Chang said,

May 25, 2015 @ 3:19 pm

Could be my reading comprehension. It probably is per minute
Ray said,

May 25, 2015 @ 3:49 pm

this is so fascinating! it reminds me (in a strange reversed way) of those secret codes we used as kids, where each letter of the alphabet was assigned to a square on a tic-tac-toe grid, and the bounding walls of each letter, comprising a fragment of the grid, became the cypher for that letter — it's called "pigpen cypher"

http://en.wikipedia.org/wiki/Pigpen_cipher

and a fellow named c. c. elian took it a step further, and developed a more calligraphic way of writing this pigpen cypher, which looks very asian…

http://www.ccelian.com/ElianScriptFull.html
peterv said,

May 25, 2015 @ 4:00 pm

On human memory: I know former telephone switchboard operators able to match hundreds of telephone numbers to the voices of their owners, and to recognize people's numbers when hearing their voices in public places.
maidhc said,

May 25, 2015 @ 5:47 pm

In many US grocery stores, every kind of vegetable has a numerical code. Most veteran checkers have them all memorized. They just enter the code and throw the stuff on the scale.

But if you get a new checker and you're buying something a little unusual like parsnips or baby bok choy, they may just hold it up and ask the next checker what's the code for this? Of course they have a list, but it's quicker to ask.
maidhc said,

May 25, 2015 @ 5:56 pm

I remember in the 1980s you could buy a ROM that would map a numerical code to a graphical representation of Japanese characters. One of the big Japanese semiconductor companies used to make them. The idea was that you could copy the character into a 640×480 graphical display.

Unfortunately I remember neither the number of characters, the numerical code used, nor the size of the representation.
Sean Manning said,

May 26, 2015 @ 2:51 am

The equivalents for Assyriologists are Labatt numbers and Borger names/numbers and Unicode code points, although I do not think that anyone tries to memorize the latter. Fortunately there are hundreds, not tens of thousands, of cuneiform signs, and someone who just wanted to write Akkadian phonetically could get by with a hundred or so. But many books introduce their own system of numbering the set of signs which they use …
Mal said,

May 26, 2015 @ 3:27 am

The Danish version of the joke number joke:

A man is invited to join the local dinner society (this is Denmark, we have societies for everything), and told that they use numbers for jokes, as above. After listening for a while, hearing polite laughter when the established members call out well-known numbers, he takes a gamble, gets up and calls out "417!"[…]

I knew the alternative ending of:

'Shocked gasps fill the room, and the man next to him grabbed him by the arm and ushered him outside saying "How dare you speak such filth when there are ladies present!"'
K. Chang said,

May 26, 2015 @ 4:51 am

@Ray — you mentioning the pigpen cypher reminded me of the "tap code"

https://en.wikipedia.org/wiki/Tap_code

Which is a way to use tapping two numbers and designate a letter with a polybius square (5×5)
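A small Python sketch of that letter-to-pair lookup, for the curious. It assumes the usual tap-code layout, in which the 5×5 square drops K and sends it as C; the names `GRID` and `tap_pair` are invented for this example.

```python
# 5x5 Polybius square as used in the tap code: K is omitted and sent as C.
GRID = ["ABCDE", "FGHIJ", "LMNOP", "QRSTU", "VWXYZ"]

def tap_pair(letter):
    """Return (row, column), both counted from 1, for a single letter."""
    if len(letter) != 1 or not letter.isalpha():
        raise ValueError(f"expected one letter, got {letter!r}")
    letter = letter.upper()
    if letter == "K":
        letter = "C"
    for row, letters in enumerate(GRID, start=1):
        col = letters.find(letter)
        if col != -1:
            return (row, col + 1)
    raise ValueError(f"{letter!r} is not in the grid")

# Each pair is tapped as "row taps, pause, column taps".
print([tap_pair(ch) for ch in "CTC"])  # [(1, 3), (4, 4), (1, 3)]
```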
ajay said,

May 26, 2015 @ 5:57 am

I know former telephone switchboard operators able to match hundreds of telephone numbers to the voices of their owners, and to recognize people's numbers when hearing their voices in public places.

Even more remarkably, radio intercept operators with the British Y Service were able to identify many (I don't know about hundreds) of enemy radio operators by their Morse style, or "fist"…
ajay said,

May 26, 2015 @ 5:59 am

On the telegraph code, can someone explain why the very non-intuitive pure number code was used, rather than simply the syllable combined with a number indicating tone? So just send "shu4" or "hao2". Just as quick in terms of Morse characters and a lot easier to learn.
K. Chang said,

May 26, 2015 @ 9:55 am

@ajay — because telegraph code predated pinyin by about eighty years. First telegraph code was invented for Chinese in 1871, and pinyin was invented in the 1950's.

https://en.wikipedia.org/wiki/Chinese_telegraph_code
https://en.wikipedia.org/wiki/Pinyin
Mike Wright said,

May 26, 2015 @ 11:56 am

Back in 1967-72, the most complete Chinese-English dictionary available to me was the Modern Chinese-English Technical and General Dictionary, from the U.S. National Science Foundation, published by McGraw-Hill in 1963.

There were three volumes. The main volume that I used was arranged by pinyin and used CSTC for the characters. On Google I found that it "Includes 212,000 entries, of which 80% are scientific and technical". It was extremely dense — maybe 9- or 10-point type– but still about 3 inches thick. Nothing else came close to it for finding obscure terms.

Even without memorizing a lot of CSTC codes, one could get pretty good at recognizing the likely radical range of a code. Even 43 years later, I still remember a few: 0001, 0008, 2052, 2053, and 2508. I knew dozens back then.

My CSTC book, published in 1983, includes a pinyin-based section for encoding and a number-based section for decoding.

Regarding ajay's question about why pinyin wasn't used for telegrams, I can think of a few reasons:

1. Knowing the standard pronunciation of an intended character wouldn't necessarily give you the character, even in context. There's lots of room for misunderstanding. Organizing the text into "words" might help, but that doesn't seem to be something that is intuitive even to native Mandarin speakers. Scan through DeFrancis' ABC Dictionary and you'll find lots of homophonic compounds that could provide confusion. There are lots of monosyllabic words in spoken Mandarin, and even more in written Chinese.

2. Even highly educated Chinese didn't necessarily have "standard" Mandarin pronunciation. In the mid-70s, I worked with a Chinese gentleman who had the Chinese equivalent of a PhD. When I would ask for help with a difficult passage, he would read it aloud in a totally incomprehensible Zhejiang accent. I doubt that he could have reliably transcribed character text into pinyin.

3. I suspect that even by the '70s, there were lots of Chinese who didn't speak Mandarin in their daily lives, but could still read a bit. This was certainly the case in Taiwan (and still by the time of my last visit in 2000), so I bet that it was true in the PRC as well. And, of course, Cantonese speakers could have been literate in some variety of written Cantonese. (This was true of Min speakers at one time, though there apparently wasn't as unified a system as exists for Cantonese.)

4. CSTC could be used to transmit messages by voice. When I was living in Taiwan, I was listening to short wave radio at a friend's home, and we ran across such messages. We were able to transcribe and decode a little bit. One advantage to CSTC for voice messages is that 10 digits are much less likely to be misunderstood in a noisy environment than all the possible syllables of Mandarin. In voice CSTC, the equivalent of our "alpha, bravo, charlie, delta, echo, …" is "dong yao liang san si wu liu guai ba gou" for 0 through 9, providing fewer chances for misunderstanding than "ling yi er san si wu liu qi ba jiu".
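As a rough illustration of point 4, this Python sketch spells out a four-digit code using the radio digit names listed there. The word list is taken directly from the sentence above; the function name `speak_code` and the sample code 1317 (存, from the post) are choices made for the example.

```python
# Radio digit names for 0 through 9, as listed in point 4.
VOICE_DIGITS = ["dong", "yao", "liang", "san", "si",
                "wu", "liu", "guai", "ba", "gou"]

def speak_code(code):
    """Spell a four-digit code digit by digit using the radio names."""
    if len(code) != 4 or any(d not in "0123456789" for d in code):
        raise ValueError(f"expected four digits, got {code!r}")
    return " ".join(VOICE_DIGITS[int(d)] for d in code)

print(speak_code("1317"))  # yao san yao guai
```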

Even though Michael's telegram does contain an error, it was obvious, and Madeline Chu was able to correct it.

(Sorry this got a little long. Sometimes I just don't know when to quit.)
Dave Cragin said,

May 26, 2015 @ 9:35 pm

Lars – Thanks for the Danish. Jokes often don't translate well, but that one did so perfectly.

For another perspective on Speaking Chinese in Numbers, here's a wonderfully entertaining modern look at this:
https://www.youtube.com/watch?v=DFTW_abinnM
(this may have been posted previously at this blog?)

I find the body language & vocal intonation of the speakers at Off the Great Wall coupled with the content they offer makes them very engaging.
ajay said,

May 27, 2015 @ 8:43 am

Mike Wright, K. Chang: thanks very much.
brandon seah said,

May 28, 2015 @ 3:58 pm

Future Edward Conzes might rattle off Unicode numbers instead…
Jean-Michel said,

May 30, 2015 @ 2:52 pm

I notice the decoder used 弍 as the formal/banking form of 二 "two," instead of the standard mainland form 贰. At first I thought the decoder was just saving time by writing an informal simplification in place of 贰 (which was itself simplified from 貳), but I looked it up and it turns out 1708 actually corresponds to 弍 specifically–贰 is 6310. 贰 was further simplified to 弍 in the second-round simplifications announced in 1977 (but never consistently implemented), which might explain why it was used here instead of 6310 "贰." But 1708 also corresponds to 弍 in the Taiwanese telegraphic code, meaning that this is presumably a pre-1949 assignment and they used one of the 10,000 possible code points for a variant that apparently had no official status until 1977.
Andrew said,

June 6, 2015 @ 1:15 am

I was recently surprised to find out that using a numerical code mapped to a dictionary was actually Samuel Morse's original idea for how to send English over the telegraph, though it was obviously replaced with the more familiar code which translates directly to letters before the telegraph was more than a curiosity.

http://www.morrisparks.net/speedwell/tel/tel.html

There were also plenty of telegraph codes which would replace long commercial stock phrases with a shorter uncommon word to shorten the message and obfuscate it to some degree.
