In a comment to "Pinyin literature contest" (6/30/16), DG asked an excellent, reasonable question:

I am not a Chinese speaker, so I am wondering if the requirement that it's not originally written in Chinese characters is a sort of honor code, or is there some way to tell from the pinyin submission?

A composition in Pinyin is apt to be much more colloquial / vernacular / free / unconstrained than something written originally in characters.  The style will likely be quite different, in that whatever one writes in Pinyin must be intelligible to the ear when spoken, whereas writing in characters allows for (and promotes) sliding back into a bànwénbànbái 半文半白 (semi-literary semi-vernacular) mode, which is not always fully intelligible when read aloud.

Writing in Pinyin will enable authors to use all those expressions that are sayable in Mandarin but not writable in characters.  We have often pointed out on Language Log the absence of characters for Sinitic morphemes, even in Pekingese.  See, for example, "Kiss kiss / BER: Chinese photoshop victim" (7/22/14) and the essential references to other relevant posts therein.

See also "Pekingese put-downs" (11/7/13).

To encapsulate the difference between writing Sinitic languages with an alphabet and with characters, I quote this passage from "Trainspotting-like Voices in Chinese" (3/12/11):

Lǎo Shě 老舍 (real name Shū Qìngchūn 舒慶春, a Manchu of the Sumuru clan; February 3, 1899–August 24, 1966), who was renowned for his novels that contain a conspicuous amount of Pekingese terms, used to complain that it was impossible for him to write many of his favorite Pekingese expressions in Chinese characters.  The phonetic flexibility of alphabetic scripts, the ability to write down any sounds that are expressed in the speech of a particular language, is conspicuously absent in writing with characters, which is limited to a fixed (and generally rather limited) number of syllables.  We have recently encountered these obstacles in a couple of posts on Language Log:  "Surprising Transformations of a Beijing Street Name" and "Russian Loans in Northeast and Northwest Mandarin:  The Power of Script to Influence Pronunciation."

To sum up, we may cite the distinction between "sayable" and "writable" made by the distinguished linguist, Y. R. Chao (1892-1982), who practiced what he preached (he published several books of "sayable Chinese").  See "Sayable but not writable" (9/12/13).

The sponsors of the Li-ching Chang Memorial Pinyin Literature Contest (LCCMPLC) encourage authors to write whatever they can say in spoken Mandarin.  Perhaps, in future, we will have similar contests for other Sinitic topolects (Cantonese, Shanghainese, Taiwanese, etc.).


  1. Victor Mair said,

    July 1, 2016 @ 4:33 pm

    And don't forget "Duang" (3/1/15)!

  2. Michael Watts said,

    July 1, 2016 @ 5:02 pm

    A reasonable point.

    Slightly tangential to the topic of things that are sayable but not writable, I recently encountered someone making the claim that all Chinese 方言 use the same grammar ("一模一样的", even!), just different pronunciation. I'd love to have some examples of 上海话 for which the characters are obvious and the transcription is clearly ungrammatical Mandarin. Sadly, I myself know not the first thing about any Wu language.

  3. ohwilleke said,

    July 1, 2016 @ 5:15 pm

    Is it generally believed that the topolects derive from a common proto-language, or are there isolates or language families that are not genetically related to, e.g. Cantonese, among them?

  4. AntC said,

    July 2, 2016 @ 7:51 pm

    Thank you Victor. Something that repeatedly puzzles me in your many posts on the limitations of the character system:
    … writing with characters, which is limited to a fixed (and generally rather limited) number of syllables.

    The figure you usually give for a writer/reader to be competent is of the order of 10,000 characters — I think that applies for 'literary Sinitic'. Surely there can't be that many distinct syllables(?)

    1) of the 10,000 how many are homophones?
    2) how many syllables (of everyday speech) are there whose pronunciation does not correspond to a character?

    I understand from reading your posts that there might be reasons against using a particular character, even though its pronunciation matches the syllable. (A lot to do with the meaning of that character, sometimes to do with taboos around that character — for example that it's used to write the name of an Emperor.)

    I understand that with non-MSM dialects or topolects, the correspondence might not be close. But (for example) the sounds of the Latin alphabet have been adapted to many unrelated languages, perhaps with diacritics.

    It's not like the existence of topolects and the need to communicate in them by writing is a modern phenomenon.

  5. Michael Watts said,

    July 3, 2016 @ 1:02 am

    AntC, lists, by my manual count, 421 toneless syllables. Some potential syllables are missing by coincidence; for example, if pou is a legal syllable there's no reason bou shouldn't be. If you multiply by four to account for tones, that would yield 1684 potential syllables, of which some more are, for whatever reason, missing. (For example, the syllable nü exists, according to the ABC dictionary, only in tones 3 and 4. Then again, Wikipedia records a syllable fiao which is unknown to the ABC dictionary, so perhaps there are some esoteric nǖ and nǘ characters out there.)

    So, the answer to the question "of 10,000 Chinese characters, how many are homophones?" is necessarily "almost all of them". In the best case, you could have somewhere under 1680 characters with a unique pronunciation and over 8300 that are all pronounced identically to each other; in fact the distribution is much more even than that.

    I'm not so sure that the need to communicate in topolects by writing existed in the past. If you were sending a message to someone literate, they would pretty much by definition be trained in the written standard.
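The syllable arithmetic in comment 5 can be checked with a short sketch. It uses only the figures quoted there (421 toneless syllables, four tones, 10,000 characters) and assumes, as a simplification, that every character has exactly one reading. The function names are hypothetical, chosen for illustration.

```python
# Figures quoted in comment 5; the one-reading-per-character model is an assumption.
TONELESS_SYLLABLES = 421
TONES = 4
CHARACTERS = 10_000


def max_tonal_syllables(toneless: int, tones: int) -> int:
    """Upper bound on distinct syllables if every toneless syllable took every tone."""
    return toneless * tones


def min_homophonous_characters(characters: int, syllables: int) -> int:
    """Fewest characters that must share their reading with another character.

    Best case for uniqueness: all syllables but one carry a single character
    each, and every remaining character is piled onto the last syllable.
    """
    if characters <= syllables:
        return 0  # enough syllables for every character to be unique
    return characters - (syllables - 1)


if __name__ == "__main__":
    syllables = max_tonal_syllables(TONELESS_SYLLABLES, TONES)
    shared = min_homophonous_characters(CHARACTERS, syllables)
    print("potential tonal syllables:", syllables)
    print("characters that must be homophones:", shared)
    print("characters that could be unique:", CHARACTERS - shared)
    print("average characters per syllable:", round(CHARACTERS / syllables, 2))
```

Because some of the 1684 potential syllables do not actually occur, the real count of unavoidable homophones would be somewhat higher than this lower bound, which is consistent with the comment's "over 8300".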

  6. Michael Watts said,

    July 3, 2016 @ 1:06 am

    Pinyin input says that the character 覅 is pronounced fiao. I've got to admit I'm curious now.

  7. AntC said,

    July 4, 2016 @ 7:31 pm

    Thanks @Michael.

    Another partial explanation [from Victor ao] is that some characters are polysyllabic. (There's a post just arrived on LL.)

    Nevertheless, we seem an order of magnitude out of kilter.

  8. Eidolon said,

    July 5, 2016 @ 5:42 pm

    "Is it generally believed that the topolects derive from a common proto-language, or are there isolates or language families that are not genetically related to, e.g. Cantonese, among them?"

    The former.

