Sayable but not writable


The distinguished Chinese linguist, Y. R. Chao, developed the concept of "sayable Chinese" and wrote a series of books illustrating what he meant by it. Basically, what Chao intended by "sayable Chinese" were texts that could be understood when read aloud.  This may sound like a somewhat ludicrous proposition for most languages, where what is written on the page can easily be understood when read aloud slowly and clearly.  For Chinese, this is not the case, especially when texts are riddled with Classical terms, sentences, and whole passages that are divorced from spoken language.  But even parts of supposedly pure Mandarin texts may not be intelligible to someone who hears them read aloud, since the semantic carrying capacity of the morphosyllabic characters is greater than that of their sounds alone.  That is why, when people tell others their names or when someone is giving a lecture or reading a text, auditors will frequently ask the speaker to write down the intended characters for terms that cannot be understood merely by hearing.  Thus, there are many instances where things are writable in Chinese but not sayable.

In this post, I'd like to talk about the opposite situation, where things are sayable but not writable.  The subject arose in a striking fashion two days ago in my course on "Language, Script, and Society in China".  The class is composed of about 40 students, around half of whom are from Mainland China.  Of the latter, nearly all are graduate students, and several of them, who come from some of China's best universities, are quite advanced in Chinese language studies.  So these are not slouches or country bumpkins who understandably would have low levels of literacy.  Rather, these PRC graduate students represent the elite among China's humanities programs, which is why they were selected for study abroad.

Here's how it happened.

I put the following colloquial expression up on the board:

lētè / lēte / lēde 肋脦[U+8126] ("slovenly; slipshod; untidy")

This is a sayable expression, but it is not very writable.  None of the students in the class could recognize both graphs and give their correct pronunciation in this expression.  But when I read the term aloud, a couple of them knew what it meant.  The second character is particularly obscure, so much so that it is missing from most computer fonts and has to be called up specially by its Unicode number.
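A minimal Python sketch of what "calling up by the Unicode number" amounts to, assuming only the standard-library `unicodedata` module; the helper name `describe` is hypothetical, introduced just for this illustration. It turns the code point U+8126 given in the post back into the character, its official Unicode name, its UTF-8 bytes, and an HTML numeric reference. Whether the glyph actually shows up on screen still depends on the fonts installed.

```python
import unicodedata

# Code point of the rare second character of 肋脦, as given in the post
CODE_POINT = 0x8126

def describe(code_point: int) -> str:
    """Return a 'U+XXXX <char> <Unicode name>' summary for one code point."""
    ch = chr(code_point)
    # For CJK unified ideographs, Python derives the name from the code point itself
    name = unicodedata.name(ch, "<no name>")
    return f"U+{code_point:04X} {ch} {name}"

second = chr(CODE_POINT)
term = "肋" + second  # the full colloquial term lētè / lēte / lēde

print(describe(CODE_POINT))    # U+8126 脦 CJK UNIFIED IDEOGRAPH-8126
print(term)
print(second.encode("utf-8"))  # b'\xe8\x84\xa6'
print(f"&#x{CODE_POINT:X};")   # HTML numeric character reference: &#x8126;
```

Writing `"\u8126"` inside a Python string literal produces the same character, which is handy when an input method does not offer it.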

As I was talking about the meaning and orthography of lētè / lēte / lēde 肋脦[U+8126] ("slovenly; slipshod; untidy"), I told an anecdote about how some merchants from Sichuan, not knowing that I could speak Mandarin, once pointed at me and said that I was "lāta" 邋遢 ("slovenly; dirty; dowdy; sloppy; slobby; shaggy; unkempt; ill-groomed; sluttery; slipshod; untidy").  There are many other ways to write the second character, e.g., with its phonetic written as 沓.

At first I did not write the characters for lāta on the board but simply spoke the word aloud.  Most of the students from Mainland China knew what it meant, and even some of the more advanced American students knew the word by sound.  But not one single person in the classroom knew how to write the characters for this word.  Arguably the most learned student in the class (in terms of Chinese language and literature studies), a visiting graduate student from Peking University who is finishing up her Ph.D. dissertation in the Chinese Department there, volunteered to go to the board and try.  It was obvious as soon as she started to write the characters that she didn't have a clue: she even got the radicals wrong, using zú 足 ("foot") instead of chuò 辵 / 辶, and the rest of the first character was such a mess that she erased what she had written and ran red-faced to her seat at the back of the room.

The judges of China's "spelling bees" (i.e., dictation contests) should ask the contestants to write lāta 邋遢!

Hànyǔ dà cídiǎn 漢語大詞典 (Unabridged Dictionary of Sinitic), 6.1167b says that lētè / lēte / lēde 肋脦[U+8126] ("slovenly; slipshod; untidy") is "like lāta 邋遢", but I think that they are probably just different ways of writing the same disyllabic morpheme, perhaps reflecting topolectal variants (see below for more on that score).

It has been suggested that lāta comes from English "litter", but the constituent characters are already in old Chinese rhyme books like Jíyùn 集韻 and Guǎngyùn 廣韻, and the disyllabic term itself occurs in literary works from at least the Yuan (Mongol) period.

Lāta and lētè / lēte / lēde 肋脦 are examples of what are known in Chinese as liánmián cí 聯綿詞 / 連綿詞 ("rhyming / alliterative binoms").  A reconstruction of the old (Middle Sinitic) sounds of lāta would be something like *laptap.

Here's a list of the different types of liánmián cí 聯綿詞 / 連綿詞 ("rhyming / alliterative binoms").  And here's a thousand-plus-page dictionary with some 6,000 of them.

Nearly fifteen years ago, I made a case that lāta is related to lèsè / lājī 垃圾 ("garbage; refuse; waste; rubbish; trash"):

"On 'Transformationists' (bianjia) and 'Jumbled Transformations' (laza bian):  Two New Sources for the Study of 'Transformation Texts' (bianwen):  With an Appendix on the Phonotactics of the Sinographic Script and the Reconstruction of Old Sinitic."  In Alfredo Cadonna, ed., India, Tibet, China:  Genesis and Aspects of Traditional Narrative.  Orientalia Venetiana, VII.  Firenze:  Leo S. Olschki, 1999.  Pp. 3-70.

That article is available in this pdf, which is a rather large file, so download only if you're really interested in this subject.

I was also surprised that only a couple of the students from China had ever heard of, much less were able to write, the words gūlu 軲轆 ("wheel") and gūlù 轂轆 ("reel").  These are old colloquial terms that seem to have survived mostly in the oral realm and are related to some form of the Indo-European word for "cycle; wheel".  See Robert S. Bauer, "Sino-Tibetan *kolo 'Wheel'," Sino-Platonic Papers, 47 (Aug. 1994), 1-11. 

I always tell the students in my classes that the sounds of the words in Chinese languages are much more important than the characters that might be used to write them — even in Classical Chinese — where there are often variant written forms for the same term.  I demonstrated that for my students in the case of lāta ("slovenly; dirty; dowdy; sloppy; slobby; shaggy; unkempt; ill-groomed; sluttery; slipshod; untidy") by putting on the board more than two dozen different topolectal variants of this colloquial term. I read aloud the pronunciations of each of the variants and pointed out that the second and subsequent characters of these variants were mostly arbitrary transcriptions of the sounds of the local variants and that the surface signification of the characters used to write these syllables was essentially irrelevant.  The fact that many terms in Chinese — even in ancient texts — have a variety of different written forms, e.g., wěiyǐ 委迤 / wēiyí 委蛇 / wēiyí 逶迤 / etc. ("winding; meandering; twisting") confirms the primacy of sound over symbol.

Maybe I'll write a future post entitled "Writable but not sayable".  Suggestions welcome.

[Thanks to Richard Cook for helping me with the rare character 脦[U+8126]]



  1. AntC said,

    September 12, 2013 @ 9:21 pm

    Thank you again, Victor for a fascinating post.
    It's hard to imagine a language community even more orthographically challenged than English.
    more than two dozen different topolectal variants of this colloquial term, … Is this worse than the pre-sound shift variant/sounded spellings in Shakespearian English (before the printing press 'froze' spellings)? [I'm thinking of the recently posted video of David Crystal & son Ben -- apparently modern English speakers are missing out on big gobbets of the meaning.]
    Although homonymy and polysemy are ubiquitous, I kinda expect languages evolve to a balance point so that communication succeeds most of the time. Is disambiguating by writing terms down a more common tactic than providing a semantically equivalent term?: "I mean funny-peculiar, not funny-ha ha."

  2. David Moser said,

    September 13, 2013 @ 12:19 am

    Great post, Victor. I've found endless examples of this kind of very colloquial, everyday Beijing speech that many, if not most, Chinese don't know how to write. One example is the suffix "dehuang" 得慌, meaning "very, extremely." The other day someone commented that their backpack was "gedehuang" 硌得慌, "chafing, uncomfortable", and none of the Chinese I was with could write 硌 or 慌. Also common words like 蹦儿 bengr, as in 蹦儿棒 "bengr bang", "amazing, awesome", often involve confusion as to the correct graph for "beng". It's often the most ordinary lexical items that are the hardest to remember. 颠儿了!David

  3. Noel Hunt said,

    September 13, 2013 @ 1:57 am

    I looked this compound (肋脦) up with the Japanese Google engine and came across the entry in 白水社 中国語辞典. This dictionary gives lē・de as the reading, with lē・te as an alternative reading. It also suggests this is 方言, and further notes that `formerly written as 褦襶 (nàidài)'. Is this true?

  4. Victor Mair said,

    September 13, 2013 @ 7:12 am

    The lists of terms for the Zhōngguó hànzì tīngxiě dàhuì 中国汉字听写大会 ("China conference on Chinese character dictation") are available here:

    Lāta 邋遢 ("slovenly; sloppy; unkempt"; etc.), one of the terms discussed at great length in this post, is #54 in the 6th episode (September 6, 2013). It was written correctly by the contestant.

    On the other hand, the contestant who drew xiēxíng wénzì 楔形文字 ("cuneiform script", #11 in the same episode) failed to write it correctly.

    Làiháma 癞蛤蟆 ("toad"), the word that caused so much consternation and created so many headlines when neither contestant nor members of the audience could write it correctly, is #17 in the first episode (August 2, 2013).

  5. Nora Castle said,

    September 13, 2013 @ 7:53 am

    Hi Professor Mair!
    In your post, you mentioned that sounds are more important than characters even in Classical Chinese. Classical Chinese, however, is generally read out loud using whatever topolect a person is most comfortable with. Classical Chinese is very different from more modern variants of various forms of Chinese language, and even when read it is not often immediately understood. Could a case then be made that characters in Classical Chinese historically were less important than sounds, but that now the characters are more important because Classical Chinese does not exist as a spoken language? Or do you feel that the fact that a specific pronunciation in whatever topolect used to read Classical Chinese may still map onto a large number of characters justifies saying that sounds are still more important than characters for Classical?

    I suppose basically what I'm asking is that if a language is no longer existent in an intelligible spoken form, can you really say that sounds are still more important, even if there exists a modern pronunciation equivalent (which may not be intelligible but is sayable)?

  6. Tom Gewecke said,

    September 13, 2013 @ 9:09 am

    I think U+8126 will be found in some font on all computers these days. The problem is that it may not appear right away in the list of hanzi that come up when using a standard pinyin input method. On my Mac it only appeared when I typed "te" instead of "de" and selected a radical/stroke listing option.

  7. Brendan said,

    September 13, 2013 @ 11:16 am

    @David Moser – Interesting! I've mostly seen the intensifier you wrote as "蹦儿 bengr" written as 倍儿 bèir, "doubly X," of which my all-time favorite usage is in the line "倍儿有面子" ("Talk about prestige!") in this scene from the end of the movie 大腕儿/Big Shot's Funeral. Striking that nobody could remember how to write 得慌, considering how common it is even in set phrases.

  8. J. W. Brewer said,

    September 13, 2013 @ 11:23 am

    A modest parallel (sayable but not writable) in English might be slang words that originally arise and circulate (maybe this doesn't happen anymore now that everyone communicates via texting etc) orally among teenagers or some other slang-generating subpopulation and end up being transcribed with various spellings without a consensus quickly emerging as to how to represent the word in writing (whether by those who themselves use the word orally or by fieldworkers trying to document How Kids Today are talking). An example might be the interjection dating back to at least the late '70's variously spelled even unto this day as either "sike" or "psych." (Meaning approximately – whatever I just said was not literally true; I was just telling you something untrue for my own amusement and/or to see how you'd react.) One spelling is phonetic; the other etymological – but you can learn and correctly use the word in this specific sense purely from observing the oral behavior of others without needing to understand the etymological derivation (from the phrasal verb "psych out"), and thus without the etymology necessarily being transparent to you. On the other hand, whichever spelling you prefer, assuming you know the relevant sense of the word to start with, you should if given sufficient context be able to decode the other spelling as a variant and thus understand what it means, which I suppose may not be the case with some of the hanzi above.

  9. C said,

    September 13, 2013 @ 12:07 pm

    Now I want to go ask my Taiwanese-American friends if they know these words! One of their mothers lives in town, so perhaps I will ask her.

  10. Victor Mair said,

    September 13, 2013 @ 12:23 pm

    Hi Nora,

    Thanks for your good questions.

    The reason why I stress the importance of sounds, even for Classical Chinese, which has been long dead (if, in fact, it ever fully lived), and despite the fact that one can read off the HANZI / KANJI / HANJA in Cantonese, MSM, Taiwanese, Japanese, Korean, or lots of other modern tongues, is because when the texts were first written down they had some sort of relationship to the spoken language of their time and place. In other words, people had a way of vocalizing them. So, paying due attention to phonological reconstruction — hypothetical though it may be — we can sometimes solve difficult philological problems by finding out what actual words lay behind the characters.

    Even in antiquity, authors often miswrote terms, borrowed characters for ideas, concepts, and things for which there were no fixed orthographical conventions, or just committed lapsi calami out of boredom, weariness, or sheer distraction, or perhaps due to possessing insufficient learning. Having a sense of how expressions are / were pronounced is frequently helpful, and in some cases essential, for making sense out of a text. That was certainly the case with the Dunhuang popular manuscripts that I worked on for the first twenty years of my Sinological career.

    I'll give you more examples in class.

  11. flow said,

    September 13, 2013 @ 2:19 pm

    @VM "Classical Chinese, which has been long dead (if, in fact, it ever fully lived)" … my feeling (as a layman in this field) is that–apart from the fact that there's hardly 'one Classical Chinese' language–the early recordings of Chinese as found on ceremonial vessels and other artifacts are really condensed versions of what the very people who produced them likely used in their daily speech. all writing is defective in the sense that not all the details of speech can be committed to writing; punctuation signs came about much later than the Roman alphabet itself; in Arabic and Hebrew, vowels are an optional, diacritic after-thought in the writing system; and in a logo/ideo/pictographic script, naturally, it is those abstract concepts and especially the 'function words' (虛詞) that are hardest to write. my guess is that there will have been a long period early on when written texts got read out in a manner not unlike (i believe) the Naxi script: you get all your important semantic anchors, the nouns, the adjectives and the full verbs, in the text, and it's up to you to fill in the missing parts so listeners can understand you. this is not so different from a reader of Arabic, who has to know quite a lot about the language to be able to fill in the correct vowels, or an actor who has to put in a lot of expressivity in order to make those barren words come alive on stage. in Chinese, a terse 'telegram' style has become a cultural ideal, where 'one word' equals 'one syllable' equals 'one character' equals 'one idea'. it is a literary ideal, not necessarily the reality of any spoken idiom at any historical point in time.

  12. Rodger C said,

    September 13, 2013 @ 7:02 pm

    Uh, that's "lapsus calami" with long u. There's no "lapsi."

  13. Victor Mair said,

    September 13, 2013 @ 7:09 pm

    @Rodger C

    I wondered about that for a long time (it really didn't look or sound right to me!), having found a number of instances of "*lapsi calami" on the web before hesitantly writing it down. Mea culpa!

  14. Victor Mair said,

    September 13, 2013 @ 7:15 pm

    From Geoff Wade:

    "lāta" 邋遢 ("slovenly; dirty; dowdy; sloppy; slobby; shaggy; unkempt; ill-groomed; sluttery; slipshod; untidy")

    The Cantonese render this as lat tat 辣撻

    This seems to be the standard orthography in Cantonese


    lat tat mau 辣挞猫 = dirty cats

    年廿八 洗辣撻 – cleaning done on the 28th day of the 12th month before new year

  15. The suffocated said,

    September 14, 2013 @ 4:32 am

    @Prof. Mair

    "The Cantonese render this as lat tat 辣撻"

    Google returns 341,000 results for "年廿八洗邋遢" (with double quotes), but only 36,500 results for "年廿八洗辣撻". Dan Quayle also spelled the starchy, tuberous crop from the perennial Solanum tuberosum of the Nightshade family as 'potatoe', but that doesn't mean "the Americans spell the word as 'potatoe'", does it?

  16. Victor Mair said,

    September 14, 2013 @ 6:08 am

    @The suffocated

    I will let Dr. Wade speak for himself, but I should add that he originally included this additional note, which I left out because I didn't want to trouble people unnecessarily with a pdf:


    This seems to be the standard orthography in Cantonese

    p. 914 here:


    Dan Quayle speaks English.

    The Cantonese speak Cantonese, not "Chinese", if that's what you were thinking.

  17. Rodger C said,

    September 14, 2013 @ 11:05 am

    @Victor Mair: By the way, I really have no idea how you'd pronounce plural "lapsus" in Anglo-Latin. "Lap-sooze"?

  18. Mandy said,

    September 14, 2013 @ 9:52 pm

    @The suffocated

    Both 辣撻 and 邋遢 are very common written variations of "lata" in Hong Kong Cantonese. But 邋遢 definitely has a longer history and it is the standard way used by Hong Kong newspapers and by other formal channels. 辣撻 seems to be a more recent variant, perhaps because 邋遢 is more complicated and not so "intuitive" to write out?

  19. Daniel said,

    September 15, 2013 @ 2:31 am

    Do Cantonese speakers make a distinction in pronunciation between 邋遢 and 辣撻? According to dictionaries, the individual characters have the following pronunciations:
    辣 laat6
    撻 taat3
    邋 laap6, laat6, lip6
    遢 taap3, taat3
    The phonologically conservative pronunciation of 邋遢 would be "laap6taap3" (cf. the usual pronunciation of "rubbish" 垃圾, "laap6saap3"), but I don't know whether any Cantonese speakers actually pronounce it like this. (The innovative orthography 辣撻 makes sense only for the pronunciation "laat6taat3", and presumably would not have arisen as a way of writing "laap6taap3", no matter how difficult it might be to remember the "correct" characters for the latter.)
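Daniel's reasoning can be made mechanical. The following Python sketch is added for illustration only: it takes nothing but the dictionary readings listed in his comment (so it is only as reliable as those dictionaries), and its rhyme test is a deliberately crude assumption that treats everything before the first vowel letter as the initial and drops the tone digit. It enumerates which pronunciations each spelling could stand for and keeps only those in which both syllables rhyme, as they should in a rhyming binome.

```python
from itertools import product
import re

# Cantonese readings per character, exactly as listed in Daniel's comment
READINGS = {
    "辣": ["laat6"],
    "撻": ["taat3"],
    "邋": ["laap6", "laat6", "lip6"],
    "遢": ["taap3", "taat3"],
}

def possible_readings(word: str) -> set[tuple[str, ...]]:
    """Every combination of the listed readings of the word's characters."""
    return set(product(*(READINGS[ch] for ch in word)))

def final(syllable: str) -> str:
    """Crude rhyme part: drop the initial consonant(s) and the tone digit."""
    m = re.fullmatch(r"[^aeiou]*([aeiou][a-z]*)\d", syllable)
    if m is None:
        raise ValueError(f"unexpected syllable: {syllable!r}")
    return m.group(1)

def rhyming_readings(word: str) -> set[tuple[str, ...]]:
    """Readings in which all syllables share the same final (a rhyming binome)."""
    return {r for r in possible_readings(word) if len({final(s) for s in r}) == 1}

for word in ("邋遢", "辣撻"):
    print(word, sorted(" ".join(r) for r in rhyming_readings(word)))
# 邋遢 ['laap6 taap3', 'laat6 taat3']
# 辣撻 ['laat6 taat3']
```

The output restates Daniel's point: the traditional spelling 邋遢 is compatible with both a -p and a -t rhyming pronunciation, whereas 辣撻 can only represent laat6 taat3.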

  20. Mandy said,

    September 15, 2013 @ 3:54 am


    No one in Hong Kong would pronounce 邋遢 as "laap6taap3". If one pronounces it like that, people might wonder whether one has a speech problem! To me, "laap" calls to mind other characters with that sound and tone, such as 蠟 (wax), and "taap" calls to mind 塌 (to fall, collapse).

    撻 is a very "versatile" word, because there are several ways to pronounce it: 1) 傑撻撻 (sticky, messy situation), 2) 蛋撻 ("egg tart") and 撻車 (to start up a car), and 3) 撻訂 (to forfeit a deposit). I don't know the tonal marks for all these 撻, but they all sound different to a native speaker — except with (傑)撻撻, it can also be pronounced in the same tone as (蛋)撻.

    (邋)遢 shares the same tone with 撻(訂).

    I really think that 辣撻 is the "simplified version" of 邋遢. The likelihood for a Hong Kong person knowing how to write 辣撻 is much higher than 邋遢.

  21. Daniel said,

    September 15, 2013 @ 4:58 am

    @Mandy: Do these comments about the pronunciation of 邋遢 also apply to non-HK varieties of Cantonese?

    It would be interesting to know about the date and phonological explanation (if any) for the shift from "lap-tap" to "lat-tat". One speculative possibility: perhaps reduction of the cluster "p-t" to give "*lat-tap" was followed (accompanied?) by a matching change in the second syllable that restored the rhyming structure of the binome. (Are there any other similar examples of irregular rhyme-preserving pronunciation change in rhyming binomes?)

  22. Stephan Stiller said,

    September 15, 2013 @ 6:25 am

    The most common HK-Cantonese spelling is 邋遢; the word is pronounced (only) laat6 taat3.

  23. Stephan Stiller said,

    September 15, 2013 @ 6:43 am

    @ Daniel

    How do you come to the conclusion that the "phonologically conservative pronunciation of 邋遢 would be 'laap6 taap3'" [space added]?

    The dictionaries are unreliable and might mix different (sub)dialects, sometimes from different periods. And Cantonese spelling is often approximate, with plenty of phonetic borrowings. It's very difficult to reason about sound changes at a micro-level in this particular situation.

  24. Daniel said,

    September 15, 2013 @ 9:05 am

    @Stephan Stiller:
    My reference to the (hypothetical) pronunciation "laap6 taap3" as "phonologically conservative" was based on the reconstructed Middle Sinitic pronunciation ("something like *laptap"), combined with the fact that Middle Sinitic final -p usually (but not always) corresponds to final -p in modern Cantonese. Any strong claims about the phonological history of the word(s?) 邋遢/辣撻 would of course need to rely on much better data and more detailed analysis than this.

    Your claim that 邋遢 is "the most common HK-Cantonese spelling" contrasts with Geoff Wade's suggestion that 辣撻 "seems to be the standard orthography". Are you basing your claim on any particular corpus of HK-Cantonese written texts, and do you think the most common orthographies might vary depending on the genre of writing?

  25. Bob said,

    September 15, 2013 @ 10:57 am

    –back to the original subject, writable/sayable– Chinese had used separate writable and sayable language forms for more than 3,000 years, until the May Fourth Movement of 1919. However, most official writing continued to be in Classical Chinese in the pre-1949 era.
    To most Chinese, there are still separate language forms, the writable and the sayable; the PRC, and the ROC on Taiwan, pushed Putonghua/Mandarin as the official form of Chinese, thus uniting the writable and the sayable, except in Guangdong and Hongkong.

  26. Bob said,

    September 15, 2013 @ 11:04 am

    Rather than 邋遢 or 辣撻, 垃圾 is mostly used in Hongkong.

  27. Wentao said,

    September 15, 2013 @ 2:28 pm


    I believe even to this day, the writable and the sayable are hardly united to the vast majority of Chinese, no matter what Sinitic language they speak.

    垃圾 should be laap6 saap3, a different word to 邋遢/肋脦. Some dialects of Mandarin have le4 se (勒瑟?).

  28. Victor Mair said,

    September 15, 2013 @ 3:05 pm

    For a video comparing Cantonese and Mandarin, see this comment to an earlier LL post:

  29. Mandy said,

    September 15, 2013 @ 7:44 pm


    What you suggested is interesting (I’ve never thought about that), because all of the “cognates” of 邋(遢) are still pronounced as “laap”, such as 臘, 蠟, etc. I was trying to think of a case where 邋 would be pronounced with its “phonologically conservative” sound (laap6), and there is one such instance: 邋雜 (“laap zaap” disarray, miscellaneous stuff, messy, etc). In this case, 邋 is never pronounced “laat.” I don’t know how to explain this sound difference between the identical character in 邋遢 vs. 邋雜.

    The “p-t” cluster in spoken Cantonese is not so common. Off the top of my head, I can only think of 塌塌米 “taap taap mai (tatami)” and 塔塔爾 “taap taap ji (Tatar)”, but perhaps they don’t really count because they aren’t “natural” Cantonese words. If I have to “force” it to happen, then maybe something like 入塔 “jap taap (to enter a tower)” but that also sounds contrived. I’m sure there are other examples, but I just can’t think of any right now. None of the above “taap” or “jap” should be pronounced as “taat” or “jat”.
    Here is another interesting observation. Like other written Cantonese words, there are also many ways to write out 邋雜, such as 立雜 and 臘雜. I was trying to find out which one is the “standard orthography”. I googled and found an old saying “乾淨冬至邋雜 (laat zaap)年,邋雜冬至乾淨年” (you might want to trace the origin of this old saying), but I also found other variants “乾淨冬至邋塌 (laap taap)年” and of course, “乾淨冬至邋遢 (laat taat)年”, but no reference to “乾淨冬至辣撻年”.

    If 邋遢、邋塌 and 邋雜 all represent the same thing, then perhaps it is reasonable to suggest that 垃圾 (laap zaap) is related to 邋遢, as VHM proposed.

  30. Bob said,

    September 15, 2013 @ 9:50 pm

    辣撻/烏遭/唔亁淨, are adjectives in spoken Cantonese, most Hongkong writings are in MSM form, thus 不潔 is used.
    垃圾 is the simplified form of 邋遢; it is a noun, used in Hongkong's MSM writing, but with Cantonese pronunciation.

  31. Bob said,

    September 15, 2013 @ 10:21 pm

    @wentao yes, MSM/Putonghua has replaced Classical Chinese as the writable Chinese form.
    For most Chinese, since the adoption of MSM/Putonghua as the educational form of Chinese, the writable and sayable forms have become one to a greater degree.
    Writing spoken Cantonese is a recent development; if one tries to do that in school, particularly in Chinese Composition, one will get a FAIL mark! The common writable form in Hongkong is the MSM Chinese form.

  32. Daniel said,

    September 15, 2013 @ 11:12 pm

    @Mandy: After a bit of thought, my tentative suggestion of a p>t change due to juxtaposition in the consonant cluster seems unlikely, since this sort of assimilation is rare (perhaps non-existent) in most varieties of Chinese. Stephan Stiller is right to point out that "it's difficult to reason about sound changes at a micro-level in this situation".
    Still, it's fun to speculate. In the absence of an explanation for the p>t change, there is also the possibility that Cantonese "laat6 taat3" is semantically similar but etymologically distinct from Middle Sinitic *laptap, and that Cantonese speakers borrowed the characters 邋遢 to write "laat6 taat3" without ever using the hypothetical "true" cognate "*laap6 taap3". (No doubt somebody with genuine expertise on Cantonese historical phonology can explain why this cannot be right.)

    @Bob: It isn't clear what you mean when you say that "垃圾 is the simplified form of 邋遢". As Wentao has pointed out above, these are different words. Their meanings, although related, are distinct, and they have different pronunciations (垃圾 laap6 saap3; 邋遢 laat6 taat3).

  33. Jerome Chiu said,

    September 16, 2013 @ 2:57 am

    The 撻 in 杰撻撻 (sticky, messy situation) is taat9; in 蛋撻 ("egg tart") and 撻車 (to start up a car) it's taat7; in 撻訂 (to forfeit a deposit) it's taat8 – very versatile indeed! I totally agree with your observation that 邋遢 is the standard version used in books, periodicals and journalism, while 辣撻 is usually found written by the general public, e.g. in forums and chatrooms, etc.

    On laap9 tsaap9 – Huang Zhongze 黃仲則 (1749-1783) used 拉雜 in a couplet in his "Erotic Sentiments – 16 Poems" 〈綺懷十六首〉: 「歛袖成弦聲拉雜,隔窗摻碎鼓丁寧。」There is no doubt that 拉 here is not pronounced laai1 in Cantonese (but note that in Modern Mandarin it is pronounced lā in the sense here and the general sense of "pull" etc., while 邋 is also pronounced lā), for the position in which it finds itself requires an unflat tone; nor is there much doubt that it is pronounced laap9 here (rhyming with jaap9) because 丁 and 寧 also rhyme with each other. We may observe, in addition to possible regional differences (Huang was a native of Jiangsu), that (1) laap9 tsaap9 refers to the music played by the girl who was Huang's object of desire / longing, while nowadays it means (quite exclusively) "miscellaneous" or "messily miscellaneous" – a semantic change; and (2) one of 拉雜 and 邋雜 is a variant of the other (my wild guess is that 拉雜 predates 邋雜), perhaps due to the need to distinguish the two different shades of meaning mentioned above.

    I also concur with your hypothesis that "[the] reduction of the cluster "p-t" to give "*lat-tap" was followed (accompanied?) by a matching change in the second syllable that restored the rhyming structure of the binome."

    Dating this change will be extremely difficult. Common rhyme books and pronouncing dictionaries of the Cantonese language are, to this day, unduly prescriptive with a general disregard of historical phonology. One possibility is to search the corpus of Cantonese comic verse 打油詩 writers such as He Danru 何淡如, but I'm not exactly filled with confidence in the likelihood that this search will produce any results.
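Jerome Chiu numbers the checked (entering) tones 7, 8 and 9, whereas the other commenters use six-tone Jyutping numbers (taat3, laat6, and so on). Assuming his figures follow the traditional nine-tone count, in which 7, 8 and 9 are the upper, middle and lower entering tones, they correspond to Jyutping tones 1, 3 and 6. A small Python sketch of that conversion follows; `nine_to_six` is a hypothetical helper written for this illustration, and it converts only the tone digit, leaving spellings of initials (such as his ts- for Jyutping z-) untouched.

```python
import re

# Nine-tone numbering -> six-tone Jyutping numbering for entering (checked) tones.
# Assumption: 7 = upper entering, 8 = middle entering, 9 = lower entering.
ENTERING_TONES = {"7": "1", "8": "3", "9": "6"}

def nine_to_six(syllable: str) -> str:
    """Convert a tone-numbered syllable such as 'taat9' to six-tone form ('taat6')."""
    m = re.fullmatch(r"([a-z]+)([1-9])", syllable)
    if m is None:
        raise ValueError(f"not a tone-numbered syllable: {syllable!r}")
    base, tone = m.groups()
    if tone in ENTERING_TONES:
        # Tones 7-9 only occur on syllables closed by -p, -t or -k
        if not base.endswith(("p", "t", "k")):
            raise ValueError(f"tone {tone} on an open syllable: {syllable!r}")
        tone = ENTERING_TONES[tone]
    return base + tone

# The three readings of 撻 given in Jerome Chiu's comment
for word, reading in [("杰撻撻", "taat9"), ("蛋撻", "taat7"), ("撻訂", "taat8")]:
    print(word, reading, "->", nine_to_six(reading))
# 杰撻撻 taat9 -> taat6
# 蛋撻 taat7 -> taat1
# 撻訂 taat8 -> taat3
```

On this reading, the result agrees with Mandy's remark that (邋)遢 and 撻(訂) share a tone: both come out as taat3, the value Daniel lists for 撻.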

  34. Victor Mair said,

    September 16, 2013 @ 6:05 am

    From Bob Bauer:

    [VHM: A few special symbols have not displayed properly in this comment. If anyone is particularly interested in seeing what they are, you may request a pdf of this comment from Bob Bauer or me.]

    Problems in Writing Cantonese and Cantonese Colloquial Vocabulary

    When I saw the phrase "standard orthography in Cantonese" I involuntarily cringed. Strictly speaking, the written form of Cantonese has no standard orthography, since it has never undergone formal standardization as this term/process is generally understood in linguistics; so to me, the phrase is both too strong and inaccurate.

    Written Cantonese has evolved an ad hoc, consensus-based, informal orthography, and at the same time has managed to become codified to a helpful extent through the publication of Cantonese dictionaries, grammar books, periodicals, etc.

    Due to the lack of formal standardization, it is not unusual for one word to have two or more written forms, although one of them may be more commonly used than the other.

    At the same time, it is also not unusual for one Cantonese word to have two or more pronunciations; in the analysis of Cantonese vocabulary, I believe pronunciations should generally take precedence over written forms.

    Writing Cantonese is both a struggle and a challenge because of the disjunct/mismatch between the pronunciations of Cantonese words and the Cantonese reading pronunciations of the standard Chinese characters: these colloquial and literary layers can be regarded as forming two separate, distinct systems within the Cantonese syllabary.

    As for the written form of Cantonese laat6 taat3 'dirty; tricky, dishonest, underhanded', the case can be made that 辣撻 does indeed more accurately represent pronunciation, and this is the form that was transcribed (by hand) by Parker Po-fei Huang on page 414 in the Cantonese-English section of his dictionary published by Yale University Press in 1970. Writing 辣撻 laat6 taat3 is a good example of the common practice in written Cantonese of borrowing characters for their pronunciations to write semantically-unrelated words. However, the transcription of this word that is found in most Cantonese-Putonghua dictionaries is 邋遢 (and also 辣遢 in Chishima Eichi's Cantonese-Japanese dictionary published in 2005 by Toho Shoten) with the pronunciation romanized as laat6 taat3. This is in spite of the fact that the Cantonese reading pronunciations of the individual characters have been romanized variously as 邋 [laap6, laat3, lip6] and 遢 [taap3].

    In Hong Kong one occasionally hears claims from self-appointed language experts that a certain Cantonese word should be written with certain characters, and such a claim may be traced to a presumed etymology in an ancient rime book, such as 《廣韻》 Gwong2 Wan6 or 《集韻》 Zaap6 Wan6.

    So, for example, according to the 《粵音正讀字彙》 (2001), which has romanized the Cantonese pronunciations of the standard Chinese characters as called for by their original 反切 faan2 cit3 in Gwong2 Wan6, we find 邋 laap6 (p. 49) and 遢 taap3 (p. 50), that is, with final -p, which was indicated by the rime character 盍 used in Gwong2 Wan6's 反切 for both 邋 and 遢; the rime of 盍 has been reconstructed by Wang Li as [ap] [VHM: the preceding "a" did not display correctly] (but complicating matters is the fact that 盍 is now pronounced/read as [hAp|2] [VHM: "A" = upside down "a"; "|" = | with a short bar extending horizontally from the middle of the left side] (= hap6) in Cantonese, with a short vowel).

    As for Cantonese 辣撻 laat6 taat3 'dirty’ and 擸/垃圾 laap6 saap3 'garbage, refuse', these are two separate, distinct words with two different meanings; I think the first word is likely cognate with Mandarin 肋脦 le1 te4 (DeFrancis, p. 542), and the second word with Mandarin 垃圾 la1 ji1 (DeFrancis, p. 527) and Taiwan Mandarin 垃圾 le4 se4 (DeFrancis, p. 542).

    As for 辣撻貓 laat6 taat3 maau1, there are two meanings:

    1. literal: dirty cat

    2. figurative: dirty, messy person, such as a child with a dirty face

    [VHM: I tried for over an hour to get the missing character in the penultimate paragraph to appear. I could get it to show up in the dashboard draft, but when I attempted to update the comment, the missing character would cause the rest of the comment to disappear along with it. Anyway, what I was trying to put there consists of a 扌 radical plus the phonophore of 㒎 or 䙣 -- there should be a horizontal line at the bottom. In "ideographic description sequence" (IDS) that would be ⿰扌䪞.]

  35. Stephan Stiller said,

    September 16, 2013 @ 6:51 am

    @ Daniel

    1a. The writing of a non-initial syllable is less reliable in such a situation; this is also mentioned in the post above. We haven't determined when and in what form laat6 taat3 appeared in Cantonese and how the spelling 邋遢 was established. Character identity does not equal cognate nature. If it entered as laap6 taap3, your theory is all good (despite the concern about cross-syllabic assimilation which you stated yourself), but there's still the reliance on an assumption. In any case, I very much welcome your contributions.

    1b. As for the lexicographic resources available: I'm not saying this is exactly where you looked, but the well-known CUHK database is for example not at all reliable when you want to know about older versus modern vernacular readings: the reading(s) appearing to be listed as colloquial or variant readings (the annotations there aren't really clear) may or may not be the present-day default reading(s).

    2a. I'm familiar with 邋遢 and recall seeing it with higher frequency than 辣撻. You are right to draw attention to genre (Jerome Chiu's comment supports this; his contribution is also welcome and valuable), but even there I'm not convinced that 辣撻 is more frequent. As for how to prove it to someone else: From a survey on Cantonese corpora at a recent conference, I got the impression that a lot of them consisted of older texts or spoken corpora transcribed by researchers according to their personal conventions (these corpora are not standardized in their spelling: this is a question I asked explicitly at least twice at the conference, and none of the corpus maintainers seemed to have any clue, with a vague acknowledgment along the lines of "oh yeah, would be useful"), so I'm not sure how easy it would be to give you a conclusive demonstration about modern spelling habits even with (certain) corpora. But you can search for, e.g., the characteristically Cantonese "年廿八,洗邋遢" (see the search above by "The suffocated"), look in various lexicographic resources (they seem to universally prefer 邋遢, though Hutton & Bolton lists 辣撻 first, demonstrating that it exists as such in popular usage; they explicitly say they document spellings which some would consider "wrong", but then there are other problems with H&B, which I won't get into here), and do comparative searches on HK news sites and forums – this seems to confirm my subjective impressions, and other commenters have written about their impressions as well, though this is admittedly not scientific. Moreover, I acknowledge as a confounding factor that 邋遢 exists in Mandarin too (but you can refine your searches by including 冇, 佢, 係, etc., while checking that 係 in the search results is the copula); normally this is a reason for an HKer to adopt the same spelling, though there are exceptions, and HK-local spelling is something to be careful about.

    2b. Geoff Wade seems to have guessed based on the spelling in the linked-to paper.

    Another thought: People know how to spell 臘 and 榻, so it seems like people ought to be able to deal with the 聯綿詞 邋遢, but of course Victor Mair has made the point that spelling is generally difficult in Chinese, and the spelling 辣撻 clearly exists.

  36. Daniel said,

    September 16, 2013 @ 8:22 am

    Great comments from Jerome Chiu, Bob Bauer and Stephan Stiller!
    A quick comment on Stephan Stiller's last point: it is interesting and perhaps surprising that characters can be difficult to remember even when they are closely related to a relatively familiar character by a mere change of radical. The infamous làiháma 癩蛤蟆 "toad" is another example – few Mandarin speakers have a problem writing yīlài 依賴 "to depend", but adding the disease radical to 賴 somehow seems to impose an extra cognitive burden far out of proportion to the actual added complexity.

  37. Stephan Stiller said,

    September 16, 2013 @ 9:29 am

    @ Daniel

    I'm with you. I think the problem is the following: the characters in 邋遢 (laat6 taat3) as well as 癩蛤蟆 (làiháma) – all of them – occur in only so few lexical items each that their low lexical type frequencies lead to very low token frequencies, despite both words not being "rare" or outlandish.

  38. Daniel said,

    September 16, 2013 @ 10:25 am

    Some further thoughts…
    It is interesting that Jerome Chiu chimed in to support my initial tentative hypothesis after I had begun to doubt it myself. In the absence of better historical or dialect data, it might gain some plausibility if we could think of more clearly attested examples of either (1) consonant assimilation within a binome, or (2) coordinated sound change in a rhyming binome. Otherwise, the hypothesis would depend on postulating forms of phonetic change that aren’t known to occur in Chinese languages, and would thus deserve to be treated with considerable scepticism.

    One question about Bob Bauer’s comment that Cantonese laat6 taat3 “is likely cognate with Mandarin 肋脦 le1 te4” – if Mandarin lētè 肋脦 is cognate with Mandarin lāta 邋遢, is there any particular reason to prefer one of these over the other as the “true” cognate of Cantonese laat6 taat3?

    Regarding Jerome Chiu’s mention of Huang Zhongze’s use of 拉 – I raised a question about this character a couple of months ago as part of an interesting discussion on the issue of Chinese tones through history. I was wondering whether the common Mandarin word lā “pull” might have been etymologically unrelated to the word originally represented by the character 拉 (which according to pre-Song written sources would have had final –p and meant something like “to break by twisting”, as seen in expressions like 摧枯拉朽). The following response by JS is relevant to Huang Zhongze’s use of this character:

    [the] suggestion of two separate words 'break' (from final -p) and 'pull' (open syllable) seems most likely… I just noticed that Axel Schuessler has suggested, speculatively, that lā 'pull' could be an "archaic colloquialism" related to the word tuō 拖 'pull', this seemingly from an old lateral (GSR Companion, p. 214).

    The Cantonese pronunciation laai1 for 拉 would correspond to the open-syllable word “pull”, while the pronunciation laap9 in 拉雜 would correspond to the now largely forgotten word with final -p meaning “to break” (but used purely phonetically in the compound 拉雜). It would be interesting to know whether Huang Zhongze associated the character 拉 with the pronunciation laap9 on the basis of rhyme books like Guangyun 廣韻, or whether he knew of this pronunciation because the older word with final -p had been preserved in his own native dialect or another dialect with which he was familiar. (I know little about Huang Zhongze — if he was from Jiangsu, might he have spoken a dialect with final -p/-t/-k all merged to final glottal stop? And to what extent might his choice of rhymes have been influenced by prescriptive rhyme tables derived from Guangyun etc.?)

  39. Wentao said,

    September 16, 2013 @ 11:43 am


    I'm surprised to see how many Mandarin speakers have difficulty writing 癞蛤蟆. Personally I find 癞 and 遢 extremely easy to remember, because they both have a straightforward radical-sound structure. 邋 is more difficult because I grew up in an environment of simplified characters, but once I learned 蠟 and 臘 it also became simple. Traditional characters that give me trouble are extraordinarily complex ones such as 釁 or 鬱.

  40. The suffocated said,

    September 16, 2013 @ 12:34 pm

    @Prof. Mair

    Sorry, I didn't notice that it was Dr Wade's comment instead of yours. What I was trying to say is that Dr Wade seemed to claim that most Cantonese write 辣撻 for laat6 taat3, which is a bit off-base.

    Dr Wade took a video clip as an example, in which the subtitle for laat6 taat3 is 辣撻, but he didn't notice that in another video clip for the same song, 邋遢 is used instead. He also mentioned an academic paper in which laat6 taat3 is written as 辣撻, but the purpose of this paper is not to discuss what the original characters for some Cantonese words are.

    The ratio of "年廿八洗邋遢" to "年廿八洗辣撻" from Google search is about 10:1. If, as Stephan Stiller suggests in another comment, the Cantonese characters 冇 and 佢 are also added, the ratio even increases to 1600:1. Interestingly, if one looks at the contents of the search results, one will find that when 年廿八洗辣撻 is mentioned, people sometimes write 年廿八洗"辣撻". A pair of quotation marks is put around the word 辣撻, probably to indicate that the authors don't know the correct characters for laat6 taat3. In contrast, the same almost never occurs when people write 年廿八洗邋遢. This shows that people feel more confident about writing 邋遢 than writing 辣撻 for laat6 taat3.

    Surely, some people do write 辣撻, especially on internet forums and in texting, but for various reasons, loan characters are heavily used in casual communication in the first place. Many write 'C9' for 師奶, '5' for 唔 and 'on9' for a Cantonese word which I shall refrain from writing down here, but no one would say that "the Cantonese render 師奶 as 'C9'". Also, as Mandy has pointed out, when compared to 辣撻, the word 邋遢 definitely has a longer history and it is the standard orthography used by, for instance, Hong Kong media (examples 1, 2, 3, 4).

  41. JS said,

    September 17, 2013 @ 11:35 am

    Daniel's idea is nice; what we need are more examples of the _p-t_ intersection in binomes — or more generally in monomorphemic (or simply morphologically opaque) disyllables. The best I can do at the moment is 'butterfly', written 蛺蜨/蛺蝶 in older dictionaries (suggesting OC *kep-sep or *kep-lhep?) but more recently 蝴蝶, Cant. wu4-dip6>2 according to Cantodict. So this case doesn't illustrate place assimilation, but (if the forms really represent the "same" historical word) might support the idea that Chinese phonotactics have disfavored _p[+dental]_…

  42. Bathrobe said,

    September 19, 2013 @ 11:59 pm

    I don't think you have to go as far as 癞蛤蟆 to find a word that the Chinese have difficulty remembering. With the recent popularity of 雾霾, I think quite a few Chinese probably have problems writing the second character properly (especially the component in its lower part).

  43. Michael Rank said,

    September 21, 2013 @ 10:02 am

    I'm not an academic linguist but note that some of these highly erudite comments relate to Cantonese. Interesting to note that lēde 肋脦 and related expressions are listed in 北京土语词典 (北京, 1990, 徐世荣 编). I have scanned relevant pages if anyone is interested (see also previous two pages). I have a feeling that this is not a particularly Beijing word but is in this dictionary because it is considered a bit slangy???

  44. Victor Mair said,

    October 10, 2013 @ 10:29 am

    Speaking vs. writing

    From a native Pekingese student in my LSSC class:

    Last night Brendan shared some pictures from books about Pekingness through WeChat. I found it so weird when I read them. I feel the subtle meanings of some words were weakened when they were written down. For example, 小力巴儿 (伙计). Actually, when I hear this word, I feel it is a little humorous or disrespectful, depending on context. But when I read it in a book, with or without context, I couldn't feel those potential emotions. I assume that may be due to the big gap between speech and the writing system.
    Speaking of writing topolects, I remember that the novel "海上花列传" (The Sing-song Girls of Shanghai) of the late Qing dynasty was written in Shanghainese or Suzhounese. However, only the conversations were written in topolect, while the rest of it was written in Mandarin. I am not sure whether native speakers would have the same feeling when they read these conversations instead of hearing them.

  45. Victor Mair said,

    November 22, 2013 @ 12:00 pm

    From Wolfgang Behr:

    …tried to catch up with my backlog reading your nice LL posts and got lost somewhere in the comments of the Sep 12 post on 肋脦 and the instability of its orthographic shapes for the underlying word. If no one has pointed this out so far (?), the reason for that instability is pretty straightforward: lede is a Manchu loanword.

    陈刚, in his wonderful 北京方言词典 (商務 1985) has the following entries (p. 164):

    le1de 肋脦
    衣服不整洁,不俐落。|他这一身多~! 《 [满] lete

    le1debi1ng 肋脦兵

    le1decho4u 肋脦臭

    Checking the late Jerry Norman's _A comprehensive Manchu-English Dictionary_ (Harvard-Yenching Monograph, 2013, p. 250), one doesn't find an entry lete however, but

    lekde lakda:

    1.hanging down in shreds or rags

    2.hanging like fruit on a plant

    3.following closely behind

    The corresponding verb is clearly

    lekde(re)mbi: (-ke)
    to have an unkempt or dirty appearance, to hang in disarray, to be disheveled


    disheveled, shaggy

    In other words, the choice of a representation with a phonetic 力 in the transcription, which had -k in Middle Chinese and has -k or a final glottal stop in several southern dialects, probably reflects that -k was still pronounced when the word entered Chinese from Manchu. Will have to check whether other Manchu dictionaries have forms without -k- sometime, but have no time for that now.

  46. Stephan Stiller said,

    November 22, 2013 @ 12:18 pm

    Wolfgang Behr's preceding comment reminds us of the need for true etymological dictionaries of Chinese.
