Identifying written Cantonese


A query by a commenter on Victor's post raises an issue that seems worthy of discussion here on the main page. The question is whether it is possible to distinguish written Mandarin from written Cantonese. A widely believed myth is that even forms of Chinese that are mutually incomprehensible in their spoken forms are identical in writing. This is not true. Victor's post itself points out small differences between written Taiwanese Mandarin and Mainland Mandarin. Written Cantonese can in fact be distinguished from written Mandarin.

Much of the time "written Cantonese" is "Mandarin written by a Cantonese speaker" and so can only be distinguished from native Mandarin by relatively subtle cues. However, if what is written is truly Cantonese, it is easily distinguished from Mandarin if it is of any length because some common words are not cognate and are therefore written with different characters.

For example, in Mandarin, the word meaning "he, she, it" is pronounced ta¹ and is written with the characters 他, 她, and 牠 respectively. Historically this is a single word written with a single character; the gender distinction, which is made only in writing, is a fairly recent innovation. The Cantonese word for "he, she" is keui⁵, which is not cognate to ta¹ and is written 佢. In contrast to Mandarin ta¹, keui⁵ cannot refer to inanimates. A feminine form 姖 is occasionally found, but as in Mandarin the gender distinction exists only in writing. If a text uses 佢 or 姖, you can be sure that it is in Cantonese. (Note that you cannot draw this conclusion if the text merely mentions 佢 or 姖. For example, this post is in English, not Cantonese.)
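The test described above amounts to a simple marker-character heuristic. The following is a minimal Python sketch of it, assuming a small, hand-picked set of Cantonese-only characters taken from this post and its comments (佢, 姖, 哋, 唔, 冇); the marker set and the function names are illustrative, not an established tool. As the post itself warns, the heuristic cannot tell use from mention, so an English text that merely quotes 佢 would still be flagged.

```python
# Illustrative sketch: flag text as probably Cantonese if it *uses*
# characters that, per this post and its comments, Mandarin does not use.
# The marker set is a hypothetical, hand-picked sample, not a complete list.
CANTONESE_MARKERS = {
    "佢": "keui5, third-person pronoun (Mandarin 他/她)",
    "姖": "occasional feminine written form of keui5",
    "哋": "plural suffix, as in 佢哋 'they' (Mandarin 們)",
    "唔": "negator mh6 (Mandarin 不)",
    "冇": "mou5, 'not have' (Mandarin 沒有)",
}


def count_markers(text: str) -> dict[str, int]:
    """Return how many times each marker character occurs in text."""
    counts: dict[str, int] = {}
    for ch in text:
        if ch in CANTONESE_MARKERS:
            counts[ch] = counts.get(ch, 0) + 1
    return counts


def looks_cantonese(text: str, threshold: int = 1) -> bool:
    """True if the text contains at least `threshold` marker characters.

    Caveat from the post: this cannot distinguish using 佢 from merely
    mentioning it, so quoted examples produce false positives.
    """
    return sum(count_markers(text).values()) >= threshold


if __name__ == "__main__":
    # "They are not students": Cantonese version vs. Mandarin version.
    for sample in ["佢哋唔係學生", "他們不是學生"]:
        print(sample, count_markers(sample), looks_cantonese(sample))
```

Raising `threshold` trades recall for precision; one could also count Mandarin-only forms such as 他們 as counter-evidence, though that goes beyond this sketch.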

Like many myths, the myth that the many forms of Chinese are identical in writing is false but has a kernel of truth. That kernel of truth is that someone who can read one form of Chinese has a fairly easy time learning to read another. For one thing, the written form erases the numerous differences in pronunciation. For another, until fairly recently there was a more-or-less standard written form not identical to any spoken dialect. Just as Europeans who spoke various languages could communicate in Latin, so until recently all literate Chinese people could communicate in the somewhat artificial written standard. With the shift over the last century to a written language that is closer to the colloquial, the differences among the written forms of the dialects have increased. Even so, people whose native language is something other than Mandarin often write in what is more-or-less written Mandarin, not the written version of their own dialect. Indeed, some dialects don't really have a written form, and even those that do typically do not have characters for all of the words that are not cognate to the standard Chinese word.


  1. Otter said,

    August 30, 2008 @ 7:18 pm

    That's a pretty large kernel you've outlined there. That there are a few characters which refer specifically to Cantonese "words" is undeniable, but these are hardly used in ordinary language. (A Google search for 佢 yields mostly results for 渠, which is an unrelated character pronounced similarly.) 佢 is qu2 in Mandarin, which is hardly cognate with ta1 either…

    By the by, it is a little misleading to give 牠 as the Mandarin form for "it", when it refers only to animals. The modern character 它 is much more common and refers to all non-people. (牠 is almost never used on the Chinese mainland.)

  2. Jim Ancona said,

    August 30, 2008 @ 8:23 pm

    So William Gibson had it right

  3. James said,

    August 31, 2008 @ 12:53 am

    Oh, thank you. That claim never made sense to me. I mean, how could it possibly be true?

  4. Eric said,

    August 31, 2008 @ 1:24 am

    "Much of the time 'written Cantonese' is 'Mandarin written by a Cantonese speaker' and so can only be distinguished from native Mandarin by relatively subtle cues."
    If the number of times Taiwanese AND Mainland visitors have complained to me that "Hong Kongers don't know how to write proper Chinese" is anything to go by, the cues certainly aren't subtle to native speakers.

    Some years ago, I saw a whole book listing these HK grammatical and lexical shibboleths (mainly calques from English), which admonished readers for using them and attempted to instruct them in proper mainland-style usage. Sadly, I didn't buy it and can't for the life of me recall the title. Now the only phrase I remember from the whole list is "前線職員" …

  5. Michael Wise said,

    August 31, 2008 @ 1:42 am

    I've known native Chinese speakers who could suss out written Japanese. Apparently it's not that hard: the characters mean the same despite their different pronunciations. The semantic component of Chinese characters is much stronger than the phonetic component, but contrary to popular belief, there is a phonetic component in written Chinese, and eventually the Chinese languages will diverge to the point that their written languages are as mutually incomprehensible as their spoken languages.

  6. Kevin Iga said,

    August 31, 2008 @ 2:57 am

    An even more common distinguishing feature of written Cantonese is the negative marker mh6 (唔), where written Mandarin has bu4 (不). It should be noted that while Cantonese has a pronunciation of 不, which it uses when reading more formal written text (usually an attempt at written Mandarin), in actual speech 唔 is used in Cantonese pretty much wherever Mandarin would use 不. The same goes for the verb "to be": Cantonese haih6 (sorry, I can't find the character; it looks like 口 and 系 merged together) versus Mandarin shi4 是.

    There are also various cases of word choice that are only slightly more subtle: for "to speak", Mandarin tends to use shuo1 說 while Cantonese uses gong3 講 (M: jiang3), and so on.

    Then there's the fact that Mainland China adopted the simplified characters, while Hong Kong (the predominant source nowadays of written Cantonese) continues to use traditional characters. Then again, Taiwan also uses traditional characters, and Mandarin is predominant there.

    There are also some foreign symbols that sometimes appear in written Cantonese, such as the Roman letter D for "di1", the comparative suffix for adjectives, and の, from Japanese "no", indicating possession, even though Chinese 的 (C: dik1, M: de) is already available. These may be a matter of current popular fashion, though, like the use of "88" in text messages to mimic English "bye bye" with the Mandarin "ba1 ba1".

    As for Japanese, it's true that many Chinese speakers can figure out the meanings of written Japanese samples. Also aiding this are borrowings from Chinese and the Japanese tendency to write nouns, verbs, etc. with Chinese characters while using the kana syllabaries for function words such as particles and tense endings. But nowadays this is getting harder for Chinese speakers, as Japanese continues to borrow terms from English, which are written in the katakana syllabary.

    Furthermore, Japanese simplified its Chinese character set around WWII, and the Chinese "simplified" characters from a decade later are often simplified even further. That's not likely to trip up Chinese speakers too much. But there are also situations where Japanese will use a completely different character than Chinese will: America is 美國 in Chinese but 米國 in Japanese (for formal and compound word purposes; otherwise it is "amerika" transliterated into katakana: アメリカ).

  7. Andy J said,

    August 31, 2008 @ 7:20 am

    There is a risk I'm going seriously off-topic, but in the presence of people who are clearly knowledgeable about the written forms of Chinese, it is too tempting an opportunity to forgo.
    Is the number of characters fixed, or can any writer create a neologism with a single (new) character? Or can this only be done with what I believe are called meaning-meaning compounds (in either Mandarin or Cantonese)? If new characters can be created, how would they be promulgated? For example, in another post I inadvertently created the word ultruistic. Any English speaker could readily pronounce this (presumably) unknown word, but might have difficulty, out of context, working out what I meant by it. With Chinese characters there seems to be a two-fold problem, meaning and phonetics, if a writer wished to introduce a new concept. Or have I approached the idea of neologisms in Chinese incorrectly? As a supplementary question, is there such a thing as a ‘spelling mistake’ in Chinese, i.e. a hand-written character so badly formed that its meaning cannot be accurately determined?

  8. steve514 said,

    August 31, 2008 @ 9:46 am

    Andy J,
    Of course characters have to be invented somehow, and new ones do get introduced, but you're right, it's not an easy process, and I imagine it's gotten much, much harder now that 'written' often means 'typed' rather than handwritten. If the character is not in your input program's dictionary, then you can't write it (even if you're dead certain it exists and will be unquestioned by your target audience).
    This is not really an issue for anyone who feels the need to coin a new term when writing in Chinese. Rather than creating a new character, all that's required is a new compound of existing characters. The vast majority of useful 'words' are two-character compounds, and longer compounds are also pretty common. New characters are super rare and probably getting even rarer, but new compounds pop up all the time.
    Actually your question is not so far off topic… Today's written Cantonese, as a very recent offspring of ye olde classical written Chinese, has been a hotbed of new character creation, so if you're interested in how a character system can change and grow, written Cantonese would be a great place to start. A lot of the Cantonese-only characters are used in historical Chinese texts but not in modern Mandarin, so they're not really newly invented characters, even though in practice they're particular to a historically recent writing system. But a lot of 'Cantonese characters' have been invented out of nothing to reflect colloquial Cantonese. I believe ‘有 with the two horizontal lines missing' is one example of this. Sorry I couldn't type the character I mean… it's not in this computer's input system!

    PS Spelling mistakes in Chinese are easy to commit. Some poorly formed characters may morph into other legal but inappropriate characters (spot the difference between 土 and 士!), or you can stuff up a radical, or substitute a similar-sounding but inappropriate character (cuòbiézì 錯別字).

  9. D. Wilson said,

    August 31, 2008 @ 11:38 am

    I think this character (mentioned above) (‘有 with the two horizontal lines missing') — 冇 — can be used to identify Cantonese at a glance (although I'm told it's used in Hakka writing too). [I cut-and-pasted it from the Wiki article on "written Cantonese". Will it appear correctly?] It's a very common character in Cantonese, and it catches the eye (mine anyway) even in tiny or low-resolution text.

  10. Guan Yang said,

    August 31, 2008 @ 8:37 pm

    I've noticed that all the Chinese names for chemical elements are a single character (at least up to nr. 111, Roentgenium 錀). Does anyone know if any of these characters have been invented in modern times, after the discovery of those elements, or if they all existed before?

  11. Pipe Dreamer said,

    September 1, 2008 @ 1:50 am

    All of the chemical elements excluding the ones known since classical times (iron, gold, etc.) are newly-invented characters. In fact, the chemical elements represent the biggest group of characters invented in modern times.

  12. Escribir en chino said,

    September 1, 2008 @ 3:55 am

    […] Language Log

  13. Nigel Greenwood said,

    September 1, 2008 @ 5:28 am

    Re Chinese speakers sussing out the meaning of Japanese text. This is true only up to a point — in much the same way that a monoglot English speaker might be able to get the gist of a newspaper article in Portuguese, based on the common Latin-based elements in both vocabularies.


  14. Nigel Greenwood said,

    September 1, 2008 @ 5:32 am

    Re Guan Yang's query: Most are recent inventions. See:


  15. Nigel Greenwood said,

    September 1, 2008 @ 5:33 am

    I've tried 3 times to submit a posting in reply to Guan Yang's query. No success!


  16. Nigel Greenwood said,

    September 1, 2008 @ 7:03 am

    The problem may be that I'm trying to post a link. Look up "Chemical elements in East Asian languages" in Wikipedia.


  17. mondain said,

    September 1, 2008 @ 9:42 am

    "渠" as the third person pronoun can be found in classical Chinese.

  18. Chas Belov said,

    September 1, 2008 @ 4:42 pm

    As far as new characters go, I think one might have been invented for the hip-hop group Softhard's album Broadcast Drive Fans Murder, in the title of the safe-sex song Dimgaai Yiu Daaiga Lup (literally, "Why does everyone need to 'lup'"; the English song title on the album was "Bring Your Own Bag", probably a condom reference). "Lup", according to a South China Morning Post article of the time, meant "Love and Protection". The album booklet itself was visually incredible, with all the lyrics as calligraphy in graffiti style. The title turns up in Google as 點解要大家笠 (where the online Chinese character dictionary says 笠 means bamboo hat), but I can't say for sure whether that character appeared in the actual album booklet or whether it was a similar invented character.

    Anyway, even if 佢 is not sufficient by itself, 佢哋 or 佢地 as a phrase is sufficient.

  19. Andy J said,

    September 1, 2008 @ 5:31 pm

    Many thanks to Steve514, Nigel Greenwood and Chas Belov for trying to clarify things for me. Having read through several Wikipedia articles on the subject I'm probably more confused now than before!

  20. Nigel Greenwood said,

    September 1, 2008 @ 6:25 pm

    Andy J, in answer to your question about Chinese neologisms, the characters invented for the newer chemical elements are certainly one class of example. As you may have seen on Wikipedia, the character 鋁 lǚ for (A)lu(min[i]um) combines the "metal" radical 金 with the already existing phonetic 呂 lǚ. Anyone seeing it for the first time would be able to hazard a guess that it denotes a metal sounding something like (in fact exactly like) "lǚ".

    At a more playful level, the great linguist Y. R. Chao invented a number of characters in his translation of Lewis Carroll's poem "Jabberwocky". Luckily these are explained to Alice by Humpty Dumpty (as they are in the original).


  21. Kevin Iga said,

    September 1, 2008 @ 10:12 pm

    Back in 2006 I was in Hong Kong and their MTR (subway system) had lots of advertisements with new characters. Short of being able to post a picture here, I posted it on my (otherwise defunct) blog:
    (or click on my name above)

    My Hong Kong native informant tells me that the character in question is a neologism introduced by the popular comedy "Shaolin Soccer", in which a Kung Fu group gets together to fight evil and win soccer tournaments. There were a number of weird neologisms in the movie, and this is supposedly one.

    The ad mentions the large numbers of people who will be learning this character as a result of riding the MTR, and advertises the concept of advertising on the MTR (I think).

  22. Chas Belov said,

    September 2, 2008 @ 1:45 am

    @myself: If I recall correctly, the last character in 點解要大家笠 actually used a heart radical above or below rather than the bamboo radical above the /ləp/ phonetic. Alas, I haven't seen the album in years, so I can't say this for certain.

    While the Unicode space for Chinese characters is restricted, there are many Han characters, such as some used in surnames (so I've heard), that are not part of Unicode. Presumably a sufficiently advanced typesetting program would allow the insertion of any phonetic or radical into any position within a character. However, you would not be able to take that new character and place it into a web page as text, since Unicode does not have a corresponding way to build new characters (that I know of, although it would be a useful future feature).

    Then there's the Chinese art form involving filling whole canvases with column upon column of neatly printed invented characters. I saw this once at the Asian Art Museum in San Francisco and it was mind-blowing. If one was not aware of this art form, encountering it would certainly present an interesting problem trying to make sense of it (you can't) or determining what Chinese language it comes from (it doesn't).

  23. Clint Burgess said,

    September 8, 2008 @ 6:11 pm

    For those as curious as I was, here is a link to three images of pages with invented Chinese characters, perhaps similar to the ones Chas Belov mentions:

    Excerpt from the description found on the link above: "A Book from the Sky took Xu Bing over four years to complete. The installation is comprised of hundreds of printed volumes, ceiling and wall scrolls containing a vocabulary of four thousand 'false' Chinese characters invented by the artist and then painstakingly hand-cut onto wooden printing blocks. Each set of books is a complete wood cut edition, printed with the same four thousand word vocabulary as used in the installation volumes."

    Very interesting article and comments, even for someone who doesn't speak or read Chinese. Thanks all of you.

  24. shivrajj said,

    October 2, 2008 @ 11:18 pm

    Can I get some materials written in romanized Cantonese?

  25. Ash said,

    October 15, 2008 @ 11:53 am

    You have to be careful when talking about "written Cantonese". To an English speaker it usually means writing the way a Cantonese speaker speaks. But to a Cantonese speaker it doesn't necessarily mean that. If you walk into a bookstore in Hong Kong and ask in Cantonese for a book in Cantonese, the clerk will look at you like you're insane. To them, all books written in Chinese (whether it be standard Chinese or colloquial Cantonese) are Cantonese. They look at standard Chinese as written Cantonese, not as Mandarin (even though it pretty much is Mandarin). I've personally experienced this several times and Don Snow mentions a similar story in his book Written Cantonese. The effective way to ask is: Do you have any books where when they mean 佢哋 that they actually write 佢哋 and not 他們? That type of writing is looked down upon by most educated Hong Kongers, which I find unfortunate.

  26. GT said,

    October 25, 2009 @ 4:17 pm

    For those who deny the existence of written Cantonese,
    here is evidence of it in the press.
    In a magazine
    In a newspaper

  27. David Chen said,

    October 12, 2011 @ 5:52 pm

    There is this annoying myth that many non-Mandarin languages are not written but only spoken. Prior to romanization, my understanding is that people wrote Chinese characters but spoke the language of their locality whether it be Shanghai, Guangzhou, Chaozhou, Xiamen, Beijing, etc.

    In Taiwan most people do read and write Mandarin…but in much Taiwanese pop music…there are still written lyrics that accompany the music. Most of it is consistent. It's just that when Mandarin (Beijing dialect) became the official language…written forms of other Chinese languages took a backseat. Cantonese…was different…mainly because Hong Kong was separate from China, which gave its linguists and educators the autonomy to further cultivate and develop their Cantonese language resources.

    Also some usage of the Chinese characters are unique to each Chinese language. I know this for Taiwanese.

    Taiwanese Hokkien has dictionaries and stuff…but you mainly have to go to Taiwan to buy them. Taiwanese have recently done the same thing…but sadly the KMT government has suppressed that. Sadly they never make it to the USA.

    Maryknoll has a Taiwanese-Mandarin-English dictionary. The problem is that its Chinese characters represent Mandarin terms, not the actual literal Taiwanese, so they are not always verbatim renderings. For example, Dragon Boat Festival is literally "5th-month festival" (go-goeh-cheh) and so would literally be written as "wu-yue-jie", but the Chinese characters represent "duan-wu-jie". Also, certain terms for Mexican, tear, eye, and clothes differ from the Mandarin vocabulary but are still legitimate Chinese characters.

  28. Qu Yuan said,

    November 14, 2011 @ 2:57 pm

    "Dragon Boat Festival is literally 5th-month festival (go-goeh-cheh) is suppose to be literally written as "wu-yue-jie", but the chinese characters represent "duan-wu-jie".

    1. "Dragonboat Festival" is a concocted British English expression (ca. 19th century?) that suggests the colonialists who first coined it knew not what they were describing. Corresponding literal rendering: long (dragon) zhou (boat) jie (festival).

    2. The ritual racing of dragonboats is only one aspect of the annual "duan wu jie" observance, whose name is composed of the characters connoting 'upright' (duan), 'sun' (wu) and 'festival' (jie)… which is really a reference to the summer solstice (in the northern hemisphere).

    3. China's traditional lunisolar calendrical reckoning system associates the solstice with the '5th day of the 5th lunar month', or 'double fifth'. Double-ness is auspicious (e.g. double happiness, double tenth, ten thousand ten thousands, etc.). So fifth (wu) month (yue) festival (jie) is a euphemistic reference, since duan wu is the only festival associated with this particular period of the calendar.

    4. The summer solstice (longest day of the year; most sunshine) relates to the Yang (male energy) of the Yin-Yang duality (the Tai ji symbol), at its maximum potency in the annual cycle. Similarly, the winter solstice, or shortest day of the year, relates to Yin or female energy. Consequently: dragon:phoenix, soli:sombre, or sunny (presence of sun):darkness (shadow / absence of light).

    5. The best contemporary translation for 'duan wu' might possibly be "solar meridianal maximus" or "high noon". The 'wu' character is found in shang wu and xia wu, with 'wu' being like the "m" in a.m. and p.m. The single character 'wu' is an archaic one that originated in the traditional Chinese sexagenary (60-year) cycle of heavenly stems and earthly branches (compare the 100-year century). It relates to the 12 Chinese zodiac signs and correlates with the direction 'south', which is the direction from which the sun appears in the northern hemisphere. Whereas in Western cartography north is 'up', in traditional Chinese cartography south is the principal direction.

    6. The 'dragonboat festival' naming given by the colonial English is tantamount to naming Christmas (Christ's birthday) "Christmas Tree Festival" or American Thanksgiving "Turkey Festival"… where someone ignorant of the origin and significance of the celebration identifies and names the festival after one of the dominant ritual elements witnessed during it.

    So dictionary definitions and literal translations must all be considered within the broader context of the cultural phenomenon…. everyone is right but nobody is right.

    The Cantonese pronunciation of duan wu jie is duen eng jit, which is rendered as tuen ng jit, where the t is pronounced as a d since there is no apostrophe between the t and the uen. In Taiwan it is sometimes rendered as Tuan Wu, again with the t pronounced as a d.
