Written Cantonese on a "Democracy Wall" at a University in Hong Kong

A Language Log reader in Hong Kong sent in the following photograph:


Ji6 sap6 seoi3 gei3 nei5 zou6 gan2 me1? Toi4 waan1 daai6 hok6 saang1 zim3 ling5 gan2 lap6 faat3 jyun2.

Modern Standard Mandarin (MSM) translation:

Èrshí suì de nǐ zài zuò shénme? Táiwān dàxuéshēng zài zhànlǐng lìfǎyuàn.


"What is 20-year-old you doing? Taiwanese university students are occupying the Legislative Yuan."

This ties in perfectly with the recent post entitled "Once more on the present continuative ending -ing in Chinese" in two ways:

1. Thematically, both this poster and the recent Language Log post have to do with the occupation of the Legislature by students in Taiwan.

2. Cantonese gan2 緊 functions somewhat similarly to the present progressive English "-ing".

Note the quite different structure of Mandarin:


Vb gan2 緊 ("be Vb-ing")


zài 在 Vb ("be Vb-ing")


On the left side:  man4 zyu2 coeng4 民主牆 ("democracy wall")

20 as ji6 sap6 is perfectly correct and good vernacular Cantonese, though people are perhaps more likely to just say jaa6 廿.

既 is supposed to be 嘅 ge3; in this particular case it may be considered a slight miswriting.

[Thanks to Stephan Stiller and Bob Bauer]



  1. Stephan Stiller said,

    March 22, 2014 @ 12:09 am

    By the way, the sentence-final particle 咩/me1 is used to indicate surprise.

  2. Miguel said,

    March 22, 2014 @ 12:45 am

    20 as ji6 sap6 is perfectly correct and good vernacular Cantonese, though people are perhaps more likely to just say jaa6 廿.

    Yes; however, since it is only 20, it would be correct as ji6 sap6. One would use jaa6/saa1 aa6/etc. with 1-9.

    At least according to 我老婆.

  3. Miguel said,

    March 22, 2014 @ 12:53 am

    Oh, I'm wrong; you are absolutely correct. 廿歲 is perfectly valid. ^^
    30 up would be incorrect.

  4. Stephan Stiller said,

    March 22, 2014 @ 1:46 am

    In case some readers don't know about this, 民主牆 (Democracy Wall, in Mandarin: Mínzhǔ Qiáng) is in reference to the historical Democracy Wall in Beijing in 1978-1979.

  5. Victor Mair said,

    March 22, 2014 @ 7:21 am


  6. Victor Mair said,

    March 22, 2014 @ 1:31 pm

    From Bob Bauer:

    Your translation of Cantonese 咩 me1 as 甚麼 'what' is correct.

    In this text 咩 me1 represents the shortened/contracted form of 乜嘢 mat1 je5 'what'.

    In certain contexts — but not this one — 咩 me1 can express surprise as mentioned in one of the posts.

    As for 嘅 ge3 and 既 gei3: Just as was done in the Democracy Wall text, some writers will write the Cantonese genitive particle ge3 as 既 gei3 since this is a readily available standard Chinese character and is similar in pronunciation to the target morpheme. Other writers by convention use 嘅 ge3, which is a Cantonese, i.e. nonstandard, character (which is listed on page 284 of the Jyut Ping Handbook (2002) and is pronounced ge3 and ge2).

  7. JQ said,

    March 22, 2014 @ 5:59 pm

    The usual reason that 嘅 is written as 既 is that the typist does not know how, or cannot be bothered, to type 嘅 using their preferred input method editor.

  8. JQ said,

    March 22, 2014 @ 6:03 pm

    @Stephan Stiller

    咩 is not used to indicate surprise here.

    At a stretch, standalone "做緊咩?", in the correct context, could mean "Oh, it's being done?"

  9. Stephan Stiller said,

    March 22, 2014 @ 8:02 pm

    Oops, yes, you and Robert Bauer are correct about 咩/me1 being the contraction of 乜嘢. As for 嘅: Sometimes it's difficult to say for a descriptive linguist what's correct, but 嘅 is sufficiently standard that I think people write 既 just when they can't type 嘅. I could be off, but that's my impression. Robert Bauer effectively says that the pronunciation of 既 is gei3, not ge3, and that might be the decisive factor. An example for where I find it harder to come to a judgment is 嚿/舊 for gau6 (measure word "lump of"): some people will tell you that it "should" be 嚿 (the Mandarin character 舊 with a mouth radical 口 to clarify that 舊 is used phonetically), but de facto 舊 is common enough. Some people even remember 嚿 only when I tell them about the existence of 嚿. Consider that in this case 舊 already has the intended pronunciation, gau6.

  10. Victor Mair said,

    March 23, 2014 @ 7:19 am

    From James Dew:

    Nice post:
    Ji6 sap6 seoi3 gei3 nei5 zou6 gan2 me1? Toi4 waan1 daai6 hok6 saang1 zim3 ling5 gan2 lap6 faat3 jyun2.
    But may I raise a couple of small questions?

    1) Shouldn't 既 be transcribed as ge3 here? Granted the dictionary reading is gei3, but don't we like to treat speech as primary? The character is borrowed to write ge3, so let's write ge3.

    2) Why is the transcription done syllable by syllable? Cantonese, like Mandarin or other Chinese languages, has words. Shouldn't the transcription be parsed as words rather than as syllables?

  11. Victor Mair said,

    March 23, 2014 @ 7:37 am

    @ James Dew via Victor Mair

    Thanks for your comment. Point 1) has been taken up by several of the previous commenters.

    As for point 2), I quite agree with you and wish that people would link up the syllables of Cantonese into words as we do for Mandarin. The same holds true for Taiwanese, Shanghainese, and other Sinitic languages, for that matter.

    I have often thought about this problem and have always felt awkward and uneasy about NOT linking up the syllables of these languages into words. I suppose the only excuses I have for not doing so are:

    1. nobody else seems to do it

    2. the tonal numbers at the ends of the syllables seem to get in the way of linking up the syllables, unless perhaps one uses hyphens to link them, but that is a solution which — in these post-Wade Giles days — few would find attractive

    Incidentally, when I was editing the Columbia Anthology of Traditional Chinese Literature and the Columbia History of Chinese Literature, I had originally proposed that hyphens be omitted and syllables linked up directly (e.g., tahsüeh instead of ta-hsüeh and kungssu instead of kung-ssu), but the outside reviewers for Columbia University Press were uniformly opposed, so I had to go back into those enormous manuscripts and put back all of the thousands of hyphens I had earlier omitted!

    Anyway, let's try to start a movement to link up the syllables of Cantonese and other Sinitic languages, hence

    Ji6sap6 seoi3 gei3 nei5 zou6 gan2 me1? Toi4waan1 daai6hok6saang1 zim3ling5 gan2 lap6faat3jyun2.

    Or does somebody have a different / better proposal for how to indicate word boundaries in these languages?

    By the way, the Canadian researcher, Clément Arsenault, of the University of Montreal refers to the linking up of syllables as "aggregation". He wrote his dissertation on this subject and has also published several articles about it, paying particular attention to the effect it has for lookup efficiency in library catalogs, etc.
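The linked-up transcription proposed in this comment can be sketched in a few lines of Python. This is only an illustration: the function name `link_syllables` is made up for the example, the word boundaries are simply those of the sample sentence above rather than the output of any segmentation algorithm, and the syllables are kept in lowercase for simplicity.

```python
def link_syllables(words):
    """Join the Jyutping syllables of each word directly; separate words by a space.

    `words` is a list of words, each given as a list of syllables.
    The segmentation itself is supplied by hand (hypothetical input).
    """
    return " ".join("".join(syllables) for syllables in words)


# Word boundaries copied from the sample transcription in the comment.
sentence = [
    ["toi4", "waan1"],
    ["daai6", "hok6", "saang1"],
    ["zim3", "ling5"],
    ["gan2"],
    ["lap6", "faat3", "jyun2"],
]

print(link_syllables(sentence))
```

The hard part, deciding where the word boundaries fall, is exactly what the sketch leaves to the person supplying the input.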


  12. Stephan Stiller said,

    March 23, 2014 @ 7:54 am

    @ James Dew (via Victor Mair)
    Short answers: Yes and yes.
    Longer answers:
    1. We also need proper notation and standard terminology. As for notation, one can write "ge3" and "既[嘅]". As for terminology, {"phonetic approximation", "input method typo", "graphemic substitution", "graphemic approximation", "graphemic simplification", "graphemic omission"} are some terms to choose from.
    2. A technical answer is in this comment to another LL post. Visually, the use of numbers works against one's instinct to do word segmentation, though word segmentation is indeed preferable.

  13. Stephan Stiller said,

    March 23, 2014 @ 8:19 am

    To address Victor Mair's question about word segmentation or "word aggregation": Typographically this would be easiest when we use super- or subscript tone numbers together with thin spaces between syllables that are shorter than a hyphen in the font one is using. That way the visual look is closer to what one is used to (though I don't see anything inherently visually bad about tone numbers as opposed to tone graphemes/diacritics), making it look less obtrusive for regular orthographic purposes, and one can even apply Hanyu Pinyin's rules for word segmentation (aggregation) and hyphenation. If one is so-inclined, one can even use English-style en-dashes if a use case pops up. In an ideal world there'd be a notational variant of Jyutping officially designated for ordinary orthographic purposes, possibly employing line-like symbols for tones after each syllable (I'd disprefer diacritics above vowels for certain reasons).

    (Note that we don't have to use thin spaces between syllables in order to meaningfully apply Hanyu Pinyin's orthographic rules re word segmentation/aggregation/hyphenation.)

    I'd want to try here, but I doubt WordPress deals with the various Unicode spaces correctly. And I can't use markup-superscript or -subscript here (HTML tags <sup> and <sub>).
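As a rough sketch of the typographic idea in this comment, the following Python turns Jyutping tone digits into Unicode superscript digits and joins the syllables of a word with a thin space (U+2009). The helper names are hypothetical, the word segmentation is supplied by hand, and whether a given font or blog platform renders the thin space properly is not something the code can guarantee.

```python
# Map the ASCII tone digits 1-6 to the Unicode superscript digits.
SUPERSCRIPT = str.maketrans("123456", "\u00b9\u00b2\u00b3\u2074\u2075\u2076")
THIN_SPACE = "\u2009"


def typeset_word(syllables):
    """Superscript each tone digit and join a word's syllables with thin spaces."""
    return THIN_SPACE.join(s.translate(SUPERSCRIPT) for s in syllables)


def typeset_sentence(words):
    """Separate words with an ordinary space, syllables with a thin space."""
    return " ".join(typeset_word(w) for w in words)


print(typeset_sentence([["zim3", "ling5"], ["gan2"], ["lap6", "faat3", "jyun2"]]))
```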

  14. J. M. Unger said,

    March 23, 2014 @ 9:09 am

    Since all tone categories can be enumerated with single digits, and since there are many cases in which people feel it's all right not to indicate tones, why not adopt the practice of writing the tones of a word as a string of digits immediately after the whole word? E.g. = /shi2 zhi4/.

  15. Stephan Stiller said,

    March 23, 2014 @ 9:20 am

    @ J. M. Unger
    I think you mean 實質 as "satzat61". That's an ingenious idea.
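The trailing-digits notation discussed here can be sketched as follows. This is a minimal illustration, assuming each input syllable is written as lowercase letters followed by exactly one tone digit; the function name is invented for the example.

```python
import re

# One syllable: letters followed by a single tone digit (assumed input format).
SYLLABLE = re.compile(r"^([a-z]+)([0-9])$")


def aggregate_with_trailing_tones(syllables):
    """Write a word's letters first, then all of its tone digits as one string."""
    letters, tones = [], []
    for syllable in syllables:
        match = SYLLABLE.match(syllable)
        if match is None:
            raise ValueError(f"not a letters+digit syllable: {syllable!r}")
        letters.append(match.group(1))
        tones.append(match.group(2))
    return "".join(letters) + "".join(tones)


print(aggregate_with_trailing_tones(["sat6", "zat1"]))  # satzat61
```

Dropping the tones, when that is acceptable, then amounts to cutting off the trailing digit string.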

  16. Victor Mair said,

    March 23, 2014 @ 11:48 am

    From Bob Ramsey:

    Gari Ledyard scrupulously avoids using hyphens in Chinese and Korean names. He says that the insertion of hyphens between syllables is a holdover of earlier Western practices used for "primitive" languages. Remember how American Indian languages used to be transcribed? The hyphens make them look exotic and odd.

  17. Victor Mair said,

    March 23, 2014 @ 1:29 pm

    From Bob Bauer:

    Re: 20歲既你做緊咩?台灣大學生佔領緊立法院
    Ji6sap6 seoi3 gei3 nei5 zou6 gan2 me1? Toi4waan1 daai6hok6saang1 zim3ling5 gan2 lap6faat3jyun2

    I think in this sentence it's more appropriate to transcribe the pronunciation of 既 as ge3 since it is being used to write the genitive morpheme; in other words, the genitive morpheme's pronunciation must override the character's usual reading pronunciation in this context.

    As for syllable linkage in Chinese, it's something I've thought about quite a lot. It raises lots of questions.

    First, what is the justification for separating the classifier from the number in ji6sap6 seoi3? The number almost always must be followed by a classifier, so why not write all three together as one word?

    Since all the syllables of 20歲既 function together as one modifier, why not write them all together as one unit, i.e., ji6sap6seoi3ge3?

    Since gan2 is a bound morpheme and can never occur independently but must be attached to a verb, it seems illogical not to write zou6gan2 and zim3ling5gan2.

    The criteria for linkage, as well as for segmentation, must at all times be justifiable, logical, coherent, consistent, and systematic.

    In my view, many fuzzy issues with linkage make it both a messy problem and a slippery slope that are best avoided.

  18. Chris Atwood said,

    March 23, 2014 @ 8:51 pm

    The word segmentation issue comes up in Tibetan too. As Chris Beckwith points out, Tibetan is written in a syllabic script, which has led people to transcribe it the same way. People used to use hyphens, but the recent trend is to eliminate them and just have the syllables separate, i.e. not rdo-rje, khams-pa, or rnying-ma but rdo rje, khams pa. There is a problem of ambiguity. Beckwith proposed some rules for how to aggregate syllables without causing ambiguity (eliminate hyphens for fixed suffixes or when the syllable is open, so rdorje and khamspa, but rnying-ma). This also relates to the question of transcription: if you use digraphs as the Wylie system does, the last example written without hyphens, i.e. rnyingma, could be read as rnyin-gma, as well as rnying-ma, although I don't think anyone would actually be confused, because gma is probably not an actual word. I've noticed that Bettina Zeisler, a philologist of Old Tibetan, aggregates all the syllables into words without any hyphens and the results seem (to my non-Tibetanist eye) to be completely unproblematic, and rather more readable.

  19. JS said,

    March 23, 2014 @ 10:42 pm

    @ Bob Bauer

    First, what is the justification for separating the classifier from the number in ji6sap6 seoi3? The number almost always must be followed by a classifier, so why not write all three together as one word?

    Cornelius Kubler's pinyin Mandarin textbooks test a number of orthographic innovations including the one you raise here, but I have difficulties with it… how is the fact that a numeral is very often followed by a classifier an argument for aggregation? Such decision-making should almost always be driven not by the frequency of a given concatenation but by (what we can discern of) its wordhood from a cognitive perspective: there is a reason we would never expect to find Mand. number-classifier combinations such as yizhang or sanpi listed in a dictionary.

    Probably this begins from yige, tempting due to frequency, but the idea loses its appeal well before one gets to sanfen zhong 'three minutes' or sizhan lu 'four [bus] stops'. Numerals and classifiers are just different, even if often used together, and writing them as "one word" is confusing (esp. to students!), not elucidating.

    In my view, the same applies mutatis mutandis to Mand. de/Cant. ge, Mand. le, etc.: cognitive atom / free morpheme / obvious dictionary entry > write separately… aspect markers and complements more bound in their relationships to verbs, like Cant. gan2, are of course a gray area. Here there is an important point: segmentation decisions will never be entirely "justifiable, logical, etc.," but trying to make them seems preferable to throwing up one's hands, with arbitrary finer points left to be ironed out by a community of users… lack of this last being the main reason such issues won't soon be resolved in the case of Sinitic.

  20. Stephan Stiller said,

    March 23, 2014 @ 10:43 pm

    @ Chris Atwood
    Hanyu Pinyin employs apostrophes to deal with the Mandarin analogue of cases like the Tibetan "rnyingma" (ie ambiguous syllable segmentation).
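The apostrophe convention mentioned here can be sketched as below: when syllables are joined into a word, an apostrophe goes before any non-initial syllable that begins with a, o, or e. The function name is made up for the example, and the code assumes tone-marked vowels, if present, are precomposed (NFC) characters.

```python
# Vowels that trigger the apostrophe, with and without tone marks.
APOSTROPHE_INITIALS = set("aoe" "āáǎà" "ōóǒò" "ēéěè")


def join_pinyin(syllables):
    """Join Pinyin syllables into one word, adding apostrophes where needed."""
    if not syllables:
        return ""
    word = syllables[0]
    for syllable in syllables[1:]:
        if syllable[0].lower() in APOSTROPHE_INITIALS:
            word += "'"  # mark the syllable boundary, as in xi'an vs. xian
        word += syllable
    return word


print(join_pinyin(["xi", "an"]))  # xi'an
print(join_pinyin(["xian"]))      # xian
```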

  21. Simon P said,

    March 24, 2014 @ 3:36 am

    Syllable aggregation is not defined in Jyutping because it's not designed to be written in sentences, but only as a phonetic help (for example in dictionaries). If one wanted to aggregate syllables, surely using Yale would be a more logical starting point? Yale doesn't differentiate enough, though, but stealing the tone system from it and integrating it with Jyutping would be a pretty good system, in my opinion:

    Yihsahp seoi ge néih zouhgán mè? Tòih wàan daaih'hohksàang zimlíhnggán lahpfaatjyún.

    (Also, I think 咩 when used as a contraction of 乜嘢 ought to be transcribed "me1 e3". It's disyllabic.)

  22. Stephan Stiller said,

    March 24, 2014 @ 4:07 am

    @ Simon P

    (You mean "Ji…", not "Yi…".) I most strongly dislike Yale's use of the letter h for low tones; and for HK-Cantonese there is no reason not to use ˉ for the first tone by default. And a residual high-falling tone remains in the language as well, so reserving ˋ for those few cases will be useful. If one indicates tone via vowel diacritics, I'd want a system of 6 clearly distinct diacritics. But I would advise working with LSHK.

    Monosyllabic 咩/me1 can definitely be a contraction of 乜嘢. What you have in mind one can write as 咩嘢.

  23. Stephan Stiller said,

    March 24, 2014 @ 4:22 am

    I am in agreement with you. In all languages there is plenty of ambiguity in how a word can be defined (I am more thinking of edge cases than of principled differences); we just might not notice because we are used to any given language's orthographic conventions. A grey area is unavoidable, and I can think of no obvious reason why any one language would be richer in grey-area cases than another. Maybe the size of the grey area correlates in some way with the amount of morphology in that language (so that things will need to be differently calibrated and fine-tuned to get practical orthographies), but guidelines can definitely be formulated. Let's also observe that some language communities bother spelling out such rules more than others.

  24. Stephan Stiller said,

    March 24, 2014 @ 4:28 am

    @ J. M. Unger (and others)
    I know this is not obvious, but in comments
    • < needs to be written as "&lt;" and
    • > needs to be written as "&gt;"
    (without the double quotes).
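For anyone preparing a comment offline, the substitution described above can be done with Python's standard `html.escape`. The wrapper name `escape_for_comment` is hypothetical, and the sample string is only an illustration; note that `html.escape` also rewrites `&` as `&amp;`.

```python
import html


def escape_for_comment(text):
    """Replace <, > and & with HTML entities so a comment form shows them literally."""
    # quote=False leaves single and double quotation marks untouched.
    return html.escape(text, quote=False)


print(escape_for_comment("<satzat61> = /sat6 zat1/"))
```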

  25. Simon P said,

    March 24, 2014 @ 6:47 am

    @Stephan Stiller, yes, I meant "jih", of course. I think there's something to be said for not using too many diacritics. Having the middle tone unmarked is reasonable, and using 'h' for low tones works pretty well, in my opinion. The result is a lot cleaner, and having six diacritical marks (as well as using ˉ for the high tone) means it's more difficult to write on a common keyboard.

    As for the number of syllables in 咩, I will still insist that when used as a contraction for 乜嘢, it's disyllabic (me1 e3). This is different from 咩嘢, which would be "me1 je5" or "me1 e3 je5" and is also used sometimes. It is also different from the 咩 in "係咩?", which is monosyllabic (me1). Would you not agree at least that these usages are different in length?

  26. Stephan Stiller said,

    March 24, 2014 @ 9:42 am

    @ Simon P

    First of all, you are right in that the contraction of 乜嘢 (mat1 je5) can be disyllabic (with various surface realizations like what you write), but it doesn't need to be: 咩-from-乜嘢 can be monosyllabic me1.

    If it is longer-than-monosyllabic (or has to be) in the above sentence ("20歲嘅你做緊咩?"), that might have to do with its sentence-final position, in which it is desirable to avoid a clash with the homophonous sentence-final particle me1 – to help the listener parse the sentence correctly. In general, if people want to convey a disyllabic pronunciation intentionally, they'll try to use two characters. So whether writing longer-than-monosyllabic ≈me1(-e3) as 咩 is "ideal" in such a case is an interesting question. Note that I'm talking about an idealized orthography here; most likely the person in question just picked 咩 and didn't think much; after all it's comprehensible.

    One can even have 咩-as-mje1 (with some variation in surface realization) [< me1je5 咩嘢 < mat1je5 je5 乜嘢嘢], which is one of the cases you describe, but that's yet another thing, like you say.

  27. Barbara Morrell said,

    March 24, 2014 @ 11:50 am

    In reference to: This ties in perfectly with the recent post entitled "Once more on the present continuative ending -ing in Chinese" in two ways:

    Entitled is incorrect. TITLED is correct.

    Unless the letters are "entitled" to an ice cream cone. :)

  28. Stephan Stiller said,

    March 24, 2014 @ 12:35 pm

    @ Simon P

    This day and age, Unicode offers enough diacritics to choose from, and in a society that is already forced to use special input methods, input ought to be no problem: it's a technical problem with a known and easy technical solution. (By the way, one could use a ring below (say) for low tones, but I'd presently advise against diacritic clusters for certain technical reasons.)

    There is a good argument against representing one tone with an unmarked vowel, and it is that such a way of doing things makes the representation of the tone-bearing syllable identical to that of the corresponding toneless syllable. This is a concern more for linguists and lexicographers, but it quickly gets annoying when you have to frequently clarify for eg "mian" in Pinyin whether you're talking about the syllable "mian" with a neutral tone or about the abstract syllable "mian" before any tonal assignment.

  29. Stephan Stiller said,

    March 24, 2014 @ 12:42 pm

    @ Barbara Morrell
    You're being prescriptivist. Using "entitled" in the meaning of "titled" is frequent enough that it's just fine.

  30. Simon P said,

    March 25, 2014 @ 12:26 am

    @Stephan Stiller: Good points on orthography, and I want to post my agreement before this thread disappears in the mists of time. I'm still not sure if I agree that 咩 meaning "what" can really be monosyllabic, though I suspect a lot of speakers think it is (sort of like people think they pronounce "rider" and "writer" differently when they actually don't). That would be a great topic for an LL blog post!

