The concept of word in Sinitic

In the following posts, we've been tackling the thorny, multifaceted question of whether Vietnamese has words and lexemes, as opposed to having syllables and morphemes:

During the course of our discussions, the parallel question of whether Sinitic had words or not also came up.  Let me put it this way:  although there was no concept of "word" in Sinitic before the 20th century, there were Sinitic words, going all the way back to the oracle bone inscriptions (the first stage of Chinese writing) more than three thousand years ago, as documented in these posts and dozens of others:

In 2002-2003, when I was teaching at the University of Hong Kong, I had a Chinese linguistics class consisting of 72 students (the same number as Confucius' disciples!).  They were among the brightest humanities students in Hong Kong, but it was all I could do in the course of a semester to get them to comprehend the difference between a word and a character.  No matter which angle I addressed the matter from, their eyeballs would go rolling back into their heads, and they looked as though they were suffering from a migraine.  Even now, a decade and a half later, when my classes are full of very smart students from across the Sinosphere, a curtain falls down over their gaze when I try to explain what the difference between a character and a word is.

Indeed, before the first half of the 20th century, there was no word for "word".  The word for "character", zì 字, was clear enough, but for lexical units larger than that, one had to resort to the expression cí 辭 ("phrase").  When the first dictionaries that were not strictly centered on characters (i.e., zìdiǎn 字典) were published, they were referred to as cídiǎn 辭典 (lit., "phrase dictionary"):

During the first half of the 20th century, modern Western linguistics began to seep into language studies in China, until finally scholars there decided that they needed to devise a word for "word".  What they did was borrow the term for a particular type of relatively vernacular poetry, cí 詞, which was popular around a thousand years ago, and assign it to fill the vocabulary gap for "word".  Nonetheless, as late as a quarter of a century or so ago, I was on a panel at a meeting of the Association for Asian Studies with one of China's most distinguished grammarians of the day, and she made the amazing statement that language specialists in China were still striving "to excavate the words" of Sinitic.

So now, at least, we have zìdiǎn 字典 ("character dictionaries") and cídiǎn 詞典 ("word dictionaries").  Progress!


  1. Lars said,

    October 3, 2018 @ 9:24 pm

    So they coined a new word for 'word' and made it homophonous with the former word for 'word' which actually meant 'phrase'. Ingenious.

  2. Bruce Rusk said,

    October 3, 2018 @ 11:48 pm

    Quite apart from what we call them, in Sinitic or otherwise, the word in Chinese, as in many languages, is the object of a great deal of debate. Are words a "natural" element of language? The answer is not self-evident. There's a lot of good discussion in, e.g., Jerome Lee Packard, ed., New Approaches to Chinese Word Formation: Morphology, Phonology and the Lexicon in Modern and Ancient Chinese. Trends in Linguistics. 105. Berlin: Mouton de Gruyter. 1998. There are also good articles on wordhood in the Encyclopedia of Chinese Language and Linguistics. These make it clear that the lexical, syntactic, and phonological word don't necessarily map onto one another. Are they all just ci?

  3. Simon Smith said,

    October 4, 2018 @ 8:11 am

    Interesting, but I don't understand why it's any more difficult to get the HK students to distinguish between words and characters than it is for any linguistics students to distinguish between words and (bound or free) morphemes. They must understand that 你们 (you pl.) is two morphemes and 你 (you sing.) is one morpheme, but that they are both words! The difficulty is perhaps that 字 has two meanings: (1) English word, and (2) Chinese character. Nobody talks about English having 词…

  4. ktschwarz said,

    October 4, 2018 @ 12:21 pm

    In Classical Chinese poetry, did polysyllabic words ever break across lines? If that was impossible, was it ever recognized as a rule in literary criticism?

    Words never break across lines in the Iliad, but I don't know if anyone ever stated that as a rule of heroic hexameter (although literary critics going back to antiquity have always been interested in whether words break across feet). It seems to be too obvious to mention for speakers of European languages, which all have distinct concepts of syllable and word.

  5. headspin said,

    October 4, 2018 @ 1:57 pm

    The fact that the seemingly very simple concept of a word is so different in Chinese was a difficulty for me when I was learning Mandarin Chinese in Taiwan. As an English teacher, I heard many times per class the Chinese word 单字 to refer to English words and I naturally assumed that this could be extended to Chinese words, which it cannot.

    I would try to form a sentence like 那个单字 “汉堡” 是什么意思? meaning something like “What does the word “hanbao” mean?”, but it doesn’t really work and it's not really understood by average native speakers.

    What my Taiwanese colleagues would say, or repeat back to me, would be something like 那两个字 “汉堡” 是什么意思? meaning “What do the two characters "hanbao" mean?”

    Of course, Chinese polysyllabic words do break across lines and are not really thought of as stuck together. To get an example I looked on BBC Chinese and the headline of the very first article reads like this:


    (The last two characters are one “word” 变数 biànshù meaning “variable”.)

    When this concept was extended to English words by my Chinese-speaking students, they would regularly write on the board sentences broken up mid-word like this:
    My nam
    e is Angel.

  6. Chris Button said,

    October 4, 2018 @ 2:16 pm

    "Words in Mandarin: twin kle twin kle lit tle star"

    Since the "-le" in "twinkle" and "little" reflects an earlier formative suffix, I suppose it would have been written in Chinese with its own character, thus rendering the suffixal role far more apparent for future generations. Incidentally, from a (Wellsian) pronunciation perspective, I would go with "twink le" /ˈtwɪŋk.ɫ̩/ and "litt le" /ˈlɪɾ.ɫ̩/ which happens to coincide with the morphology here.

  7. Yoandri Dominguez said,

    October 5, 2018 @ 5:16 am

    this is linguistics greatness & strength in that it can teach us the true thought breaking, or, where a word start & end. even in our tongue we struggle to know whether to write words as one or two or dash-tied. however, i believe this is a tool obsession, & fetishism, as if our letters & orthography was magic or deeply good. if we chose a truly smart way, we'd write phonemically alone with a smaller IPA set, & always, inbornly, sooner learned & earlier taught.

  8. AntC said,

    October 5, 2018 @ 7:03 pm

    My nam
    e is Angel.

    If you speak a language without syllable-final consonants, then clearly you pronounce "name" as two syllables \nã-mee\ (nasalised vowel in the first syllable). And English orthography conveniently has a vowel at the end of the word, to prove it's two syllables. So I'd expect

    My na
    me is Angel.


    I ho
    pe it's OK.

    Why does English orthography leave out the second vowel/syllable on "helpe", "halfu", etc. when it's there on "name", "hope", "little"? It really is a terrible writing system.

  9. Jerry Packard said,

    October 5, 2018 @ 9:11 pm

    Thanks for the kudos, Bruce.

  10. Chris Button said,

    October 6, 2018 @ 10:30 pm

    @ AntC

    Yes "name" was originally two syllables. There's a nice wiki article on it all here:

  11. BZ said,

    October 8, 2018 @ 10:14 am

    So how exactly do you define "word"? I recall posts here mostly regarding WOTY entries, which asserted that certain phrases can be words, but I don't recall what the criteria are.

    I've always thought of words as a series of letters without spaces in between that have meaning. This has worked for me with Russian, English, German, Hebrew, and Aramaic. Of course all of those have phonetic alphabets (and spaces), unlike Sinitic languages. Was there ever a time when every character was a word?

    And then there is spoken language. Was there a concept of "word" before writing? Even today, what if you can't read? Would compound words have to be considered several words?

