Words, morphemes, collocations, characters


We've met Julesy before:  "The conundrum of singing with tones" (5/30/25).  She has a Ph.D. in linguistics and knows how to communicate her scientific knowledge of Mandarin to intelligent laypersons.  Here she is again, this time telling us some very important things about the differences between words and characters:

During the first half of her presentation, Julesy made me feel that she was preaching the gospel according to VHM (difference between zì 字 ["character"] and cí 詞 ["word"]), spacing / parsing, etc., but in the second half she got into some statistical surveys and the notion of "collocations" that were "lexically significant", and salvaged some unique properties of sinographs while yet assimilating them into modern concepts of linguistics.

What a breath of fresh air to have someone with her expertise and exactitude explaining how Sinitic languages work.  Until the recent past, most of what was purveyed about "Chinese" was either too technical and theoretical for the non-specialist to grasp or was a mishmash of nonsense gobbledygook.

Keep 'em comin', Julesy!

 


21 Comments »

  1. Barbara Phillips Long said,

    July 3, 2025 @ 11:56 pm

    Julesy was very informative, but I got distracted a couple of times. She pronounces the “t” in “often,” which I do not, and I got the impression that “et cetera” came out with “eck” instead of “et.”

    She also avoided going down the rabbit hole about “iced cream” and “ice cream,” but I am left wondering what she thinks of “iced tea.” (Sorry, as far as I am concerned, “ice tea” is not acceptable, even though I am enthusiastic about ice cream.)

  2. Pedro said,

    July 4, 2025 @ 4:46 am

    @Barbara Phillips Long: Lots of people pronounce the T in often, or pronounce et cetera "excetera". I don't, but I long ago learnt to ignore such things when listening to an expert speaking on a subject I'm interested in – far better to pay attention to what she's saying.

    I love the example of ice cream. You can't separate the two parts and say *"Please buy whipped, ice and double cream" (or even *"Please buy whipped and ice cream", which only implicitly separates them) – you have to say "Please buy whipped cream, ice cream and double cream" (or better still "Please buy ice cream, and also whipped and double cream") – which to me is a strong indicator that it's a single word, despite being spelt with a space.

    By the same logic, a full infinitive like to buy can be shown to be two words because you can say "I need to go to the shop and buy ice cream" (where the two words of the phrase "to buy" are separated by five words).

    This is a point Julesy didn't address in her video. Is it possible to split those disyllabic compounds up and recombine them or reorder them?

  3. Victor Mair said,

    July 4, 2025 @ 4:50 am

    @Barbara Phillips Long

    Those are all good pronunciation issues that you raise, but my impression is that, in the general population, people come down on one side or another of them willy-nilly.

    I don't like the sound of "eckcetera", but some of my dearest friends (and even family members!) say it.

    I myself am usually punctilious about pronouncing as many of the letters in a word / phrase as possible / permissible — e.g., "iced tea" — but I'm not bothered if people say "ice tea", and, in running conversation, I probably say it myself fairly often ("offen", not "often", for me).

    Gerald Ford and Jimmy Carter (who was by training a nuclear engineer) both said "nucular", which really grates on my ear(s).

    C'est la vie!

  4. Victor Mair said,

    July 4, 2025 @ 5:00 am

    @Pedro

    You and I were writing our comments at the same time and in the same mood / mode / mentality.

    I like an issue that came up in your last sentence: "split XXX up" vs. "split up XXX".

  5. wgj said,

    July 4, 2025 @ 9:11 am

    But anyone who speaks German knows that words are regularly split up:
    abwarten – ich warte ab – ich habe abgewartet

    So is "ich warte ab" three words or two?

  6. Ethan said,

    July 4, 2025 @ 1:12 pm

    Why not classify "ice cream" as one word that happens to be spelled with a space in it? That is admittedly more common for proper nouns that originated in another language (e.g. "Los Angeles", "Des Plaines", "De Bruin" or even "De la Vega"), but seems logical also for cases such as "tennis ball", which for some reason usually has a space while "basketball" and "baseball" do not. "volley ball" mostly lost its space around 1936 according to Google Ngrams, having maybe donated it to "pingpong ball" about 10 years earlier.

  7. Chas Belov said,

    July 4, 2025 @ 4:20 pm

    Sorry if it grates on some people, but I'm noticing that I'm now more often pronouncing the "t" in "often" than I used to. I also reside firmly in the "eksetera" camp.

    @Victor Mair:

    Gerald Ford and Jimmy Carter (who was by training a nuclear engineer) both said "nucular", which really grates on my ear(s).

    I think I associate "nucular," rightly or wrongly, with Bush #1. I do recall that some left-leaning folks would laugh about the pronunciation and take it as a sign of reduced intelligence. If Carter did it as well, I wonder whether they suddenly shut up. Alas, I don't recall how they reacted. I just take it as a variation; nevertheless, I do find it distracting.

    Wikipedia has interesting things to say about "nucular", including some thoughts from former Language Log contributors Arnold Zwicky and Geoffrey Nunberg.

  8. Yves Rehbein said,

    July 4, 2025 @ 4:28 pm

    @wgj, that's tangential to Pedro's second example. Thanks for the prompt, I was going to say: I am not sure about "need to", but wanna, gonna, tryna, fina show that that "to" is a clitic. Hence it should make sense to analyse the infinitive with "to" as a ditropic clitic. Cf. wolfe za-rinnit* "she runs to the wolf", Frotscher and Scheungraber, On the historical development of the Germanic preverbs *ga-, *bi-, Goth dis-, OHG zar- and Goth du-, OHG za- (2019:6).

    @Pedro, I think the point is that the words are not separable. That means there are no counterexamples, so there is nothing to say but a blanket statement? There might be archaism and reanalysis, that would be a different matter.

    I recall that Chu-Ren Huang et al. (2022, The Cambridge Handbook of Chinese Linguistics) contains an article on wordhood, and Packard et al. (1998, New Approaches to Chinese Word Formation), though I have never read them seriously.

  9. JPL said,

    July 4, 2025 @ 6:08 pm

    Very interesting!

    I never comment on Chinese stuff because of my complete ignorance, but this presentation was pretty clear. So, what would you call the process of "character collocation" if you were talking about the spoken language? It looks like it might be something that includes what in English we might call compounding or derivation, depending on whether the combined items are otherwise free vs. bound in their possibilities of occurrence in (spoken) sentences. Expressions like "character" and "spacing" refer to the written mode. How would you refer to what you think might be minimal free forms, morphs (bound or free), free forms consisting of more than one morph, etc. in the spoken mode? In the spoken mode of course instead of spacing, the typical relation between significant forms at their boundaries would be, I would guess, the "scrunching" together Mark always describes. (I can't remember the proper phonological term for this.) A lexicon consists of the lexemes for the language; words belong to the sentences. Some languages may have a lot of lexemes that are bound forms, and of course in principle a lexeme may be made up of more than one morpheme. Does the Chinese writing system privilege monomorphemic word forms, while the lexicon recognizes lexemes with more than one morpheme?

  10. Jonathan Smith said,

    July 4, 2025 @ 6:37 pm

    The items at ~4:38 and raised by Pedro (called "VO compounds" or something) are intransitive verbs with general meanings, e.g. famously chī(-)fàn means 'eat' — as in 'let's go eat' or 'I love to eat' — and NOT 'eat food', whereas chī alone is transitive 'eat' including (as is so often the case) when "O" is elided… so e.g. [nǐ] chī means "[You] eat/have [some (of this food)]", NOT "you eat."

    These "V" and "O" components are syntactically separable, with the "lexeme" instantiated just the same… and in many cases the components aren't independently usable/meaningful, so e.g. with shuì(-)jiào 'sleep (v.)', someone might say (real example googlefooed) Jīnwǎn zhège jiào nǐ kěndìng shì shuì bù chéng le 今晚这个觉你肯定是睡不成了 lit. "tonight this JIAO ("O") you guaranteed will SHUI ("V") not successfully anymore" where independently noncromulent JIAO demands that SHUI come complete it sooner or later (google messes up pinyin transcription due to this wrinkle as it happens.)

    So re the video, these items are NOT adequately treated as "phrases." Of course given their nature, you'll get mixed results when asking native speakers to do word division on written text. And with separated examples like the one just above, you couldn't clearly indicate "one word" even if it occurred to you and you wanted to.

  11. Viseguy said,

    July 4, 2025 @ 8:30 pm

    Collocations vs. words. I like it! Sort of like tense in English vs. aspect in Russian. The great thing about learning another language is that it can force you to think outside your (hardened) categories. Incidentally, this is true of computer programming languages as well. Am currently trying to grok Zig based on my (rather rudimentary) knowledge of C. Learning_a_new_language:categories :: statin:arteries.

  12. Viseguy said,

    July 4, 2025 @ 8:50 pm

    PS: "Eck-cetera" jumped out at me, too. For people, like me, who were taught grammar by nuns in the 1950s, it's catnip — like "nucular" and "Feb-YOU-ary". I'm sorry, but it's wrong, just wrong! ;-)

  13. Jerry Packard said,

    July 5, 2025 @ 6:03 am

    For those interested in the above questions about Chinese morphology I would recommend my book ‘The Morphology of Chinese’, which deals with all the above issues in gory detail.

  14. wgj said,

    July 5, 2025 @ 7:41 am

    I'm also bothered by the uncritical starting assumption of this video that the term "word" must mean the exact same thing for different languages. That's a typical attempt to reshape reality to fit one's model. I'm reminded of the recent discovery of a virus-like cell that challenges the definition of life:
    https://www.yahoo.com/news/strange-cellular-entity-challenges-very-010750382.html

    If foodies everywhere can't agree on what is and isn't a bread (not to mention a sandwich), I see no hope – nor any necessity for that matter – for a uniform (and universally accepted) definition of "word".

  15. Yves Rehbein said,

    July 5, 2025 @ 6:42 pm

    The recursiveness of morpheme, word, phrase, sentence and text gives ample opportunity for confusion. "A" can be a microtext, a word and even a phoneme. But to define phoneme you need those higher level concepts.

    It is notable that when one of the above types is defined, it is defined in terms of other types. So for example, sentences might be (partly) defined in terms of words, and words in terms of phonemes.

    https://plato.stanford.edu/entries/types-tokens/

  16. Julian said,

    July 6, 2025 @ 1:00 am

    Pedro, wgj, Ethan
    If the question is "Is X a Y?" (that is, an instance of the species Y), I think sometimes the most fruitful answer to start with is "It depends what you mean by Y."
    If by "word" you mean "a string of language of the sort that in written English is customarily set off by spaces", then "cheeseburger" is one word, "ice cream" is two.
    But that's not very helpful because writing conventions vary over time and between languages in ways that don't correspond reliably to the underlying grammar or semantics (as in that example).
    If by "word" (lexeme) you mean something like "the smallest unit of language that can be combined with other words according to the rules of grammar to make a potentially infinite number of different sentences", then I would suggest that "cheese", "burger", "ice", "cream", "cheeseburger" and "ice cream/icecream" are all different words. (A compound may look like it's not a "smallest unit", as you can decompose it into elements like "cheese", "burger", "ice" and "cream". But the point is that, once we certify it as a separate word, its grammar is different – as Pedro says, you can't say "Please buy whipped, ice and double cream.")
    By that definition, strings like "Donaudampfschifffahrtsgesellschaft" and "Danube steamship travel company" are grammatically identical: they are noun phrases consisting of the head at the end modified by a number of other nouns. The different writing conventions are incidental.
    I guess that means I would say that "ab", "warten" and "abwarten" are three different words, on the basis that the grammar of separable verbs is different from the grammar of independent prepositions (I assume; not sure of the details). Not sure what that means for how you would describe "ich warte ab".

  17. edith said,

    July 6, 2025 @ 7:46 am

    Names for things in English often start as a set of nouns that then get linked by a hyphen, and finally glued together:

    data base –> data-base –> database

    While German starts where English ends.

    Maybe in English it just takes time for a phrase to give birth to a word.

  18. Kimball Kramer said,

    July 6, 2025 @ 1:06 pm

    I am a retired physicist and from time to time I hear someone say "nu-cu-lar". I generally ask them if they mind if I correct their pronunciation and, so far, everyone has said "please do". The conversation proceeds as follows: Say "old cloudy"—(response). Say "new clear"—(response). Say "old cloudy power"…Say "new clear power"…Say "old cloudy energy"…Say "new clear energy". Then I explain that they should half-swallow the word "clear" instead of giving it equal emphasis to "new". And by then they realize that they can say the word correctly.

  19. Jonathan Smith said,

    July 6, 2025 @ 11:43 pm

    as a cue, why not "nucly > even nuclier?"
    So this is not mechanically challenging of course — but I am susceptible to this "popular pronunciation" myself and assume it is down to English's many many -cular words and zero -clear/-clier words (suffixed -er as above not counting of course as phonotactic principles/intuitions need reference to morpheme boundaries.)

  20. Terry K. said,

    July 7, 2025 @ 4:43 pm

    I think tennis ball is different than basketball, volleyball, and baseball. The latter 3 are the name of the sport as well as the name of the ball used. We can say "a ball used to play tennis". Thus separating the words. We can't say "a ball used to play basket". It's "a ball used to play basketball".

  21. Chas Belov said,

    July 7, 2025 @ 10:12 pm

    "Nuclear" pronouncer here, but I'm puzzled as to why some are taking a prescriptive rather than descriptive approach to "nucular." I won't deny it grates on me, but that's my problem, not the speaker's.
