Word, syllable, morpheme, phoneme


What is the basic unit of discursive, communicative language — word, syllable, morpheme, or phoneme?

This topic came up in the comments to the following posts:

"The concept of word in Sinitic" (10/3/18)

"Words in Vietnamese" (10/2/18)

"Diacriticless Vietnamese on a sign in San Francisco" (9/30/18)

"Words in Mandarin: twin kle twin kle lit tle star" (8/14/12)

I will state my own preference for word as the basic unit of speech and, following that, of written language as well.  How do I justify that preference?

When I'm writing or speaking, I will say to myself or others, "I'm thinking of a certain word", not "I'm thinking of a certain syllable / morpheme / phoneme".

General-purpose dictionaries are arranged by words, not syllables, morphemes, or phonemes.

Ditto for encyclopedias.

In doing discourse and frequency analysis, word is the primary unit under consideration.

In using software to input and sort the contents of a text, word is the main token relied upon.  Even for writing in Sinographic languages, the most sophisticated systems take advantage of the notion of word for enhanced efficiency.
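One widely used dictionary-based technique for finding word boundaries in unspaced Chinese text is forward maximum matching. The following is only a minimal sketch. It assumes a tiny hypothetical lexicon (`LEXICON`) and a hypothetical helper `max_match`. Real input and segmentation systems rely on far larger dictionaries and usually on statistical models too, so this should not be read as the method of any particular product.

```python
def max_match(text, lexicon, max_len=4):
    """Greedy forward maximum matching.

    At each position, take the longest lexicon entry (up to max_len
    characters) that matches the text there; if none matches, fall back
    to a single character so the loop always advances.
    """
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until one fits.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words


# Hypothetical toy lexicon, for illustration only.
LEXICON = {"卫星", "电视", "卫视", "中国", "语言"}

if __name__ == "__main__":
    print(max_match("卫星电视", LEXICON))  # ['卫星', '电视']
```

Greedy matching is simple and fast, but it can mis-segment when a longer dictionary entry happens to straddle a true word boundary. That is one reason practical segmenters add statistical disambiguation on top of a word list.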

Handbooks of concepts and ideas are organized by words, not syllables, morphemes, or phonemes.

Even with terms like "twinkle" and "sprinkle", where there may be disagreement about how to break the syllables, people know that these two terms — based on their etymological roots and grammatical formation — are meaningful words.

When doing grammatical analysis of texts to extract precise meaning, it is essential to know where words begin and where they end, regardless of whether one marks them with spaces or not.  That's what parsing is all about.  If one is unable to parse an utterance, one will not know how to understand it.

I do not believe that "word" is privileged as an artifact of alphabetic writing.   It is the fundamental unit of language that we use for thinking and speaking.



37 Comments

  1. Tim Leonard said,

    October 6, 2018 @ 5:34 pm

    Isn't what constitutes a word determined by the grammar? That is, a word in a grammatical sentence can be replaced by a different word of the same grammatical category and the result will usually be grammatical, though not necessarily meaningful. The same cannot be said of syllables, morphemes, or phonemes.

  2. V said,

    October 6, 2018 @ 9:43 pm

    I might not have made myself sufficiently clear in my comments about my Vietnamese-Bulgarian acquaintances: I am sceptical of the idea of "a word", but having read your posts on Language Log for several years, I am almost completely convinced by your arguments. It's just that I want to maintain a healthy scepticism. When one of them gave an argument supporting this practice of putting spaces between morphemes, it amounted to prescriptivism, in my opinion.

  3. V said,

    October 6, 2018 @ 9:57 pm

    "I do not believe that "word" is privileged as an artifact of alphabetic writing. It is the fundamental unit of language that we use for thinking and speaking."
    I don't agree with that, though; I don't think in language and I find it strange some people claim to do so.

  4. Antonio L. Banderas said,

    October 6, 2018 @ 10:09 pm

    After reading DeFrancis's classic "Visible Speech", and the figures about his Chinese "phoneticity" in the second chapter, I unsuccessfully searched for an academic project dealing with a synchronic description, rather than etymological, of characters' "phoneticity"
    (I am not referring to mnemonic aids).

    I do not consider it utopian to carry out an investigation of synchronic meaningful patterns for arranging useful groups by "objective criteria", at least for the purported 80% of phonosemantic compounds among the approximately 3,500 characters necessary for full literacy.

    Among such criteria could be radicals (and their context, such as position), but also their glyphs and stroke orders, as well as various linguistic information about the words that certain morphemes can form.

    Furthermore, an index of ambiguity for pinyin homophones, especially monosyllabic particles, would help solve the main disadvantage of pinyin romanization.
    Most importantly, therefore, a dictionary of the Chinese language must deal with what Prof. Duanmu calls lexical 'elasticity/flexibility',

    http://www-personal.umich.edu/~duanmu/2014Elastic.pdf

    as well as with the relationship between such "elasticity" and the surviving morphemes in truncated abbreviations and similar phenomena, such as what Prof. Ceccagno coined "metacompounds" – e.g., the apparent surviving morphemes in 卫视 'satellite T.V.' are at least the bound short version of the disyllabic elastic words 卫星 'satellite' and 电视 'television'.

    Such elasticity, together with the logico-semantic ordering proposed in the 'CJKI Chinese Learner's Dictionary', would imply a historic pedagogical improvement in lexicographical resources.

    Incidentally, elasticity from the Xiandai Hanyu Cidian 2005 has been tabulated in the following open access thesis
    deepblue.lib.umich.edu/bitstream/2027.42/116629/1/yandong_1.pdf

  5. Bloix said,

    October 6, 2018 @ 10:16 pm

    In English, it's pretty obviously the word. In other languages – Icelandic, Hebrew, Inuktitut – it's not so obvious.

  6. T said,

    October 6, 2018 @ 10:21 pm

    Well, that reads as if you were able to provide a clear-cut (intensional) definition of "word". In all those years I have failed to find a valid set of criteria (both necessary and sufficient). So if words are the basic unit of language, what are they? Or is it just the other way round: a word is whatever basic unit of (written, spoken, however specified) language there is in a given natural human language? Apparently that would involve a circular argument, so this will hardly be your point. But perhaps it might make sense to establish such a relative terminology at least in typology, adding yet another layer of abstraction. Then we might come up with a primary concept held by native learners of a given language, and such primary concepts might typically differ between languages.
    Grapheme-based grammars (as are still common) seem a bit restricted for most purposes.

  7. Victor Mair said,

    October 6, 2018 @ 10:38 pm

    @V

    I do not "claim" to think in words; I do think in words — if I'm considering precise issues, topics, persons, places, notions, and so forth. If I'm just having emotions or impulses or feelings, then I'm not really thinking.

  8. Andreas Johansson said,

    October 7, 2018 @ 1:34 am

    @V:

    @Victor Mair:

    I'm about equally baffled by people who claim not to think in words, and those who claim more-or-less always to do so.

    I think in words in many cases – e.g. just now, when composing this – but equally I think in nonverbal modes in other contexts, such as when visualizing a map.

    So I wonder if there really are fairly fundamental differences in how people think, or if we're somehow talking past one another.

  9. maidhc said,

    October 7, 2018 @ 1:39 am

    In English there are entities smaller than a word that nevertheless carry meaning. These are typically prefixes and suffixes. There's a difference between "clean" and "unclean". We wouldn't normally use "un-" standing by itself. But surely it's a language unit? Whereas "u-" (I mean omitting the "n") is not.

  10. AntC said,

    October 7, 2018 @ 2:35 am

    In English there are entities smaller than a word that nevertheless carry meaning. These are typically prefixes and suffixes.

    More generally, in inflected languages (that is, languages more inflected than English), does each form count as a distinct word? Even in English, do am/is/are/be/being count as distinct? How about jump/jumps/jumped/jumping?

    And agglutinative languages are how we get to the (insert large number) words for snow.

  11. rosie said,

    October 7, 2018 @ 4:24 am

    What does "basic unit" mean, and, whatever it's supposed to mean, it's not clear to me that we must accept without question the notion that "discursive, communicative language" consists of basic units.

    If "basic unit" means "unit that doesn't contain any smaller units", then that would exclude any word or multi-word unit that contained any bound morphemes.

    If "basic unit" means "unit that doesn't contain any smaller units and that can stand alone" then would that include e.g. the "book" and "shop" elements of the word "bookshop" because they /can/ stand alone, even though they don't do so in this case? So "straw" in "strawberry" counts but "cran" in "cranberry" doesn't?

    If "basic unit" means "word" (at least for English), then "bookshop" counts but "clothes shop" doesn't?

    We don't /always/ think in words — it's possible to think of a notion but be unable to recall the word for it.

    [(myl) These are good points. As I wrote to Victor, there's a deeply false presupposition in the question "What is the basic unit of … language?", namely that there is a unique "basic unit" at all.

    This is like asking "What is the basic unit of matter — quarks, photons, neutrons, atoms, molecules, crystals, liquids, gases, suspensions, …?" It's a nonsensical question. There's a complex set of "units" at different levels of description, along with principles about how they combine and other aspects of the relationships among them.

    There are sensible questions, like how should a writing system or a dictionary be organized. But a decontextualized argument about "What is the basic unit?" Please, no. ]

  12. Victor Mair said,

    October 7, 2018 @ 6:00 am

    "it's possible to think of a notion but be unable to recall the word for it."

    That matches my first criterion above:

    "When I'm writing or speaking, I will say to myself or others, 'I'm thinking of a certain word', not 'I'm thinking of a certain syllable / morpheme / phoneme'."

  13. Victor Mair said,

    October 7, 2018 @ 6:46 am

    From Brian Spooner:

    Your post on word in Sinitic draws attention to something that is very interesting on a more general level. I have for a long time been thinking that the common understanding of language (in all societies that have a history of writing) relates in fact to written not to spoken language, and written language provides the standard for speech. The traditional or historical academic study of language (philology) was the study of written language. If I am remembering correctly, the study of spoken language, which evolved into linguistics as we know it today only in the latter part of the first half of the 20th century, started quite separately with the anthropological (and ethnographic) study of non-written languages in the 19th. Even now most people with an academic or scientific interest in language study either spoken or written rather than language in all its forms.

  14. Victor Mair said,

    October 7, 2018 @ 6:49 am

    Until we can express them in words, our thoughts are inchoate. And it is so wonderful to have precise words like "inchoate" to express our thoughts.

  15. R. Fenwick said,

    October 7, 2018 @ 8:56 am

    @V:
    I don't agree with that, though; I don't think in language and I find it strange some people claim to do so.

    For a start, the argument from personal incredulity is a logical fallacy. But that aside, as for the argument over the fundamental (or not) nature of the "word" in language, I've long been fascinated by an intriguing little passage from an Ubykh text narrated by an elder in the early 20th century:

    «jəsəratzạɬawnə wəlaχʷafawmət» awq’aq’ajt’ba, yənangʲạχʷan ʃəgʲə ʧawəjt’ma; t’qʷ’agʷəʧạq’awnə aʃəwq’awtʁạfan yənangʲạχʷan sạba ʃəfawdəblapɬaq’ay?
    "If you had said, 'you shall not pass by on this[, the] Sirat Bridge', we would not have become so bored; why did you make us wait like this in order to say it in two words for us?" (my emphasis)

    To the best of our knowledge the speaker of the text was illiterate, and certainly he was illiterate in Ubykh, which does not have and has never had a native writing system. Why, then, should he have any idea what constitutes a "word" in the language? Why should he consider a single utterance worth dividing into multiple subdivisions at all? And how should he have had any clue how many boundaries should appear in a sequence of at least nine non-segmentable morphemes (in the sequence «jə-sərat-zạɬ[a]-awnə wə-la-χʷa-f[a]-awmət»)? Now, I'm aware that anecdotes are not data and don't wish to engage in hasty generalisation. But this example does suggest (in Ubykh at least, and therefore presumably in other human languages too) that at least some language users do perceive and define meaningful subdivisions in utterances, subdivisions that are innocent of influence from writing systems.

  16. Martha said,

    October 7, 2018 @ 9:34 am

    Tim Leonard: "Isn't what constitutes a word determined by the grammar? That is, a word in a grammatical sentence can be replaced by a different word of the same grammatical category and the result will usually be grammatical, though not necessarily meaningful. The same cannot be said of syllables, morphemes, or phonemes."

    Can't it be said, though? If I take the first syllable of "twinkle," and replace it with a different syllable so I get "smankle," the result is grammatical (in that it follows the phonological rules of English), although it is not meaningful.

    (Also, I agree with Victor Mair's statements.)

  17. David L said,

    October 7, 2018 @ 10:12 am

    Until we can express them in words, our thoughts are inchoate.

    I disagree. What about a composer creating a complex musical work? An architect imagining a detailed plan for a building? A sculptor perceiving a finished figure inside a block of marble? In such cases, the creator can put together a specific and complete work in their head without the need for words.

  18. FM said,

    October 7, 2018 @ 10:42 am

    My housemate and I are both mathematicians, and for both of us thinking about math involves scribbling on a piece of paper. But for her, the scribbling almost always resolves into a kind of verbal diarrhea composed of sentences; my scribbling comes in the form of formulas, doodles, and diagrams with arrows, and almost never sentences.

    This seems to support the notion that people's thoughts are verbal to different degrees.

  19. Linda Seebach said,

    October 7, 2018 @ 10:59 am

    Temple Grandin (who is autistic) titled her autobiography "Thinking in Pictures." She's not "nonverbal" (I had the privilege of interviewing her once) but she puts things into words after she has thought them. I mentioned that I sometimes dream in three columns of justified type and she thought that was very funny.

  20. Jon said,

    October 7, 2018 @ 12:48 pm

    Putting our thoughts into words that accurately convey to others what we are thinking is a skill that we learn. Like other skills, when we become proficient it may be so automatic that we don't notice it.
    But it is also true that putting our thoughts into words can clarify our thoughts, and let us spot holes in our arguments.
    Someone once said that writing an article on a subject lets you know how well you understand the subject. And writing a computer program to implement an idea is a severer test still.

  21. Victor Mair said,

    October 7, 2018 @ 1:15 pm

    Most times I think in English, but often I think in Mandarin, and occasionally in Nepali or other languages. It all depends on what sorts of thoughts I'm having, and what language is most appropriate to those thoughts. Naturally, all my thinking is being done in the words of those languages. If it were not, how would I know what language I was thinking in?

  22. Ellen K. said,

    October 7, 2018 @ 3:00 pm

    It seems to me that the degree to which different people think in language is beside the point as far as the main discussion here is concerned. The question is whether the word is a fundamental unit of language used in thinking when we do think in language.

  23. Victor Mair said,

    October 7, 2018 @ 4:45 pm

    Forgot this post from early in the summer:

    "The importance of proper parsing and punctuation" (6/4/18)

    http://languagelog.ldc.upenn.edu/nll/?p=38606

  24. Philip Taylor said,

    October 7, 2018 @ 4:54 pm

    As a native speaker of British English, I am accustomed to thinking in terms of words, but I am not completely convinced that the same is true for all speakers of all languages, nor do I think that the latter can be taken as a given. I am also used to thinking in words (as opposed to "in terms of words"), at least when I am thinking consciously. But I am reasonably confident that not everyone thinks in words all of the time. One obvious example comes to mind. When an elderly person suffers from dementia, or when someone suffers a stroke but does not lose the power of speech, he or she may become desperate to communicate a need but be unable to do so because they cannot bring the necessary word to mind. They may (for example) know that they want a physical object (a razor, a pen) or they may need help with some activity (shaving, writing), but they cannot find the word and their frustration is only too apparent. So I would argue that they have thought of their need, but not in terms of words, rather as an idea, a concept. And although such people are atypical, I nonetheless believe that if they can (and do) think other than in words, then so can (and perhaps do) we all.

  25. Victor Mair said,

    October 7, 2018 @ 7:00 pm

    But they are searching for words, and are frustrated at not being able to find them.

  26. Anthony said,

    October 7, 2018 @ 7:04 pm

    Writing something of no importance recently, I couldn't think of the appropriate word but knew I could express myself in several words. Wanting to use the single, precise word that I knew existed, I thought for a minute and came up with "vicarious." Granted, I was explicitly trying to write and not just to think, but before my realization my thought involved several words and with effort morphed into a single word, the mot juste.

  27. V said,

    October 7, 2018 @ 8:08 pm

    I think the question of whether the "basic unit of language" is the "word", and whether language is integral to thought, are entirely separate questions. In my opinion the latter is untrue, unless you define "thought" very narrowly; the former is up for debate, and as I said, Professor Mair has convinced me that there probably is such a thing as a word.

  28. Philip Taylor said,

    October 8, 2018 @ 6:07 am

    VHM ("But they are searching for words, and are frustrated at not being able to find them"): yes, searching for words in order to communicate their need, yet they have been able to identify that need (i.e., think of that need) without words (or at least without the key word). That is the point that I was seeking to make.

  29. ktschwarz said,

    October 8, 2018 @ 6:34 pm

    The Ubykh quote about "two words" is indeed very interesting. How did the transcriber know where to put spaces in the transcription?

    There are zillions of introductions that say that languages can be placed on a scale from analytic to polysynthetic, but don't say what makes a string of morphemes a "word", instead of a group of words. Why is Turkish evlerden "from the houses" one word, instead of two (evler den) or three (ev ler den)? Why is Avrupalılaştıramadıklarımız classified as one "word", instead of several? Turkish has vowel harmony, but (if I understand correctly) not all suffixes require it, so that's not enough to define a word. And other synthetic languages don't use vowel harmony, so how do they decide where a "word" stops?

  30. Chris Button said,

    October 8, 2018 @ 9:44 pm

    @ Antonio L. Banderas

    I unsuccessfully searched for an academic project dealing with a synchronic description, rather than etymological, of characters' "phoneticity"

    Could you possibly clarify what you mean by that with a couple of examples?

    Most importantly, therefore, a dictionary of the Chinese language must deal with what Prof. Duanmu calls lexical 'elasticity/flexibility',

    What a great article! Thank you so much for linking to it! One quibble from a cursory reading is that I don't think the comparison of the trochaic (stressed-unstressed) compounds and iambic (unstressed-stressed) phrases with English is valid (I find that the distinction between say \blackbird and black \bird is nowadays better handled by descriptions assigning one stressed syllable to the former 'blackbird and two stressed syllables to the latter 'black 'bird with the falling tone being assigned to the last stressed syllable by rule).

  31. Phil H said,

    October 8, 2018 @ 9:51 pm

    I'm pretty skeptical about many of Prof Mair's arguments there, but to Prof Liberman's point, I'm not certain that the idea of a base unit of language can't be defined.

    I think I'd approach it something like this: what is the minimum kind of unit necessary for a language (or maybe even communicative) system? I think all the simple models of language systems I've seen have roughly two things in them: some meaning carriers, and some ways of combining the meaning carriers (words and grammar). Everything else seems a bit contingent – you might have morphemes into which the words can be broken, or you might not; you might have phonemes, but in the simplest systems you wouldn't need them, and in complex systems they are often unrecognised and do not exist outside of the "meaning carriers"/words.

    These "meaning carriers" don't necessarily correspond exactly to words as they are defined in modern dictionaries, but I feel like they're closer to words than to anything else.

    That's all a bit handwavy, and I don't know if it can be made precise enough to stand up. But looking at things like (a) the evolution of language and (b) simplified models (in formal grammars or computer languages), it doesn't seem to me to be too incoherent to ask: What are the fundamental units that these things are made of? What came first in evolution? What can a language model not do without? (Distinct but plausibly related questions.)

    That said, I disagree with several of Prof Mair's reasons. For example, his first reason:
    When I'm writing or speaking, I will say to myself or others, "I'm thinking of a certain word", not "I'm thinking of a certain syllable / morpheme / phoneme".

    But in China, it's overwhelmingly common to say things like "我在想一个字" ("I'm thinking of a zi", i.e., a character). This kind of phenomenon just seems too culturally mediated to be a useful indicator.

  32. V said,

    October 9, 2018 @ 12:10 am

    @Andreas Johansson
    I can think in language but it does not feel natural. I feel like translating. And I like translating. But not all the time.

  33. liuyao said,

    October 9, 2018 @ 1:39 am

    MYL’s comment to a comment actually helps with the OP’s claim. The “basic unit”, I think, was deliberately coined in contrast with the fundamental (or indivisible) unit, and in the case of matter one could make a good case for molecules. As Richard Feynman liked to say, if civilization were to end and we could pass down one scientific statement to future intelligent beings, it would be that all matter is made of atoms (though he meant mostly molecules; in case of crystals, we could call each atom a molecule), because there is a whole lot one could explain with it, irrespective of the deeper theory of the inside of the nucleus. Atoms would be like morphemes, of course.

    There are different degrees to which the (spoken) word is reflected in the written word (or rather in the word spacings), which calls into question whether such a thing is equally natural in all languages. English may have a higher degree of agreement, but even there you have words like “have to” and “used to”. (You can almost see what constitutes a word by watching a kid write.) German would be worse: those super long words are not actually words. It’s bordering on Chinese, Korean, and Vietnamese, which consist of equally spaced “zi”, for lack of a precise English term.

    Ultimately what VHM meant by the basic unit being the word has a lot to do with how our brain processes (hearing) language.
    A tentative (and somewhat useless) definition: a word is the unit that the brain uses to retrieve information, or equivalently to store it. By that standard, perhaps small (grammatical) words that “string” other words into sentences don’t count as words.

  34. R. Fenwick said,

    October 9, 2018 @ 3:02 am

    @ktschwarz:

    The Ubykh quote about "two words" is indeed very interesting. How did the transcriber know where to put spaces in the transcription?

    Isupposethequestionisvalidofanylanguage. :)

    There's a pile of criteria that are used to define a "word" (grammatical, orthographic, phonological), but for a grammatical word one of the most usual criteria is syntactic unitariness; basically anything is a grammatical word if it can't be broken up by another word and if it must be moved or replaced as a unit in different sentence forms. So you can say I want purple shoes, and rearrange it to say The shoes that I want are purple, and break it up to say I want purple ballet shoes; but while you can say I want ice cream, you can't rearrange it to say *The cream that I want is ice, or break it up to say *I want ice vanilla cream. That shows that ice cream, although orthographically two words, is actually a single word for grammatical purposes. For heavily agglutinative languages it's sometimes a little more complicated to test for, but fundamentally the same. With your example, for instance, you can say Evlerden geldik "we came from our houses", but (to concoct an example) you can't just move the evler portion on its own: *Den geldiğimiz evlerdir "it is the houses that we came from".

    In Ubykh, the morphemic richness can make testing for wordhood an apparent nightmare. A verb word can comprise ten or more overt morphemes with no trouble at all (ʃə-Ø-baʨ’a-ʁa-la-χʷa-f[a]-ạ́y-q’a-na-ma "we were no longer able to pass through underneath it") and an entire relative clause can form a single morphological word for the purposes of nominal morphology (since the demonstrative determiner jə- is directly prefixed to a noun stem, and the oblique-case marker –n suffixed to it, functionally jə́-qˁapˁ’ə-qʷˁạmˁa-də́-wa-mə-ɬ-ʁʷənə́-n "to this tree upon which there is [neither] branch [nor] knot" comprises a single grammatical word). But the order of elements within the grammatical word is firmly templatic (complex stems like this notwithstanding), incorporation is minimal, and defining where the word starts and ends is in practice quite easy, especially when compared to heavily incorporating languages like Mohawk or Inuktitut. I read a paper some years ago by David Rood (who worked on the polysynthetic and incorporating Wichita) in which he quite candidly admitted that the cues for determining wordhood in Wichita had still not been satisfactorily established.

  35. ktschwarz said,

    October 9, 2018 @ 7:00 pm

    @R. Fenwick, thank you, thank you! Now I feel like I've learned something new, especially with such good English examples to hang my intuition on.

    I'd count poetry as evidence of units in language. Words don't break across lines*, and songs and poems often go back before writing and are transmitted orally independent of writing. Is there metrical poetry in Ubykh, Inuktitut and other highly synthetic languages? Georgian is agglutinative and has a major heritage of poetry.

    *I barely managed to find any counterexamples in English. Tom Lehrer had one.

  36. R. Fenwick said,

    October 9, 2018 @ 10:50 pm

    @ktschwarz, you're quite welcome!

    I'd count poetry as evidence of units in language. Words don't break across lines*, and songs and poems often go back before writing and are transmitted orally independent of writing. Is there metrical poetry in Ubykh, Inuktitut and other highly synthetic languages? Georgian is agglutinative and has a major heritage of poetry.

    Ooh, poetry is a brilliant example of word-awareness in action, and one I hadn't considered. Georgian's really the only heavily agglutinative language I can think of with a strong tradition of metrical poetry, but that's due only to the limits of my knowledge of literature in other such languages. Only a single Ubykh traditional song text has been recorded, but it's in almost pure octosyllables, and there's no breaking across lines in it either:

    Wəʧabʁʲasən wəgʲəsətʷ’q’ayt’,
    wəʦanə wgʲətʷ’ə wfasəɬq’ayt’,
    Gʷəndạnəɕʷan wəbʁʲawadyayt’!

    "I sent you off, mounted on your horse,
    I had hung your lance and sword from you,
    Oh, you have died for fair Gunda!"

    ʁaqʷazạqˁa dəwadyạyq’a,
    ʁaʂa ʁalakʲ’ dəfaʧ’ạyq’a,
    ʁat’qʷ’aʤạgʲa dəɕaɕạyq’a…

    "She has lost her only son,
    She tore at the hair of her head,
    She beat at the flesh of her thighs."

  37. ~flow said,

    October 10, 2018 @ 5:44 am

    just found this: https://66.media.tumblr.com/56e586f73cde5ee15c0d56b0d9d2a384/tumblr_pduczzdRTX1vwl1l5o1_500.jpg

    trying to include this here, let's see whether it works:

    Is there a styleguide on comment formatting?
