Words in Vietnamese


In "Diacriticless Vietnamese on a sign in San Francisco" (9/30/18), we discussed the advisability of joining syllables into words or separating all syllables.  The ensuing string of comments revealed that there is a correlation between linking syllables and word spacing on the one hand and the necessity for diacritical marks on the other hand.

This prompted me to ask the following questions of several colleagues who are specialists on Vietnamese:

Roughly what percentage of Vietnamese lexemes (words) are monosyllabic? Disyllabic? Any trisyllabic or higher?

The average length of a word in Mandarin is almost exactly two syllables.

Can you think of examples in Vietnamese parsing where it would be clearer or more helpful to have the syllables of words joined together?

From Bill Hannas:

I don't have any stats on the morphology of Vietnamese words. I have argued that "words" (like phonemes) are artifacts of alphabetic orthographies that incorporate word division, the latter being a social convention that shapes one's internal representation of words as much as it reflects that representation. Since Vietnamese lacks this feature, measuring the percentage of mono- and multi-syllabic words would end up being pretty subjective.

I've seen experimental Vietnamese orthography that uses word division as a means of reducing the number of diacritics.

From Steve O'Harrow:

The big problem you run into is the definition of a "word" in Vietnamese. Using English or any Indo-European definition begs a lot of questions, both semantic and otherwise.  My best guess is that if you were to investigate a very large corpus of spoken Vietnamese [if it were even possible to do] you would find a vast majority of utterances were made up of monosyllables.

Over the years, I have come to the conclusion that one of the reasons why Vietnamese is a rather difficult language for foreigners to learn well is the presence of a huge number of phonetically minimal pairs, and this in turn not only reveals the incredibly low level of redundancy of the language, but also underlines its high monosyllabicity quotient.

No linguist that I know of has ever produced a universally valid definition of a "word" in Vietnamese, one that could not be successfully attacked as an avatar of an Indo-European referent. So, unless somebody can cut that knot, we will not be able to furnish a scientifically satisfactory answer to your query.

However, my impression, having professionally dealt with Vietnamese for more than half a century [and with a couple of degrees in Chinese], is that it is much more nearly monosyllabic than Chinese. It is also interesting to note that, at least in comparison to standard Mandarin, a Vietnamese speaker is much more apt to pronounce all the tones in a spoken utterance, whereas the typical Mandarin speaker will pronounce a fair proportion of syllables with a "neutral" tone. This is, in my opinion, due to the fact that tone is much more necessary for aural comprehension in Vietnamese, due to the presence of so many minimal pairs, which underscores the much higher number of monosyllables in Vietnamese than in nearly every other language I know of.

A second question you pose is about graphically joining [hyphenating] "words" in Vietnamese. That was tried for a long time in printed works in the South.  And it depended heavily on what any given writer thought constituted "a word" – it has since been almost totally dropped and, in any event, was not standard in the North at any time in the last 60 years or so.

Vietnamese basically has two sets of vocabulary, one indigenous and one taken from Chinese. The latter supplies many literary and technical terms, while the former is the basis for most of what one might term "daily conversation", in the same way that Teutonic terms are more common in spoken English than in written works, where we find a lot of loans from Greek, Latin, and Middle French. One could therefore get the impression that bi-syllable "words" are numerous, but those "words" are quite often just Chinese loans.

There seems to be a trade-off among word division / spacing, diacritical marks, and inputting hardware and software.

In the final analysis, what is the value of the concept of word for natural language processing, lexicography, philosophy, and other areas of artificial intelligence and cognition?

Readings

"Sinographic memory in Vietnamese writing" (4/16/14)

"Vietnamese in Chinese and Nom characters" (5/28/13)

"Homophonophobia" (2/7/15)

"Sino-Vietnamese poster" (12/4/17) (note the joined syllables on the poster)

"Prolific code-switching in Vietnamese" (4/14/16)

"Words in Mandarin: twin kle twin kle lit tle star" (8/14/12)



34 Comments

  1. Chris Button said,

    October 2, 2018 @ 9:38 pm

    I have argued that "words" (like phonemes) are artifacts of alphabetic orthographies

    As regards "phonemes", it is so refreshing to hear of anyone supporting such a position. I believe Peter Ladefoged referred to it as the "phonemic conspiracy". This is of course why the vowels, or rather any consonant-vowel distinctions, disappear in favor of the "syllable" whenever one reconstructs a proto-language at a deep enough level without any preconceived notions of what it should be like (Proto-Indo-European and Proto-Sino-Tibetan, whether Old Chinese or Proto-Tibeto-Burman, are prime examples).

    As regards "words", I need more convincing. Surely the prosody of any spoken language is contingent on an innate knowledge of word boundaries that has nothing to do with alphabetic writing.

  2. Chas Belov said,

    October 2, 2018 @ 11:35 pm

    I can't speak for Vietnamese, but I was taught for Cantonese that a word was any minimal speech unit that could form a sentence all by itself. Whatever was left were called bound forms. Definitely not an English definition of a word.

  3. Matt_M said,

    October 3, 2018 @ 1:50 am

    @Chas Belov: but that definition of 'word' is exactly what I've been taught for English! (with 'sentence' being replaced by 'utterance', but surely that's how it works in Cantonese, too)

    How does English differ from Cantonese in that respect?

  4. Jenny Chu said,

    October 3, 2018 @ 3:26 am

    I am thinking about something: the tendency (especially in Vietnamese news media) to use syllable-level acronyms. Example: TNHH for Trách nhiệm Hữu hạn (limited liability, as in a company), UBND for Uỷ ban Nhân dân (People's Committee), etc. I've seen some amazing ones now and then – initials of 10 or more syllables all rammed together.

    Here's a random sample from today's news:
    https://news.zing.vn/diem-chuan-vao-lop-10-nhay-tu-46-len-50-5-sau-mot-dem-post856599.html

    Tối 29/6, sau khi Sở GD&ĐT Hà Nội công bố điểm chuẩn vào lớp 10 công lập, trường ngoài công lập THCS và THPT Tạ Quang Bửu, Hà Nội, phát đi thông báo mức điểm chuẩn vào lớp 10 học 2018 – 2019 là 46 điểm.

    Can this tell us anything about word structure? You can have a 2-, 3-, or 4-syllable acronym but I feel as if the most common are certainly in pairs.

  5. V said,

    October 3, 2018 @ 4:10 am

    All Vietnamese-Bulgarians I know insist on separating each morpheme/syllable. For example, it's always Viet Nam, not Vietnam. I don't know why exactly.

  6. V said,

    October 3, 2018 @ 4:15 am

    BTW, I've also been sceptical of the concept of a "word".

  7. Philip Taylor said,

    October 3, 2018 @ 5:04 am

    V — as we are discussing written Vietnamese rather than spoken, do your Vietnamese-Bulgarian informants actually write "Viet Nam" (as stated above) or do they in fact write "Việt Nam" ? I would use the latter but never the former, or the fully Anglicised "Vietnam" where more appropriate.

  8. V said,

    October 3, 2018 @ 5:44 am

    It's Виет Нам in Cyrillic, and "Việt Nam".

  9. Philip Taylor said,

    October 3, 2018 @ 5:58 am

    V — thank you, understood.

    Incidentally, when it comes to the concept/meaning of "a word", I think that the (IMHO, much under-rated) French philosopher d'un Petit* summed it up perfectly when he wrote "When I use a word¹, it means just what I choose it to mean — neither more nor less". In the original MS he adds a footnote, sadly lost in the printed editions — "¹ including, of course, the word 'word'."

    * Cited in Dodgson C L, Alice through the looking glass, Macmillan, London 1871, where he anglicises d'un Petit's name as "Humpty Dumpty", presumably fearing that his target audience might have had trouble reading d'un Petit's birth name "un Petit d'un Petit". Dodgson, a mathematician as well as an Anglican deacon, would have been well aware of the self-recursive definition of "word" that was clearly uppermost in d'un Petit's mind.

  10. David Marjanović said,

    October 3, 2018 @ 6:40 am

    Surely the prosody of any spoken language is contingent on an innate knowledge of word boundaries that has nothing to do with alphabetic writing.

    French doesn't seem to have phonological words. For instance, stress in French is prepausal – utterance-final, not word-final.

    (French does, of course, have morphological words. Vietnamese might not.)

  11. Bob Michael said,

    October 3, 2018 @ 8:14 am

    When Vietnamese, or another primarily monosyllabic language, is read or recited slowly, are any syllables grouped together, with less time between them? It might not indicate a word, but maybe give a clue about the multi-syllable units of meaning.

  12. Philip Taylor said,

    October 3, 2018 @ 8:30 am

    Bob M — I'd have to think carefully about Vietnamese, but in Mandarin Chinese two of my three Chinese teachers have monosyllabic family names and bisyllabic given names; the latter are pronounced together and separate from the family name :

    An Nuoya
    Zhou Shangzhi

    and in an utterance such as "Nǐde zhōngwén hěn hǎo", "Nǐde" and "zhōngwén" are usually run together whilst "hěn" and "hǎo" are clearly separate. Of course, whether Mandarin Chinese is a primarily monosyllabic language is moot …

  13. mg said,

    October 3, 2018 @ 12:33 pm

    David Marjanović said

    French doesn't seem to have phonological words. For instance, stress in French is prepausal – utterance-final, not word-final.

    Which is part of what makes auditory comprehension so difficult for non-native speakers. When I moved from a jr. high school with American teachers for French to a high school with native speakers, my ability to understand spoken language took a great hit for a while. It wasn't the different accent, but the lack of clarity of word boundaries.

  14. ktschwarz said,

    October 3, 2018 @ 3:33 pm

    Thanks for this post, it's a question I've been wondering about. I've read that there's no general definition of "word" that applies to all languages, and also that there are agglutinative languages where the categories of "word" and "sentence" aren't necessarily distinct. However, I had assumed that each language at least had its own concept of "word", even if imperfect (we do have disputes in English about whether certain compounds are one word or two), and Professor Mair seems to back this up in the "twin kle twin kle lit tle star" link: "So far as I know, all the languages of the world have words as well as syllables and morphemes". But Bill Hannas says words are "artifacts of alphabetic orthographies", which I can't wrap my head around; does this mean that unwritten languages can't be decomposed into words?

    On the value of the concept of word: The first thing I can think of that doesn't depend on writing is metrical poetry. Words can't be broken across lines of poetry (unless you're deliberately going for an extremely silly effect), in any language I know. Is there metrical poetry in Vietnamese?

  15. Philip Taylor said,

    October 3, 2018 @ 4:12 pm

    Metrical poetry in Vietnamese ? I believe so. There are five examples (with translations) at https://www.dropbox.com/s/kj8u0rm24614fh9/Poems.pdf?dl=0

  16. Rick Rubenstein said,

    October 3, 2018 @ 5:29 pm

    d'un Petit? Cute, Philip, cute. :-)

  17. Jonathan D said,

    October 3, 2018 @ 6:17 pm

    Jenny, could you explain how acronyms would give any insight into word structure? I'm struggling to see how syllable-based acronyms tell us any more than the fact that Vietnamese is generally written as syllables.

  18. Chris Button said,

    October 3, 2018 @ 8:20 pm

    @ mg

    There is still the notion of the "prosodic word" though. The following example is taken from Jacqueline Vaissière's 2006 book on phonology:

    "bordures" [bɔʁdyːːʁ]
    "bords durs" [bɔːʁdyːːʁ]

  19. mg said,

    October 3, 2018 @ 10:45 pm

    @Chris – sure, the notion is there. But in actual conversations, you're unlikely to hear the difference unless the speaker goes out of their way to emphasize it. You'd usually just know from context, which gives you an anticipatory framework for hearing what's meant.

  20. Chas Belov said,

    October 4, 2018 @ 1:38 am

    @Matt_M: Utterance and sentence are not the same thing.

    With the warning that I am not fluent in Cantonese and that I'm going on the memory of classes taken 20 years ago, here's an example:

    mh (not) in Cantonese is a bound-form, not a word, because you can't just say mh in response to a question.

    Lahm mh lahm ga? (Is it blue?)
    Lahm. (It's blue.)
    Mhlahm. (It's not blue.)
    But not:
    *Mh. (Not.)

  21. David Marjanović said,

    October 4, 2018 @ 4:28 am

    There is still the notion of the "prosodic word" though. The following example is taken from Jacqueline Vaissière's 2006 book on phonology:

    "bordures" [bɔʁdyːːʁ]
    "bords durs" [bɔːʁdyːːʁ]

    What this really is is pronunciation so slow and clear that there's a pause between bords and durs, so that bords is an utterance of its own and gets to carry utterance-final stress, which in turn – if, again, the pronunciation is slow enough – causes the vowel lengthening before certain consonants that is transcribed here.

    You're not going to hear any of that in a conversation, or even in a political speech except if it makes dramatic pauses.

    …and dramatic pauses don't have to lie between morphological or lexical words either. They can lie between syllables.

  22. Chris Button said,

    October 4, 2018 @ 8:46 am

    @ mg and David Marjanović

    My understanding (not being a specialist on French in particular) is that rising intonation in French does not have to be assigned in a somewhat similar way to how a possible pitch accent on a stressed syllable in English is often not realized. However, the crucial difference is that in English the "stress" (which should not be conflated with any tonal adjustment) still remains to assist in the breaking up of speech. In French, there is no "stress" comparable to English so the recourse is lengthening which similarly does not necessarily need to be tied to any tonal adjustment (hence "bords durs" [bɔːʁdyːːʁ] and "bordures" [bɔʁdyːːʁ] can both be treated as equivalent phrases without any pause in the former but with a clear division in terms of prosodic words). However, to mg's point, a subtle length distinction (which I would assume is often not even applied) renders it difficult to break speech up for a non-native speaker. A language like Spanish on the other hand does have "stress" like in English but does not reduce the vowels so compensates by not ignoring the possible pitch accents on syllables to quite the same degree as English. When Spanish does ignore the pitch accents, it can also become difficult to break up solely on the basis of subtle length distinctions.

  23. ktschwarz said,

    October 4, 2018 @ 9:50 am

    I think @Matt_M was asking why "a word was any minimal speech unit that could form a sentence all by itself" doesn't work for English. For example: "the", "my", … there must be lots more little words that can't be a sentence (or utterance) by themselves. There are also many that can; here's an old Language Log about a sentence in the New Yorker that consists of "Z."

  24. Philip Taylor said,

    October 4, 2018 @ 11:13 am

    Well, both "the" and "my" can indeed be valid sentences in their own right if they are being used as metalanguage, as can any word. Teacher : "What does 't' 'h' 'e' spell ?" — Pupil : "The". "My" can be used as an interjection, particularly in <Am.E> (less so in <Br.E>). And many words (including "the") can be used in isolation when attempting to correct the speech of a non-native speaker.

  25. David Marjanović said,

    October 4, 2018 @ 5:04 pm

    In French, there is no "stress" comparable to English

    Phonetically, there is. It's just distributed differently: in English, every phonological word has one, and its position within the word is phonemic; in French, it's completely predictably prepausal.

    (Also, English has contrastive stress. French doesn't, and is happy to stress repeated information: de neu[ˌ]f heures à trei[ˈ]ze heures.)

  26. ktschwarz said,

    October 4, 2018 @ 5:49 pm

    The definition of word as "a form which may be uttered alone (with meaning)" was given by Leonard Bloomfield. Presumably meta-language doesn't count, otherwise everything is a word! In a few minutes of searching I couldn't find out whether Bloomfield considered where "the" stands in his definition, but did find tons of linguists afterward pointing out the issue.

    This American thought "my" as an interjection was more British than American — I may be wrong! Anyway, substitute "your" or "their" for "my".

  27. Chris Button said,

    October 4, 2018 @ 8:11 pm

    @ David Marjanović

    As I said earlier, my personal (and perhaps minority) view is that it is a common mistake in the analysis of any language to conflate "stress" with changes in tone/pitch. Rather, "stress" is something which tends to attract such changes, but it does not have to attract them (i.e. the attraction can be ignored) nor does it necessarily bear an exclusive relationship with them. I suppose you would analyze French as having "prosodic stress" (for which I would prefer not to apply the term "stress" at all) but not "lexical stress" (which I would just call "stress").

    In any case, the point of all this is that French prosody requires an awareness of word boundaries just like any other language (I assume), so I'm still struggling to understand why they should be considered "artifacts of alphabetic orthographies". Having said that, I would love to be convinced otherwise, because I totally accept the suggestion that the notion of underlying "phonemes" is utterly arbitrary. I made a long comment to that effect on an earlier LLog post:

    http://languagelog.ldc.upenn.edu/nll/?p=25730#comment-1513560

  28. Chris Button said,

    October 4, 2018 @ 10:07 pm

    I should probably add that "stress" in French that is comparable to English may be found in the role of schwa which, when pronounced as opposed to being dropped entirely (again comparable to degrees of reduction in English), cannot support any change in tone unless it is the sole unreduced vowel in a word.

  29. David Marjanović said,

    October 5, 2018 @ 4:28 am

    I suppose you would analyze French as having "prosodic stress" (for which I would prefer not to apply the term "stress" at all) but not "lexical stress" (which I would just call "stress").

    Oh, so all we disagree about is the terminology. Yes, French lacks lexical/phonemic stress, and English has it. I agree with your comparison of French stress to English schwa vs. syllabic resonants, which is likewise not phonemic.

    the point of all this is that French prosody requires an awareness of word boundaries just like any other language (I assume)

    …My point is that it does not. Prosody puts phonetic stress on the end of an utterance ( = before every pause), and often a high pitch on the beginning of an utterance. How that utterance can be divided into morphological, lexical or whatever words really is irrelevant.

    so I'm still struggling to understand why they should be considered "artifacts of alphabetic orthographies".

    This is really a claim only about phonological words; and even so, a majority of languages appears to have those independently of any orthographic conventions.

  30. David Marjanović said,

    October 5, 2018 @ 5:05 am

    Having said that, I would love to be convinced otherwise, because I totally accept the suggestion that the notion of underlying "phonemes" is utterly arbitrary. I made a long comment to that effect on an earlier LLog post:

    http://languagelog.ldc.upenn.edu/nll/?p=25730#comment-1513560

    That discussion, which I've now read, conflates several distinct issues, each interesting in its own right. I don't have time for them all right now. :-(

  31. Chris Button said,

    October 5, 2018 @ 9:31 am

    Oh, so all we disagree about is the terminology.

    I think it's more complicated than that. You're putting me in the camp of people who then go on to make the claim that French does not have stress. However, it clearly does in the role of schwa not being able to attract tone changes just like the case in English.

    Prosody puts phonetic stress on the end of an utterance ( = before every pause), and often a high pitch on the beginning of an utterance

    I'm afraid I don't see the relevance to the discussion at hand. Compare Vaissière's example of "bords durs" and "bordures" with Wells' example of "shellfish" and "selfish" where the speaker may also make a subtle length distinction (ignoring the slightly different onsets) which is similarly not contingent on any tone change. Although in English it is conditioned by different syllabification ("shell.fish" versus "self.ish") which does not pertain to the French example, the common thread is that in both languages speakers require an innate understanding of what constitutes a word.

  32. Ellen K. said,

    October 5, 2018 @ 1:32 pm

    Seems to me in English what we call a word is highly influenced by the writing system. Most of the time, what we refer to as a word is something written with no spaces. Of course, where we put spaces in our written language is historically related to something that is (was) part of spoken speech, and harder to define than defining a written word. And I can see how in a language where a word in the spoken language doesn't necessarily correspond to a unit in the written language it might be harder to grasp the concept of a word in the spoken language.

  33. Chris Button said,

    October 5, 2018 @ 8:51 pm

    "words" (like phonemes) are artifacts of alphabetic orthographies

    I think I might finally see the connection between "words" and phonemes that the author is getting at here – the scare quotes being operative…

    With abstract phonemes basically destroying the notion of intuitive syllables, it is no wonder that syllabic writing, rather than alphabetic, is far better at tapping into reader/speaker intuition. Hence there is no need for spaces since the aim is to reflect the continuous string of syllables as we speak. The alphabet destroys this notion of the syllable, thereby rendering it difficult to read without spaces, for which the logical place would then be between what are conceptualized as "words". Clearly defining what constitutes a "word" is fraught with difficulty, but the "word" is accorded this primacy over the syllable solely as a result of the counter-intuitive orthography.

  34. Philip Taylor said,

    October 9, 2018 @ 1:45 pm

    I managed to pin my (Vietnamese) wife down long enough this week to be able to discuss the concept of "word" in Vietnamese, and this is a summary of what she reported (confirmed by a second native speaker just before posting) —

    The Vietnamese word that would most commonly be used to describe a word in the English language is "từ".

    The Vietnamese 'word' for "temporary" is "tạm thời".

    The Vietnamese word that would most commonly be used to describe "tạm thời" (or any similar multi-element 'word') is "từ", the same word that is used to describe a (single) word in the English language.

    The Vietnamese word that would be used to refer to either "tạm" or "thời" separately is "chữ" — "chữ" translates as both "character" and "syllable" in English.

    Thus (in summary) a Vietnamese 'word' can be what we might think of as a phrase.
