Vietnamese polysyllabism


There is a movement called Vietnamese2020 that aims to substantially reform the writing system by the year 2020.  The main change would be to group syllables into words.  As the advocates of this change point out, most words in Vietnamese are disyllabic (the same is true of Mandarin).  The proponents of the reform believe that, among others, it would reap the following benefits:

1. achieve greater compatibility with the needs of information processing systems

2. comport better with the findings of cognitive science

3. put the kibosh on the false notion of monosyllabism, which they say is unnatural and does not exist in real languages

I myself had these additional thoughts:

1. Would the adoption of polysyllabism (i.e., linking of syllables into words) in Vietnamese obviate the need for so many diacritics (i.e., reduce homonymy)?  Without knowing the precise details of Vietnamese romanization, the plethora of diacritical marks has always led me to suspect that the script may be fraught with redundancy and overspecification, especially if the basic unit of grammar were taken to be the word rather than the syllable. The fact that many Vietnamese in their casual writing omit the diacriticals and are still able to make themselves understood (see below) underscores this possibility.

2. Would the adoption of polysyllabism make indexing, dictionary compilation, etc. easier and more user-friendly?  This has certainly been the case with Romanized Chinese and Japanese (e.g., in dictionaries and encyclopedias arranged according to alphabetical order by words), and I suspect that the same would be true of Korean as well.

I ran these proposals and ideas by a number of Western specialists in Vietnamese language and culture.  Their reactions were, to put it mildly, unenthusiastic.

Bill Hannas notes that this sort of proposal has been around for a few decades at least, and that the following line in the proposal does not offer much hope for adoption:  "In practice, while awaiting official orthography guidelines, hopefully, from a governmental body such as a national language academy, …"

Eric Henry states:

This is the first time I ever encountered this proposal. The article doesn't make it clear whether this idea has any government backing or not. To me the idea of pretending that Vietnamese compound expressions are unitary words in the same sense that "asparagus" or "daffodil" are words seems silly and artificial. The Vietnamese used to use hyphens to accomplish the same purpose; thus fangfa 方法 ("method") was "phương-pháp," and so on. Then people discovered that they could get along fine without hyphens, and that the absence of hyphens gave the page a pleasantly uncluttered look. Conjoining syllables in the manner proposed seems to me a way of reverting to hyphens [VHM:  without the hyphens]. But then it's natural to be attached to whatever one is habituated to—and I happen to be habituated to un-conjoined syllables.

To which I replied, "ex cept in Eng lish".

Eric continued:

I don't see how polysyllabism could reduce the need for diacritics. Vietnamese people of course write to each other all the time with no diacritics and can still figure out 98% of the text, but everyone knows and feels that this is just a makeshift. It would perhaps be nice to eliminate the need for the circumflex and the half moon by inventing a few special vowel signs—but I don't see how the tone marks themselves could be represented in spelling (cf., for comparison, luomazi [National Romanization for Mandarin]: han, harn, haan, hann)—that would just be a nuisance, especially since Vietnamese has, not four, but six tones. Vietnamese orthography has already (i.e., centuries ago) made a move in the direction of new vowel symbols with the letters "ư" and "ơ."

Maybe a Vietnamese equivalent of DeFrancis's ABC Chinese dictionary could be created. It might be wonderfully useful for some purposes, as the ABC dictionary is wonderfully useful for some purposes. But I haven't really thought this through.

Another correspondent replied:

This has nothing to do with the government. It looks to me like it's the work of some overseas Vietnamese linguistics grad student or (former grad student) who has now gone slightly crazy because of the "East Sea/South China Sea/Really Far South Mongolian Sea. . ." issue.

The author has several pages. Another one (hocthuat.org) has a long study that argues for the linguistic connections between Vietnamese and Chinese, but it now has the following disclaimer:

STATEMENT OF RENUNCIATION OF THE SINITIC CAMP

Here comes a painful decision. I would like to renounce my long standing belief in what I have elaborated in this electronic publication about Sinitic Vietnamese. That is to say, I no longer believe in what I used to see as vestiges of sinitic linguistic elements in Vietnamese vocabulary stock that are postulated in my research paper. The reason for my taking this course of action is, admittedly, politically motivated because I do not want my work later to serve for unforeseen evil purposes, especially in the face of Chinazi's overt actions trying to impose its hegemonism onto today's Vietnam. My blood is boiling with revulsion and hatred after seeing a series of unrolling events currently taking place in the East Vietnam Sea. Civilized people mostly see that those behaviors could only be committed by warmongers, descendants of those same savages as vividly and accurately described in "The Ugly Chinaman" 醜陋的中國人 by Bo Yang 柏楊. Don't take me wrong, though both matters not related, given the fact that my blood is genetically embedded with Chinese DNA.

For Heaven's sake, please forgive me for all what I have been laboring on hitherto. I would appreciate your understanding and ask that you take this unstate [sic] moment of truthfulness as a statement of my renunciation of the sinitic camp and I shall accept all consequences thereof. My apology to my fellow scholars, too, and yet, if you still need to read my writings for some reason, focus instead on the antithesis of what is discussed herein, that is, "de-sinitize" them by taking the opposite view. You may still quote any material in this paper but remember to annotate your citation with this statement accordingly. You could post your comments and questions on Ziendan TiengViet.

It so happens that another language movement in Vietnam going on right now is called English2020; it aims to make all school leavers proficient in English by that year.

Steve O'Harrow comments:

There is an "English 2020" project being spearheaded by Professor Nguyen Ngoc Nhung on behalf of the SRVN Ministry of Education & Training that aims to make English language instruction available in a broad range of fields at the secondary and tertiary levels [by 2020]. It is the only domestic national-level language-related initiative I know of at this time in Viet Nam. One might be forgiven for suspecting that the proposers of the Vietnamese2020 movement stole the name "2020" from the Ministry of Education & Training English initiative.

The article you link here looks rather "iffy," to say the least. In reality, it is probably a scheme put on line by some Viet Kieu ["overseas Vietnamese"] someplace outside of the country itself. In my opinion, after my 50 years of Vietnamese language teaching and research in Viet Nam, Europe and America, there is a zero chance of this spelling movement taking hold. Why? Because the current system works well. It is known and used by nearly 90 million people.

The Vietnamese populace is already one of the most literate in Southeast Asia and it has been literate for a very long time. They are not likely to change what works well.

"If it ain't broke, don't fix it."  And believe me, they won't.

What is endlessly interesting to this observer over the years is that for a long time now, the handful of folks who identify themselves as Vietnamese but who live overseas, are of the impression that what they cook up in the cafés of Paris or the campuses of the USA is going to have some magic impact on the millions and millions of Vietnamese who are actually living their day-to-day lives in Viet Nam itself. There are all kinds of looney ex-pats out there and each one has a fantastic plot to do something, reform the language, overthrow the government, invent a perpetual motion machine that serves pho on the side. They're constantly going around appointing each other prime minister of governments in exile or re-claiming the Nguyen Dynasty throne. Mind you, founding a new goofy religion actually works sometimes – as long as you are really in Viet Nam, that is.

But if you are abroad, "fuhged-daboudit," [especially if you live in Brooklyn].

Responding to my technical questions about the possible value of a polysyllabic approach to Vietnamese writing, Steve remarked:

Short answer: NO  Longer answer: I really do not know enough about the technology of information processing, etc. to be 100% sure and I do know that many Vietnamese disagree on which words are polysyllabic & which are not [Chinese loans are easier to judge, but Mon-Khmer vocabulary is another question and mixed lexemes are even fuzzier]. The main obstacle to information processing at this point in time seems to be the fact that we do not have decent optical character recognition programs, due to a lack of typographic consistency and the fact that Vietnamese printing in the past has been all over the map. However, none of the "fixes" will eliminate the need for the diacritics and there is a lot of misunderstanding among those folks who do not actually read/speak Vietnamese about which marks are diacritical [only the five tone marks] and which are integral parts of letters [hooks, bars, and circumflexes]. A Vietnamese native speaker does not see, say, the letters "o" and "ô" or "e" and "ê" as being "o with / without a circumflex" or "e with / without a circumflex" – rather s/he conceives of them simply as completely distinct letters, as different as we would think of "e" and "o" in English. The folks whom this system confuses are mainly foreigners, so who gives a damn?

A 2nd point would be that there is a lot of disagreement on what constitutes a "word" in Vietnamese. Is "Không quân" [Airforce] one or two words? I really don't think we are going to come to any substantial agreement in the foreseeable future and I really don't think it matters a whole helluva lot, at least not to the Vietnamese reading public.

Again, the main point is that the current Vietnamese writing system works well for Vietnamese people in Viet Nam itself, so any substantial changes would likely be counter-productive. Just remember the old US saying:  IF IT AIN'T BROKE, DON'T FIX IT! – it is just as true in VN as it is in the US.  Tinkers be damned.

Finally, just before I was about to make this post, I received these brilliant remarks from a Vietnamese specialist who wishes to remain anonymous:

If Vietnamese were written as words, and not as syllables, there would be less need for diacritics (tones and "special"–in the sense that they lack Western alphabet equivalents–letters) because an equivalent amount of information (cues) is provided by the word division.

By adding information up front of one sort, you get by with less information of another sort. Word division in orthography means that society and its individuals have invested resources in an upgraded system that rewards users with greater clarity for less effort. You put the effort in at the beginning–deciding the rules and learning them.

We don't specify every phonological detail in English writing because we don't need them to get to meaning.  The reader, if s/he cares about it, can supply those details later, after accessing the word-meaning.  Often an unambiguous pronunciation is possible only after the word has been retrieved from one's mental lexicon.  It surely does not derive from the successive letter-sounds.  By the same logic, written Vietnamese words would be overspecified if they included all the diacritics in use at present.

Because indicating tone in computerized writing is such a bother, Vietnamese usually just leave them out of their informal correspondence, such as emails.  The messages can still be understood, albeit with some difficulty.  Word division would restore the missing redundancy.

Information technology, and indexing in particular, depend on having "tokenized" units, usually at the word level.  Most of the tokenizing work is done already in languages with word division.  For CJV (not K), however, a tokenizing function is needed.

It all comes down to the same rule: you can pay the cost once up front (create and learn rules for word division) or in perpetual installments.
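To make the "tokenizing function" concrete, here is a minimal sketch in Python of one common approach, greedy longest-match lookup against a word list. Everything in it is an assumption made for illustration: the function name `segment`, the `max_len` cutoff, and the two-entry lexicon (both compounds are taken from this post). It is not the method of any particular Vietnamese processing system.

```python
def segment(syllables, lexicon, max_len=4):
    """Group space-separated syllables into words by greedy longest match.

    `lexicon` is a set of known multi-syllable words, each written with
    single spaces between its syllables. `max_len` (a hypothetical cutoff)
    caps how many syllables one word may span.
    """
    words = []
    i = 0
    while i < len(syllables):
        # Try the longest candidate first, shrinking down to one syllable.
        for n in range(min(max_len, len(syllables) - i), 0, -1):
            candidate = " ".join(syllables[i:i + n])
            # A lone syllable is always accepted as a fallback word.
            if n == 1 or candidate in lexicon:
                words.append(candidate)
                i += n
                break
    return words


# Toy lexicon: both compounds appear in the post above. A real lexicon
# would hold many thousands of entries.
LEXICON = {"phương pháp", "không quân"}

# Toy input: a string of syllables written the current way, all spaced.
text = "không quân có phương pháp"
print(segment(text.split(), LEXICON))
# ['không quân', 'có', 'phương pháp']
```

With word division in the orthography, the writer does this grouping once. Without it, every index, search engine, and dictionary lookup has to repeat it, and the result is only as good as the lexicon behind it.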

It is remarkable that, although Chinese, Japanese, Korean, and Vietnamese have four different writing systems, they all are vexed with the problem of whether or not to join syllables into words.  That, I believe, is the result of the latter three still retaining vestigial traces or influences of the Chinese characters.  But even character writing could adopt word spacing if enough of its users would agree to follow such a norm.

[A tip of the hat to Jonathan Smith and thanks to Liam Kelley and Michele Thompson]



45 Comments

  1. J.W. Brewer said,

    October 2, 2012 @ 3:20 pm

    For Chinese and Japanese, you may be characterizing the issue backwards – what is going on is not so much breaks between syllables rather than between words, but no information-conveying breaks at all except at the end of sentences and thus no visible distinction between single-character (or single-syllable) words and multiple-character (or multiple-syllable) words, although in Japanese many individual kanji of course have polysyllabic readings. That lack of information-conveying breaks was once common practice for texts written in our alphabet, but was abandoned in favor of inserting blank spaces at word-breaks in the latter part of the first millennium A.D. http://en.wikipedia.org/wiki/Scriptio_continua. The Vietnamese situation may be different altogether.

  2. Sili said,

    October 2, 2012 @ 4:21 pm

    Really Far South Mongolian Sea

    This should probably not amuse me as much as it does.

    I award the author a swimming holiday to Austria.

  3. JS said,

    October 2, 2012 @ 4:31 pm

    ^
    Chinese writing certainly provides "breaks between syllables" in the sense that the salient written units, characters, map (almost) without exception to single syllables of speech; the addition of physical "blank space" as that called upon to separate English words would, of course, be redundant.

    However, Korean orthographical standards do call for word separation, meaning that in the case of (standard) Korean writing, both the syllable and the word are strongly marked in written text — though as one might expect, there are in the case of the word many cases in which decisions regarding division are variable and arbitrary.

  4. Peter said,

    October 2, 2012 @ 5:13 pm

    ^ I agree that characters neatly (and nearly without exception) subdivide an expression into syllables. Having blank spaces between the _words_ though–that would not be redundant. It would be kind of helpful (but no one will ever do it).

  5. Victor Mair said,

    October 2, 2012 @ 5:28 pm

    @Peter

    "that would not be redundant" — clear thinking on your part

    "but no one will ever do it" — actually, a lot of people have done it (e.g., Chow Tse-tsung and Apollo Wu). Who knows? Someday it might just catch on. That would be a boon for IT specialists, dictionary makers, indexers, grammarians, and sundry others.

  6. Peter said,

    October 2, 2012 @ 5:52 pm

    @Victor

    That would be convenient. Considering that most Chinese (or, I suppose, Americans) can't tell the difference between a morpheme and a word, I'm not holding out a great deal of hope.

  7. Ellen K. said,

    October 2, 2012 @ 5:57 pm

    In English we have cases where whether something is a word or two is somewhat arbitrary, and even cases where we don't agree on if it's one word or two. This doesn't seem to get in the way of our use of the written language. Curious that none of the writers, all writing in English, mention that we have this in English. My question for them would be, is this any different from English, other than that in English we've had time to standardize many of the cases that can go either way?

  8. Victor Mair said,

    October 2, 2012 @ 6:14 pm

    @Peter

    Most Americans (and other speakers of English) know what a word is (i.e., know where to put spaces between words) — in 99+% of the cases. Otherwise we wouldn't be able to hold these conversations on Language Log. And you can be sure that commenters would jump down the throats of us bloggersifweforgottoputinthosespaces.

    As for what a morpheme is, that's specialized knowledge that can be left to linguists and others who delight in the study of languages.

  9. tram said,

    October 2, 2012 @ 7:39 pm

    Funny example. Is "Airforce" one or two words?

  10. Ruben Polo-Sherk said,

    October 2, 2012 @ 7:56 pm

    I think in understanding this issue it's important to realize that, just like with Chinese (when it is divided), the compounds aren't really being divided into syllables–they're being divided into morphemes, and that they simultaneously get divided into syllables is just a coincidence.

  11. JS said,

    October 2, 2012 @ 8:47 pm

    ^ Hmmm… from a synchronic point of view it might be possible to claim that in Chinese and Vietnamese writing, compounds are being divided into syllables and that it is the correspondence of those syllables to morphemes which is only a coincidence… after all, in these two cases, the salient written unit's relationship to the syllable is (all but) invariant, while its relationship to the morpheme is much confounded by the significant and increasing number of morphemes that are longer than one syllable.

    However, historically speaking, your view is reasonable as the preference for disyllabic "compound" words in both languages (which seems to have followed on processes of reduction of longer and otherwise more phonologically complex words to CV[C]?) means that the relationship between originally logographic Chinese characters and modern-day morphemes is indeed in some sense original and essential…

  12. Brad said,

    October 2, 2012 @ 9:15 pm

    I think one of the non-English rebuttals should be:
    So everyone needs to deal with the made up hassles of distinguishing between compound words, hyphenated compounds, and multi-word compounds?

    It's a distinction that the writing system makes, yet the organization system for the dictionaries resolutely ignores it. Does the meaning of 'air' change dramatically when followed by 'man'? If it does, you put 'airman' in the dictionary whether it's 'airman', 'air-man', or 'air man'.
    :-/

    Every Japanese book that I have that has spaces between the Japanese words is either a kids book or a Japanese as a foreign language text. The kids books have spaces between the words because uninterrupted strings of hiragana or katakana can be difficult to parse quickly.

    And the native Japanese dictionaries intended for children that I have get along just fine using Japanese kana ordering for the dictionary, so "alphabetizing" only benefits the people that have memorized the arbitrary order of 26 letters instead of memorizing the arbitrary order of 52-some kana.

    In the written form used by adults, spaces would be redundant because the information is either conveyed through other indications:
    – grammatical particles indicating the end of the word
    – kanji interrupting the hiragana streams
    – punctuation
    and once someone gets into things like verb conjugation and so on, distinguishing between the various components really becomes quite arbitrary.

    All of the electronic dictionary work that I've done has involved looking up words using longest substring style lookup. So if X and Y are words, but someone also decided that XY is a word, you don't have to care. So if the electronic translation people need to build better word tables, that's not a very compelling argument to change tradition.

    In other words, God save us from yet another spelling reform, especially if it's for someone else's language.

  13. Ran Ari-Gur said,

    October 2, 2012 @ 9:29 pm

    @Ruben Polo-Sherk: I don't know Vietnamese, so please correct me if I'm being clueless, but — I don't think that's completely true. For example, the Vietnamese Wikipedia gives "London" as "Luân Đôn" — not, I submit, because it's composed of the morphemes "Luân" and "Đôn". (However, it also gives "Paris" as "Paris", and "Wikipedia" as "Wikipedia"; so there's definitely a tendency to write borrowed morphemes solid even when they're polysyllabic, but it competes with a tendency to write spaces between syllables even within polysyllabic morphemes.)

  14. michael farris said,

    October 3, 2012 @ 1:54 am

    Some initial random musings.

    There's a fair amount of variation in how borrowed morphemes (which have undergone Vietnamization) are written. If you take 'salad' I've seen all three:

    xa lát

    xa-lát

    xalát

    with the first being the most common.

    Words that don't undergo Vietnamization (like Paris) remain written as one word.

    Word division seems a thornier issue in Vietnamese than any other language I've examined. When I was actively learning Vietnamese there were times I could understand a sentence just fine but couldn't have hoped to divide it into words (or could think of a number of ways of doing so). Learners of Thai and Khmer I've talked to report very similar experiences while learners of Mandarin mostly don't. It might be a SEAsia thing…

    Yes Vietnamese speakers can get by in some contexts without diacritics (I used to receive emails from one which I could understand pretty well) but this is partly due to diacritics being used most of the time – you can sort of 'see' the diacritics when they're not there. I'm also assuming there's some deliberate vocabulary and syntactic choices being made to facilitate understanding. But diacritic free Vietnamese (minus other massive changes) seems like a non-starter.

    IME unlike most writers of languages with diacritics, when a diacritic appears over a lower case i in Vietnamese speakers tend to write the dot and diacritic both (when writing by hand, in print the diacritic replaces the dot). I'm not sure what, if anything, this means, but it's sort of distinctive.

    You really should do a post on those Viet Kieu who want a return of Chu Nom (character based script). They make the word division (or other orthographic reform) plans seem completely feasible (nb I'm not talking about scholars who are interested in Chu Nom from an academic point of view who do very valuable work but those with half-baked plans for compulsory education and the like)

  15. Ruben Polo-Sherk said,

    October 3, 2012 @ 3:14 am

    JS, Ran Ari-Gur: Good point bringing up the polysyllabic morphemes.

    First of all, the polysyllabic morphemes in Chinese or Vietnamese are basically anomalies in one relevant sense: they cannot combine with other morphemes to form compounds in the way monosyllabic morphemes generally can. There are also very few of them.

    So it is not unreasonable to ignore them when figuring out how to transcribe Vietnamese, which has a large substructure of monosyllabic morphemes, and, because of the importance of these monosyllabic morphemes, decide to simplify and standardize by making each syllable written separately, which is what they did with quoc ngu. And so that is how you get Lon Don. With foreign words, though, as michael farris said, it's not entirely standardized. The other exception is in cases like with the current featured article on Vietnamese wikipedia, which has "dreadnought" in it, which is clearly not written in quoc ngu–it's written English–and so isn't subject to the syllable-dividing rule.

    With pinyin, it's essentially the same. The substructure of Chinese consists almost entirely of monosyllabic morphemes and so, if someone decides to write with spaces to separate those morphemes, they may, for the sake of consistency, separate syllables of polysyllabic morphemes as well. But the motivation cannot be to distinguish syllables–that doesn't really make any sense, I think. If you argue that this is done to mimic the boundaries between Chinese characters, you get back to the point of morphemic structure, since a major function of Chinese characters is to support this kind of structure. It is possible, of course, to write a language like English, with no such structure, in Chinese characters, but the system of two-character compounds would not fit in general (and therefore there would really be no reason to not write each character separately if you transition from that into an alphabetic script). This is essentially an innate feature of the language, and not the writing system.

    So, to put it simply, when disyllabic morphemes are split, this is done basically to be consistent in a system that, in order to accommodate a substructure of monosyllabic morphemes, has been standardized (by convention or personal choice) to have spaces between syllables. The chief concern is the division between morphemes.

    (In case anyone doesn't understand what I mean by "substructure of monosyllabic morphemes", I'll explain it this way: Vietnamese and Chinese have it, and English doesn't. It's the thing that makes the issue of word division a real pain in the ass in Vietnamese and Chinese, and not a problem at all in English. With Vietnamese and Chinese, because of the importance of the organization at the morpheme level, the concept of "word" doesn't fit well.)

    (Not really part of my argument, but maybe something to think about: We write "New York" with a space, but, though originally it was two morphemes, it is now really just one. So we do sort of have this in English, too.)

    (Ran Ari-Gur: I am not the all-knowing god of Vietnamese.)

  16. richard howland-bolton said,

    October 3, 2012 @ 6:02 am

    "ex cept in Eng lish"?
    "ex cept in Engl ish" surely :-)

  17. Victor Mair said,

    October 3, 2012 @ 6:31 am

    @richard howland-bolton

    surely not

  18. Ruben Polo-Sherk said,

    October 3, 2012 @ 6:44 am

    Shouldn't it be Eng glish?

  19. Gene Buckley said,

    October 3, 2012 @ 7:24 am

    Linguistically, compounds like air force are single words composed of other words: this is the beauty of hierarchical structure. Orthographies make different choices about how to handle that layered structure in writing. English is inconsistent, sometimes using a space, hyphen, or no division at all, often related to how familiar or "lexicalized" the compound is: water tower vs. waterfall.

    Spelling practice varies over time and space; hyphens used to be more common, and still are relatively more common in British than in American orthography. German, where these compounds have the same linguistic structure as in English, has a more consistent orthography, regularly writing compounds as one word (Wasserturm, Wasserfall) regardless of length; see this dramatic Afrikaans example, since it (like Dutch) follows the same practice.

    In Chinese, and therefore in Sino-Vietnamese, the compounds mainly at issue are closer to English per-mit, con-fer, and tele-phone. Because the meaning of the whole is often not very predictable from the meaning of the components, speakers shouldn't have much trouble learning to treat most such items as single written words, although there would no doubt be a role for (somewhat arbitrary) standardization. I think Victor's point is that to make no reference at all to word structure (whether by using spaces nowhere or everywhere) is to leave the reader completely on his or her own, when an orthography could give some significant information through the judicious use of spaces.

    It's another question whether further compounding should be written as a single word. Victor, as I take it, is mainly talking about the equivalent of per mit, although there will also be words like build ing that are semantically more transparent. Today Vietnamese writes the equivalent of build ing per mit. A writing reform that ended with building permit might be superior to buildingpermit, since the spaces show the relative grouping of (pairs of) morphemes where they do the most good, while still identifying the internal constituency of larger compounds. If Vietnamese and German represent the extremes, English orthography might for once actually be rather sensible, if only it were more consistent.

  20. Victor Mair said,

    October 3, 2012 @ 7:28 am

    That's why it's "English".

  21. Matt Anderson said,

    October 3, 2012 @ 7:55 am

    Ruben Polo-Sherk,

    Maybe I don't understand your point exactly, but, in Mandarin, polysyllabic words can certainly combine with other morphemes to form longer words. For example, húdié 蝴蝶 'butterfly' can combine with gǔ 骨 'bone' to form the word húdiégǔ 蝴蝶骨 'sphenoid'; xìbāo 細胞 'cell' can combine with zhì 質 'substance' to form xìbāozhì 細胞質 'cytoplasm'; and lǚyóu 旅遊 'tourism' can combine with qū 區 'district' to form lǚyóuqū 旅遊區 'tourist area'. &, while the individual syllables of xìbāo and lǚyóu can themselves be said to be morphemes, húdié is itself a single morpheme.

  22. Ruben Polo-Sherk said,

    October 3, 2012 @ 8:30 am

    Certainly polysyllabic words can combine with other morphemes. My point about the restriction on polysyllabic *morphemes* doing so was with regard to *how* they do it. The only way they can is basically through the same mechanism that we use to get "tennis racket" and "toaster oven". 蝴蝶骨 is basically "butterfly bone" in this same sense. There's an important difference between that sort of union and the one in, for example 理解, or 看见.

  23. M (was L) said,

    October 3, 2012 @ 9:37 am

    Does it make a lot of sense to bust a gut over foreign names and words? Every written language is challenged by this. Every language has to deal with it, and often by special localization rules that differ for each commonly-encountered foreign language. Often, it's a matter of "drop back ten and punt."

    It seems to me that whatever Vietnamese decides to do with Vietnamese vocabulary, and with loan-words that have become sufficiently adopted that they are now de facto Vietnamese vocabulary, is one question – – – but not a decision that ought to be driven by foreign words. Tail wagging the dog, no?

  24. Steve said,

    October 3, 2012 @ 11:57 am

    POINT ONE: The folks who worry about joining Vietnamese syllables or not joining Vietnamese syllables are in the same league with theologians worrying about how many angels can dance on the head of a pin. 90 million Vietnamese use an orthographic system that works well for them. In the early post-WW2 period, they undertook a massive literacy campaign that worked very well because, for a native speaker of Vietnamese, the writing system is not nearly as difficult to learn as say, the English system is for native speakers of English.
    POINT TWO: If one makes the axiomatic statement that "As the advocates of this change point out, most words in Vietnamese are disyllabic (the same is true of Mandarin)," one begs the question of what constitutes a "word." Many commentators appear to be judging whether an utterance in Vietnamese is a word based on whether what is expressed can be called "a (i.e., one) word" in, say, English or French. This is, in my opinion, a highly subjective stance.

    In any event, judging the matter as a non-native speaking student of both Vietnamese and Mandarin Chinese for the last 50+ years, it strikes me that the rate of apparent monosyllabicity in Vietnamese is much greater than in Mandarin Chinese – indeed, Vietnamese appears to have the highest rate of monosyllabicity and the lowest rate of phonemic redundancy of any language I have taken a scholarly interest in. For what it's worth…

  25. Steve said,

    October 3, 2012 @ 12:17 pm

    While this discussion is very interesting for us [and to me especially, since this is basic to what I have been doing every day for the past half century], it is rather meaningless from the point of view of the users of the Vietnamese writing system. It is very unlikely that any writing reforms will be instituted in the foreseeable future. They would cause more chaos than benefit. For example, if you look at Ho Chi Minh's manuscripts and other handwritten materials, you will see that he often liked to write "z" for "d" and "r" and "gi" – these are reflexions of the similar Northern pronunciation of the graphs in question [odd, since he spoke with a Central accent in day-to-day conversation]. Because of Ho's iconic status in much of Viet Nam [but clearly not all of Viet Nam], some true-believers have pushed the idea that the writing system should make the same substitution. However, there are other regions in Viet Nam where there is no "z" sound whatsoever and where "d" and "r" and "gi" do not represent the same sounds anyway. And there is even a very small part of the country where "d" and "r" and "gi" are pronounced as separate contrasting sounds.
    What this means is that one immediately begs political questions of national unity when one advocates writing reform of a system that is both universally employed [except in a few private spheres] and widely accepted from the Ca Mau peninsula to the Chinese border.
    So I come back to my sainted mother's old Indiana wisdom: "if it ain't broke, don't fix it!"

  26. Ran Ari-Gur said,

    October 3, 2012 @ 1:11 pm

    @M (was L): I don't think anyone is suggesting otherwise. I fear you might be refuting a straw man . . .

  27. michael farris said,

    October 3, 2012 @ 1:40 pm

    Apropos of what Steve has written it's important to note that Quoc Ngu is not a transcription of a particular dialect or language variety (which is still arguably the case for Pinyin) but an orthography that has slowly evolved to work for speakers of dialects with rather different phonemic inventories.

    Each distinction made in the script reflects a difference made somewhere (except for i and y as full syllables and for all I know somewhere does make that distinction) but nowhere makes all the distinctions (though a few dialects might come pretty close) and which differences are levelled varies from region to region (or village to village).

    It is not calculated to look appealing to westerners but it does a remarkably good job of providing a working unified orthography for the language.

  28. M (was L) said,

    October 3, 2012 @ 2:59 pm

    @Ran Ari-Gur – I was responding to the handwringing about Lon Don. What matters is how you write Hanoi in Vietnamese. How you write London or Paris or East Lansing doesn't really come into it except as a footnote.

  29. JS said,

    October 3, 2012 @ 3:28 pm

    Ruben Polo-Sherk:
    It would indeed be interesting if this were a principled distinction… but is noun compounding really "importantly different" from the sort of example you mention (li3jie3 理解, from two verbs in "parallel," or kan4jian4 看见, from two verbs in "series")? It seems possible that, historically, there simply haven't been enough disyllabic+monomorphemic verbs around to feed such processes… and such as have appeared more recently do get up to a certain amount of funny stuff, esp. of a "reduplicative" nature (e.g., lao1laodaodao 唠唠叨叨, shu3shuluoluo 数数落落, etc.)

  30. Jongseong said,

    October 3, 2012 @ 4:13 pm

    Korean has been written with spaces between words since at least the 1930s; before that, spacing depended largely on the author, and before that, spaces were not used.

    Spacing continues to vex Koreans, but this is largely due to the agglutinative morphology. For example, suffixes are supposed to be written without spaces and dependent nouns are supposed to be spaced, but Korean is full of cases where the same form can behave as a suffix or a dependent noun, as in daero 대로. As a suffix meaning "based on" or "following", you have beop-daero 법대로 ("following the law") with no space; as a dependent noun meaning "as", you have mal-han daero 말한 대로 ("as spoken") with space (I'm using the hyphen to separate morphemes in the romanization). Think of the confusion in English between "a while" and "awhile" or "maybe" and "may be", but much more frequent in the language.

    Compound nouns are another source of ambiguity, much as in English (which has the additional option of hyphenation to confuse matters further—"crybaby", "cry-baby", or "cry baby"?). Korean rules allow for optional spacing in many cases, which I guess is pragmatic.

    I'm less familiar with North Korean rules, but in general they use spaces quite a bit less than in South Korea. Compound nouns are generally written without spaces, and I think even dependent nouns may be written without spaces, so that the example above would be mal-han-daero 말한대로 in North Korean spelling.

    I don't think you could come up with a spacing rule for Korean that is at once simple and can satisfy everyone. However, for all the confusion about correct spacing, you wouldn't find anyone arguing for going back to no spaces between words. Korean is so much more readable with spaces. For what it's worth, Koreans don't have the confusion between syllables and words regarding their own language, though they have the advantage that polysyllabic morphemes are so common in Korean.

    Knowing next to nothing about Vietnamese and based on the simple fact that it is an isolating language with limited affixation, I would think spacing rules for Vietnamese would be simpler than for Korean.

  31. Ruben Polo-Sherk said,

    October 3, 2012 @ 4:39 pm

    The issue is semantic: Polysyllabic morphemes are independent in a way that the monosyllabic morphemes, when functioning as part of a compound, are not. They contain the entirety of the meaning. Now, even if it can be used independently, the monosyllabic morphemes, when they are serving to construct a compound, do not–the meaning of each is part of a large set of fundamental "nuts and bolts" that are put together to have meaning that can stand by itself. This fundamentalness is what I was talking about, and there are no (or at least trivially few) disyllabic morphemes in this group of fundamental ones.

  32. Matt said,

    October 3, 2012 @ 8:09 pm

    One interesting thing about spaces in Japanese kids' books is that they don't come between words and particles. So in kana it's "いぬが はなを" (dog-NOM flower-ACC) but in Romaji it's (usually) "inu ga hana o". (Although the Portuguese missionaries used the same separation as modern kana: "inuga fanauo".) One useful effect of adding spaces to Japanese orthography would be the provision of a final, by-fiat answer to what exactly constitutes a word in Japanese. (Tongue only partly in cheek.)

  33. wren ng thornton said,

    October 3, 2012 @ 8:27 pm

    @Ellen K:

    There are certainly ambiguous cases in English, but I think the issue is one of severity. Most of the English examples I can think of are ones where the compositional structure has been lost to us (e.g., "a lot", "after all") and we treat the set phrase as a single word. (The other examples are compound nouns, but German seems to do fine with eliminating the spaces there.) However, to pick Japanese as an example, because of its agglutinative nature the issue of distinguishing words is problematic even for productive structures.

    For example, Japanese uses a lot of verb compounding. This is vaguely similar to English's system of modal verbs, except that it's extremely productive instead of involving a closed set of forms. Depending on the verbs involved, these compounds could be (a) entirely compositional, (b) syntactically compositional but with non-compositional semantics, (c) semantically non-compositional to the point of being aspectual/affective markers, often with phonetic non-compositionality, or (d) non-compositional to the point that they are considered to be simple inflections rather than compounds. In the conventional romanization we treat most of (d) as single words; treat (a), (b), and the remainder of (d) as separate words; and waffle back and forth over (c). But because there's a continuum here —from clearly compositional processes through to tense/aspect/mood/polarity inflections— wherever you draw the line is going to be problematic.

    To pick another issue, in the traditional romanization we separate off case morphemes from their noun (etc). This is strange, but then there's a continuum between case morphemes and postpositions, so again there's this issue of where to draw the boundary (if indeed any boundary should be drawn). And this gets confounded into other issues too. For true adjectives, the morpheme converting them into adverbs is traditionally romanized as part of the same word. Whereas for adjectival nouns, the morpheme converting them into adverbs is traditionally written as a separate word (since it's related to the dative). And that morpheme coincides with one for converting verbal stems into adverbs, but for verbal stems people waffle back and forth about whether it should be separated or not. That morpheme is also a form of the copula, so surely you'd want to be consistent about how you treat the copula elsewhere right? Etc. Etc.

    If Vietnamese is at all similar, it's no wonder they settled on spaces between each morpheme/syllable. It's a bit extreme, but at least it's consistent, eh?

  34. Ran Ari-Gur said,

    October 3, 2012 @ 11:04 pm

    @M (was L): Re: "I was responding to the handwringing about Lon Don": I don't see how you can have been, seeing as there wasn't any . . .

  35. JS said,

    October 4, 2012 @ 12:01 am

    Ah… I am not clear on all points, but sense in your last comment a view of Chinese and Vietnamese word formation rather different from that which I have in mind: where I tend to think mostly about larger words formed from smaller words proper by a variety of processes (some of which might be properly called "compounding" and some not), it seems you view these languages as engaging in word-building from stores of (often bound) morphemes (the "nuts and bolts") in a more self-conscious manner — a la "classical" compounding in English, or novel unions of Sino-Japanese elements in Japanese?

    These two possibilities are not mutually exclusive, of course… but my tendency to see the latter sort of "compounding" as more exceptional and less interesting might be the reason I have been slow to appreciate your suggestion regarding the relative productivity of monosyllabic vs. disyllabic morphemes in compounds (a difference I suppose I might see as merely a reflection of the sorts of words available in the language at a given time.)

  36. Matt said,

    October 4, 2012 @ 12:31 am

    Also, part of the role that Chinese characters play in Japanese orthography is indicating word division. The basic principle is that "A change from kana to kanji usually indicates that a new word has begun."

    人類社会のすべての構成員の固有の尊厳と平等で譲ることのできない権利とを承認することは、世界における自由、正義及び平和の基礎であるので、

    jinruishakainosubeteno koseiinno koyuno songento byodode yuzurukotonodekinai kenritoo shoninsurukotowa, sekainiokeru jiyu, seigioyobi heiwano kisodearunode…

    That simple rule above gets us about halfway to a working tokenizer — of the 12 "words" above, at least 6 or 7 are arguably "really words" if you accept the particles-are-part-of-the-word-they-follow argument. The lexicon needed to mop up the edge cases isn't unworkably enormous.
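    To make the rule concrete, here is a minimal sketch in Python of the chunking it implies. The function names and the Unicode ranges are illustrative assumptions, not anything from the comment itself. The sketch also starts a new chunk when a kanji follows punctuation, which goes slightly beyond the quoted rule but matches the breaks in the romanized line above.

```python
def is_kanji(ch: str) -> bool:
    # Assumption: "kanji" means the basic CJK Unified Ideographs block only.
    return "\u4e00" <= ch <= "\u9fff"


def split_before_kanji_runs(text: str) -> list[str]:
    """Start a new chunk whenever a kanji follows a non-kanji character.

    This covers the kana-to-kanji change named in the rule, and it also
    breaks after punctuation such as the ideographic comma.
    """
    chunks: list[str] = []
    current = ""
    for ch in text:
        if current and is_kanji(ch) and not is_kanji(current[-1]):
            chunks.append(current)
            current = ""
        current += ch
    if current:
        chunks.append(current)
    return chunks


sentence = (
    "人類社会のすべての構成員の固有の尊厳と平等で譲ることのできない"
    "権利とを承認することは、世界における自由、正義及び平和の基礎であるので、"
)
for chunk in split_before_kanji_runs(sentence):
    print(chunk)
```

    The chunks keep particles attached to the word they follow, as in the romanized line. A real tokenizer would still need a lexicon for cases where one chunk holds several words (the first chunk, for instance) or where a word is written entirely in kana.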

    Of course, this doesn't mean that kanji are necessary for Japanese writing to make sense (as harped on endlessly in other threads), as any shift to a kanji-free writing system would surely see the introduction of spacing as well. But in my opinion this is part of the reason why there is such resistance to ideas like "only write Sino-Japanese words with kanji; write the native vocabulary (like 譲る in the example above) in kana" — the arrangement of different types of characters conveys the same sort of information as whitespace, albeit less efficiently and unambiguously.

  37. Ruben Polo-Sherk said,

    October 4, 2012 @ 8:31 am

    JS: I'm sorry, but I'm not entirely sure what you're saying, so forgive me if I'm talking about something entirely different.

    Aren't these two types of compounding entirely different phenomena? The first one isn't really particular to Chinese, and doesn't have anything to do with the morphological substructure, so I left it out of my original post. In fact, my point was that these two-morpheme compounds are *different* from compounds like "tennis racket" (if that's what you mean by "'classical' compounding"?).

    Do you mean that you see the mechanism for establishing the meanings of two-(bound) morpheme compounds from their constituent parts as irregular to the point that you consider these compounds to be mostly "set" combinations, and therefore unitary?

    If so, I'll try to explain why I see it the way I described it.

    From my own experience learning them, I find that many (a majority?) of compounds are understandable entirely from their constituent morphemes. More specifically, in the past, when I came across an unfamiliar compound, but knew each morpheme, I would be able to understand what that compound meant from my knowledge of those morphemes. In fact, there have been times when I wanted a particular word, but hadn't learned it yet, and was able to successfully "derive" it from morphemes that I already knew (If you want examples of the kind of compounds I derived, some I remember now are 区分、両日、根源、外面的、変容).

  38. JS said,

    October 4, 2012 @ 9:59 am

    ^ Thanks for your remarks. Basically I feel that compounding from bound morphemes in Chinese at least, while it certainly exists, is not terrifically productive — such words (dian4shi4 电视 and the like) smell more like our coinages from Greek/Latin roots (what I imprecisely called "classical" compounding) or the Sino-Japanese contribution to CJK (ke1xue2 科学). The examples you raised earlier (li3jie3 理解, kan4jian4 看见, myriad others) are instead in origin free-free syntactic adjacencies (the latter arguably still phrasal), and I see no reason in principle why polysyllabic morphemes couldn't wind up involved in such lexicalization processes. So this second is indeed the "tennis racket" category, though much richer in practice than such a designation might suggest.

    Incidentally, in neither case would I see the meanings of these Chinese "compounds" as generally transparent given their individual components, though the latter sort were at some point freely composed and thus are arguably so from time to time…

  39. Jason said,

    October 4, 2012 @ 11:26 am

    @ JS

    I think you are confusing compounds, which are similar to Germanic words in English (e.g. airport, kitchen table), and agglutination, which accounts for Greek and Latin words in English (e.g. deconstructionism). Mandarin, like English, employs both; however, compounding is by far the more productive form.

  40. Ran Ari-Gur said,

    October 4, 2012 @ 3:29 pm

    @Jason: By "coinages from Greek/Latin roots" or "'classical' compounds", I assume that JS is referring to words like "biology", "telescope", "interject", etc., where a single word is formed by compounding (?) two bound morphemes ("bio-" and "-ology", "tele-" and "-scope", "inter-" and "-ject", etc.). Lexically and semantically, they're very similar to compounds of free morphemes like "life science" and "distance viewer", and to verb-particle idioms like "throw in".

  41. JS said,

    October 4, 2012 @ 8:53 pm

    ^ So… yeah, dian4shi4 etc. strike me as "biology"-type words, built self-consciously from the nuts-and-bolts Ruben Polo-Sherk has referred to, while the core of the Mandarin lexicon consists more of "life science"-type words (though of course of very diverse phrasal origins, found across word classes, and often with constituents no longer free.)

    @Jason, not sure what you would want to call "agglutination" in Mandarin as distinct from "compounding"… perhaps -de suffixation to create "one who does X" meanings, -hua suffixation to create "ish"-ish meanings, and the like? In which case you would have processes limited in number but very productive indeed…

    Apologies if I've derailed discussion… to return to the point, I might say I've found it interesting that those with knowledge of Vietnamese language and writing seem to find the suggestion of word division so asinine. The situation surely can't be so different from that of Mandarin, where IN THEORY (this naturally being as far as the present discussion means to extend), word division would be a workable and an at least marginally useful orthographical device.

  42. Ruben Polo-Sherk said,

    October 4, 2012 @ 11:16 pm

    Ok, now I see what you're saying. I think that our disagreement comes from how we are viewing the processes involved in compounding for the "core" of the lexicon.

    We agree on the fact that polysyllabic morphemes can form part of "life science" compounds, but I am claiming that there is an important distinction between "life science"/"tennis racket"/蝴蝶骨 compounds and ones like 空間/変化. In the former, both parts are stand-alone, independent words, and you are using the life/tennis/butterfly to specify the kind of science/racket/bone. This is not the same construction involved in 空間 or 理解. (If anything, the former is a lot closer to the "biology" type in construction). Whether or not polysyllabic morphemes can be involved in a particular process has, obviously, nothing to do with how many syllables they have; it has to do with the fact that, whatever the reason, all polysyllabic morphemes in Chinese are stand-alone, independent words, and not building blocks*.

    For the purposes of this discussion, I'm splitting compounds into three types (some of my examples are Chinese; others are Sino-Japanese, but the mechanism is the same):

    1) 电视, 化学, etc. These are essentially the same as "classical" compounds in English.

    2) tennis racket, 蝴蝶骨, etc. This exists in lots of languages and is unremarkable.

    3) 空間, 想要, 見解, 区分, 理解, 変化. This is what I mean by the core of the lexicon.

    *If you know of any exceptions, please let me know, but I maintain that they'd still be statistically rare enough to be irrelevant to my larger point.

  43. JS said,

    October 5, 2012 @ 11:06 pm

    ^ To be "compounds" at all, all items under your (2) as well as (3) must be lexemes in their own right, with transparency or lack thereof merely a function of time, among other factors, correct? Surely da4ren(2) 大人 ("descriptive" compound; currently 'adult' and formerly 'your honor', etc.) is no different from hu2die2gu3, with li3jie3 and others (though very often of entirely different first syntactic structure) distinct from these only due to gradual loss of transparency? So, my claim was only that polysyllabic morphemes, though relatively few in number, may also engage in such processes.

    I don't think we should speak of privileged "building blocks" in Mandarin aside from "suffixes" like -de, -jia, -men, etc., and arguably the bound forms on occasion exploited in your (1).

  44. Ruben Polo-Sherk said,

    October 7, 2012 @ 6:21 am

    It seems that we've been using arguments that assume one interpretation or the other on whether these morphemes are lexemes or not, basically arguing from inconsistent paradigms. It seems to me that you see every morpheme, with the exception of things like -学, 电视, and -的, as always functioning as a lexeme. In Sino-Japanese, that interpretation is absolutely untenable–there's no question that the compounds themselves are the lexemes, but in Chinese, it's not so clear. There's only a valid distinction between polysyllabic morphemes and monosyllabic ones (or, more precisely, between bound and free ones) if you *don't* see every morpheme as a lexeme (excepting the agglutinative ones). If things like 理解 are taken to be clearly two words instead of one, then there is, of course, no utility to having the concept of a core process for forming the vocabulary at all (but my earlier point would nevertheless be correct–then every element of the lexicon, with a few very rare exceptions, is still monosyllabic). I'm not going to try to convince you or anyone else that things like 理解 are actually unitary in Chinese, since I don't believe that myself: many tools for analyzing other languages (for example, the concepts of parts of speech and word boundaries) are not suitable for Chinese, and everything looks fuzzy when you look at it from those perspectives. I'll only conclude with an argument for transparency and compositionality of these compounds: suppose you know what 理論、 理解、 解説、説明、and 回答 mean; you can infer what the "meanings" (or parts of meanings) are represented by 理 and 解. And then if you see 解答 for the first time, you can understand it compositionally. (I'm not claiming that *every* compound works like this–there are many, of course, that are rather opaque–but I do think that the majority remain compositional.)

  45. Gpa said,

    October 14, 2012 @ 4:29 pm

    Vietnamese borrows mainly from Cantonese, which is a remnant from Middle Chinese, not Mandarin, which is a bunch of reduced sounds from Middle Chinese, so using Mandarin seems irrelevant. And using Japanese is more irrelevant. Most of the words in Japanese use ancient Chinese monosyllabic combinations with other monosyllabic words to form a disyllabic or polysyllabic word. Korean, like Japanese, borrowed from across the many varieties of Chinese, so the original word from any given Chinese dialect is no longer that dialect's own. Basically, Japanese, Korean, and Vietnamese use the same method to convey Chinese disyllabism: Using approximate sounds via their devised writing systems, all via Chinese, to form the Chinese words, which might or might not sound like the original Chinese word anymore, due to Japanization, Koreanization and Vietnamization of these original Chinese words. 蝴蝶: 蝴 & 蝶 both mean "butterfly/butterflies", which are rarely separated to form other disyllabic / polysyllabic words in Chinese.
