Mutual Intelligibility of Sinitic Languages

Nearly two decades ago I wrote a paper on terminological difficulties surrounding the classification of Sinitic languages entitled "What Is a Chinese 'Dialect/Topolect'?  Reflections on Some Key Sino-English Linguistic Terms," Sino-Platonic Papers, 29 (September, 1991), 1-31.  (Available online at http://www.sino-platonic.org/)  In that paper, I did not go deeply into the question of the utility of mutual intelligibility for determining the difference between a language and a dialect, mainly because it is a red hot can of worms, but also because people say such nonsensical things as that "A language is a dialect with an army and a navy."  Now, in preparation for updating my 1991 paper, I would like to revisit the matter of mutual intelligibility to see whether it can somehow be salvaged for purposes of taxonomic classification.


One thing is certain:  a monolingual speaker of Cantonese cannot understand a monolingual speaker of Mandarin and vice versa.  There is zero intelligibility between the two languages.  In fact, even within the huge collection of speech forms that fall under the umbrella of "Mandarin," there are many varieties that are more or less mutually unintelligible.  On July 4, 1987, I was climbing up Emei Mountain in Sichuan.  It was a hot, muggy day, and our small party (my wife, son, sister, and I) were struggling up the steep slopes.  We were astonished to see large groups of short, older ladies speeding upward.  As we listened to their chatter, we couldn't understand a word of what they were saying.  My wife, who grew up in Chengdu, and so speaks Chengdu Sichuanese (Szechwanese) –  generally considered to be a variety of Mandarin — suspected that the older ladies were speaking a non-Sinitic language.  When we inquired at the little shops along the way, we were informed that the groups of pilgrims were speaking one or another type of Sichuanese from nearby districts.  Mind you, Mount Emei is only 150 kilometers (93 miles) southwest of Chengdu City.  And even in Chengdu there are expressions that don't sound like Modern Standard Mandarin, such as MO DEI LO ("we don't have any" or "there isn't any"), LANGGE GAO DI ("what's going on?"), ZAGO ("how is it?"), and CHUIZI ("penis"; I'm not sure, but perhaps this originally meant "hammer").  (Forgive me for not recording the tones and sounds exactly right; these are just expressions I have picked up casually — I may not even have the meanings down perfectly.)

It is commonly claimed that there is only one "Chinese" language, and that all of the variants of that language are dialects of it.  This conception of there being only one "Chinese" language plays havoc with efforts to classify the countless varieties of Sinitic speech forms into meaningful groups, branches, languages, and dialects, as is normal for other large families or groups of languages.

The old canard that "when the dialects are written down they are the same" is simply untrue, since what gets written down are not the regional variants but standard Mandarin (and in earlier times Classical Chinese, a dead language for at least two thousand years).  If one, as a tour de force, does contrive to write unadulterated Cantonese or Taiwanese, for example, they will be as hard for a reader of Mandarin to understand as spoken Cantonese or Taiwanese is for a speaker of Mandarin to understand.

So the question is this:  how do we break through the taxonomic impasse presented by the claim that there is only one "Chinese" language for a billion or so speakers?  I'm hoping to establish a rational system of classification whereby Sinitic is viewed either as a family or group of languages.  The jury is still very much out on whether Sinitic is part of a Sino-Tibetan, Tibeto-Burman, Austronesian, etc. family.  I believe that one of the main reasons for the failure to gain acceptance for any of the dozen or so external relationships that have been proposed for Sinitic is that the internal relationships of the family / group remain murky.


75 Comments »

  1. John S. said,

    March 6, 2009 @ 7:55 pm

    You may already have thought of this comparison, and probably discounted it as a little flawed, but I wonder to what extent it holds water:
    Could we imagine the label, and perhaps language, "Chinese" as being on par with Old Church Slavonic or Latin and their status as languages (or maybe go further back to "European"), and identify Mandarin, Cantonese, Hakka, Hokkien, etc. as being on par with Russian, Czech, Italian, and Spanish? I can imagine that these actually would be considered different languages if the PRC were not one country, but many corresponding to the distribution of the "dialects" (Mandarin, Cantonese, etc.) and there wasn't some established lingua franca (like Mandarin) or a standardized writing system that basically contributes to a situation of diglossia (Simplified Chinese script).

    This idea is probably quite flawed if you actually know much about the Sinitic languages (which I don't), but maybe fleshing out why would lead to some constructive idea.

  2. Philip said,

    March 6, 2009 @ 8:26 pm

    Did I miss something? Why isn't mutual intelligibility a way to distinguish between a language and a dialect? Why is mutual intelligibility a "red hot can of worms"?

  3. Sky Onosson said,

    March 6, 2009 @ 8:30 pm

    I wonder why you think that "A language is a dialect with an army and a navy" is nonsensical?

    Surely, the status of "Chinese" as the language of a powerful nation (with an army and a navy!) has much to do with the "conception of there being only one 'Chinese' language [which] plays havoc with efforts to classify the countless varieties of Sinitic".

  4. kellen said,

    March 6, 2009 @ 9:08 pm

    I've been thinking about this mutual intelligibility thing for a while in regards to Chinese languages/dialects. I'm currently trying to catalog samples of different Wu dialects, being the language spoken around and west / south of Shanghai. Originally I figured they must all be pretty similar, until I got deeper into it and have found plenty of cases where something said in Shanghai can only be understood by someone in nearby Wuxi due to the prominence of Shanghainese within Wu. However those in Shanghai have little success in understanding someone speaking the Wuxi dialect, though they'd be much better off than someone from Beijing. However, with cognates / identical words, even someone from Beijing (or in my case Michigan and having some Mandarin under my belt) would be able to understand a bit of Wu if it were spoken clearly enough. There are plenty of times where I can understand what's being spoken around me (in Wu) just because it's close enough to Mandarin and I've gotten used to some (though very few) of the more significant changes that would be lost otherwise e.g. "shenme" -> "dia".

    How much mutual intelligibility does there have to be for languages/dialects to be considered mutually intelligible by mainstream linguists? Does it need to be effortless? If that's the case I'm sure there are a handful of people on the British Isles that would not qualify as mutually intelligible when compared to my Great Lakes English.

  5. Kerim Friedman said,

    March 6, 2009 @ 9:45 pm

    One of the problems with "mutual intelligibility" is that it is not always "mutual." Specifically, speakers of low-status languages often understand the standard but not vice-versa. AAVE is an example of this. Also, what counts as "intelligible" may be specific to particular language environments. In areas where people are forced to live together they might actually be able to understand the other language without actually becoming bilingual. I've heard of this happening among Minnan and Hakka speakers in Singapore, but not so much in Taiwan.

  6. HP said,

    March 6, 2009 @ 9:56 pm

    As someone who speaks no Chinese languages at all, I've always been fascinated by the language vs. dialect distinction in spoken Chinese languages. Through my work, I've met multilingual Chinese who will insist that there is such a thing as spoken standard Chinese which everyone understands. (One day I brought this up at lunch, and a colleague from Beijing and a colleague from Guangdong held a brief conversation, presumably for my benefit, to prove this point.)

    In another world, I am a big fan of exploitation and horror films. I was watching a Cat. III Hong Kong film (I cannot remember the title, and in any case I wouldn't recommend it to anyone not already familiar with Cat. III Hong Kong films), and the main character, after committing a heinous murder, escapes to South Africa, where the only work he can find is in a Chinese restaurant run by Mandarin speakers. Much of the "comedy" which follows centers on his inability to understand his employers, and how they fall back on awkward Chinglish as a common tongue in S.A., and how even this communication between the Hong Kong character and his Mandarin-speaking employers falls apart as soon as an Afrikaans, Zulu, or Xhosa speaker is in the same scene.

    Linguistically, it's a really interesting movie, and I now regret that I can't remember the title in any of the languages in which the film was released, and I no longer have a copy.

  7. Rubrick said,

    March 6, 2009 @ 10:00 pm

    I think that in order to classify languages on the basis of mutual intelligibility, you'd first need to establish what elements of a local tongue constitute the "language". In particular, if two groups share the same (written) vocabulary and structural rules of grammar, but are mutually unintelligible because of different pronunciation/prosody, are they speaking different languages? For example, I (an Ohio-raised American) find conversation between two native Jamaicans utterly impenetrable. While there are certainly some unfamiliar words and idioms, the main problem is that the cadences are so different from what I'm used to that I simply can't parse the phonetic stream. I do better understanding Spanish, even though I don't speak Spanish. And yet I have no trouble classifying Spanish as a language but Caribbean English as a dialect.

    I guess my point is that mutual unintelligibility of the spoken language may be necessary but clearly isn't sufficient.

  8. Mark F. said,

    March 6, 2009 @ 10:18 pm

    My understanding is that there are languages that are considered to be different languages but that are mutually intelligible. I think there is a pair of Scandinavian languages that are this way. My source for this is a book by John McWhorter (I think), so he might have something to add. But, anyway, the language-dialect distinction does sound like it's pretty murky.

  9. HP said,

    March 6, 2009 @ 10:45 pm

    Well, you know, for most of us commenting in English here at Language Log, our remarks regarding language versus dialect are marked in ways we may not be aware of. Most native English speakers can forge some path, however difficult, through spoken Scots. Throw East Frisian at an unprepared English speaker, however, and all bets are off (butter, bread, and green cheese notwithstanding).

    Still, I wouldn't want to be the person who has to grade the degree of difference between Scots, English, and East Frisian.

  10. kellen said,

    March 6, 2009 @ 11:24 pm

    I guess for my own definition it would only count as mutual intelligibility if two native speakers from the two languages/dialects in question could, with no experience in the other language prior to this, meet and have a conversation. I've never studied Spanish but I can understand quite a lot just having grown up in a town with a large Spanish speaking population. A Wu speaker being able to understand Mandarin wouldn't really count in my book since the Wu speaker was likely educated in Mandarin. It's not intelligibility. It's their second language. This is what I believe happened in HP's example as well as the case with AAVE in Kerim's example.

    The other thing is that I think it would need to be bidirectional. My friend from Rio has never studied Spanish but can understand it fairly easily. My friend from Nicaragua has absolutely no idea what the friend from Rio is saying when she speaks Portuguese.

    I think if any two languages/dialects didn't satisfy these points then I wouldn't call them mutually intelligible.

    @Sky, I'd have to agree it's not such a valid saying since, to me anyway, it implies that a nation behind it is a necessary condition for language status and a fair number of languages are widely accepted as languages while lacking any political backing.

  11. J.J. E. said,

    March 6, 2009 @ 11:30 pm

    These "difficulties" in classifying languages seem to stem from the urge to impose discrete classifications on phenomena that aren't necessarily discrete. If we are going to try to impose discrete classifications onto languages at all, I think mutual intelligibility is about the best single, simple criterion one could find. The more complex a proposed set of criteria becomes, the less obviously discrete the classification becomes. From a practical perspective (writing government documents, teaching literacy, etc), it seems very useful to have discrete classifications. But from a theoretical perspective, I'm not sure I see that there is much value added if one insists on imposing a discrete taxonomy, especially in cases where the discrete classification is most prone to violating the assumptions required to impose such classification.

    And finally, I think I'm missing something big in this post. Is classifying Southern Min as a different language than Cantonese any more controversial than classifying Spanish and Italian as different languages? (Other than politically, of course.) What evidence (historical, comparative, etc) would suggest that grouping Sinitic "dialects" into one language is any more reasonable than grouping any other set of genetically related but mutually unintelligible languages into one? Are Romance languages any different in this regard? As a bad speaker of Spanish with no Italian training, I can understand quite a bit of Italian. As a slightly better speaker of Mandarin with 2.5 years of living in Taiwan and hearing Taiwanese (Southern Min) and Mandarin commingling every day, I still can't speak or understand a lick of Taiwanese.

  12. Other Dylan said,

    March 7, 2009 @ 12:16 am

    "One thing is certain: a monolingual speaker of Cantonese cannot understand a monolingual speaker of Mandarin and vice versa. There is zero intelligibility between the two languages."

    I don't really understand this. When I go to eat at Double Greeting Wonton I can figure out what the table next to me is ordering when they tell the waitress they want, ngàu nām (牛腩) or wúntun (雲吞) (bad transliterations). I can speak Mandarin but I've never studied Cantonese or been to Guangdong or HK, or watched a lot of Cantonese TV dramas or whatever. But in that situation and, say, when watching a news broadcast in really standard Cantonese, which seems to have a grammar that more closely follows Mandarin, I can sort of get it, eh. I'm not really that clear on anything, but I can pick and pull some meaning out of it.

    So, what does "zero intelligibility" mean?

  13. dr pepper said,

    March 7, 2009 @ 12:24 am

    When you say "scots", you should clarify. Yeah, i could pick my way through Edinburgh speech, but i'm not so sure about Orkney.

    As for Jamaica, i once saw a movie set there. It had subtitles to help with the accent but i still had trouble with the idiom. What the heck is "I and I", i found 3 different explanations on the web.

  14. Lane said,

    March 7, 2009 @ 1:24 am

    @dr pepper: "I and I" is a rasta thing. According to Wikipedia,

    Rastas say that Jah, in the form of the Holy Spirit, lives within the human, and for this reason they often refer to themselves as "I and I".

  15. Bob Ladd said,

    March 7, 2009 @ 2:30 am

    As a long-time American resident in Scotland, I can confirm some points from the thread so far. Dr. Pepper doesn't need to go to Orkney to find a variety of English he can't understand, just Glasgow – contemporary working-class urban Scots is unlikely to be intelligible to any uninitiated American (see this clip or anything else featuring Rab C. Nesbitt if you don't believe me). But most people who speak like Rab can usually understand American English, partly because of the relative status of the varieties (as Kerim Friedman mentions above) and partly just (as several commenters have mentioned) because of Scottish speakers' exposure to American TV, etc. "Mutual intelligibility" is definitely not mutual, and definitely a matter of degree, which makes it hard to know how to use it in classifying languages and dialects.

  16. Nathan Myers said,

    March 7, 2009 @ 3:13 am

    As a youngster in Hawaii I found Fawlty Towers entirely impenetrable. I was pleased, more recently, at not needing subtitles for Lock, Stock, and Two Smoking Barrels.

    "Mutually intelligible" seems among the least translatable expressions, as each linguist amounts to his or her own private speech community.

  17. David said,

    March 7, 2009 @ 4:33 am

    You just have to invent some newspeak for "dialect that really is a language but we can't say so because we'll offend the paranoid nationalists who run China and its minions of brainwashed fenqing".

  18. peter said,

    March 7, 2009 @ 5:42 am

    "As a youngster in Hawaii I found Fawlty Towers entirely impenetrable. I was pleased, more recently, at not needing subtitles for Lock, Stock, and Two Smoking Barrels."

    A fluent English-speaking Mexican I know who had studied in Liverpool told me that the only scenes of this movie that he understood immediately were those involving the two Scouse characters.

  19. aaron said,

    March 7, 2009 @ 5:43 am

    I have a similar experience to Other Dylan, but I am a native English speaker who understands Cantonese somewhat and has very (VERY) little Mandarin. I feel I pick up more meaning from spoken Mandarin than a few of my friends who are native in Cantonese. Of course it could be that they don't even try. But I'd have to concur that there is zero intelligibility between M and C in the hypothetical situation of a speaker of one listening in on a normal conversation between two natives of the other.

    I do think that in certain langalects mutual intelligibility depends greatly on how hard both people are trying. In college I lived in Plymouth, Devon, with a group of folks from my university (Lander) in upstate South Carolina. We never had a problem getting around, but we seldom had a problem keeping people (especially older people) from understanding us if we so desired. I have also seen an Italian (student, I assume) help a Mexican at the post office here in Colorado. It was fascinating watching both try very hard at different variations of the same phrase until something worked.

  20. aaron said,

    March 7, 2009 @ 5:48 am

    @ Lane and dr pepper:

    Lane left out the key part of "I&I": to rastafarians, god incarnate is/was the Emperor Haile Selassie I of Ethiopia, referred to as "selassie eye" or "eye". Thus, "I&I"="god is with me". Overstand now? :)

  21. peter said,

    March 7, 2009 @ 5:58 am

    Is the Sinitic taxonomic problem you describe in the post another instance of western intellectual colonialism in taxonomy? In botany, we have trees familiar to western botanists in the 18th Century which are classified very finely, such as the genus Malus (Apples) and the separate genus Pyrus (Pears), each with about 30-35 species. Yet trees which were not as familiar to western botanists in the 18th Century are lumped together into broad groups, such as the very diverse genus Eucalyptus, which has over 700 species (and still growing, as new species are discovered). If Australian Aboriginal scientists had been doing the classification of Australian trees, and not Europeans, Eucalypts would certainly have been placed higher up in the hierarchy, a Family, say, or an Order, to allow for their great diversity. It would seem something similar has happened here, in language classification.

  22. hsknotes said,

    March 7, 2009 @ 6:35 am

    David, Mair invented the 'newspeak' 20 years ago: 'topolect.'

    Peter, the Sinitic taxonomic problem (at least in the area known as Greater China) is not an 'instance of western intellectual colonialism in taxonomy'; it is a political issue created by the Chinese govt. Any attempt to 'analyze' anything Chinese in a way that may result in some sort of differentiation (DNA, dialects, culture, etc.) is a political issue because the government of China sees it that way.

  23. Jerome Chiu said,

    March 7, 2009 @ 9:32 am

    Other Dylan said,
    {quote} [...] When I go to eat at Double Greeting Wonton I can figure out what the table next to me is ordering when they tell the waitress they want, ngàu nām (牛腩) or wúntun (雲吞) (bad transliterations). I can speak Mandarin but I've never studied Cantonese or been to Guangdong or HK, or watched a lot of Cantonese TV dramas or whatever. But in that situation and, say, when watching a news broadcast in really standard Cantonese, which seems to have a grammar that more closely follows Mandarin,…. {end quote}

    The first example is about comprehending proper names, in which case one is able to pick up even loan words from a completely different language, e.g. rajio (radio), kohhi (coffee), biru (beer) in Japanese.

    The second example is more interesting. News anchors use a more "formal" kind of Cantonese than is used colloquially, and the reason why this "formal" Cantonese resembles Mandarin (in syntax, etc.) more can be explained by the way in which Chinese has for decades been taught in Hong Kong. We speak Cantonese. We write in Standard Mandarin but whisper, as it were, what we write to ourselves in Cantonese; we read aloud set texts written in Modern Chinese (i.e. modern written Mandarin) in Cantonese; we learn all the classic aphorisms, allusions, quotes, chengyu, etc. (most if not all originally in Classical Chinese) in Cantonese. It is inevitable, therefore, that in a formal setting, the kind of Cantonese used by e.g. a news anchor is influenced to a significant extent by Mandarin.

    It is at the colloquial level where "mutual intelligibility" (or lack of it) really comes into play. Compare xiexie 謝謝 with m-goi 唔該 and b'ong xie 甭謝 with m-sai m-goi 唔駛唔該 and one easily gets the picture.

    The complicating factor is that there has always been another language that is universally used and recognized in parallel: Classical Chinese in traditional China and Modern Standard Chinese in modern China. Each of these two interplays with the spoken languages intensely, while (and more so in recent times) each spoken language interplays with each other quite a lot as well.

  24. Stephen Jones said,

    March 7, 2009 @ 9:55 am

    The 'language is a dialect with an army and navy' is anything but nonsensical in a European context.

    The difference between Dutch and German is probably not much greater than the differences between different variants of Low German. Moreover the state uses its control over education to enforce a standardization that reinforces the differences.

    In France there was the concept of 'patois' which downgraded other languages to dialect status.

    In order to survive fully a language needs the backup of a state or equivalent to ensure that it can be used for all purposes. Compare Catalan and Occitanian.

  25. language hat said,

    March 7, 2009 @ 10:41 am

    people say such nonsensical things as that "A language is a dialect with an army and a navy."

    It's not nonsensical at all; it's one of the most sensical things ever said about language. The point (which I would have thought was obvious) is not that the presence of an army and a navy provides a linguistic marker but that they (standing in for a government with effective means of enforcing its will) impel the use of terminology that is incoherent from a linguistic point of view, like the classification of what are clearly different languages in China as "dialects."

  26. Mark Liberman said,

    March 7, 2009 @ 11:53 am

    language hat: "…people say such nonsensical things as that 'A language is a dialect with an army and a navy.'" It's not nonsensical at all; it's one of the most sensical things ever said about language.

    I agree that Max Weinreich's dictum is sensible and even profound. I suppose that Victor must have been reacting against a literal interpretation, which would mean that there are no linguistic questions here at all, but only political ones.

    However, I'd add to Steve's remarks the observation that there are intellectually respectable reasons to consider classification of linguistic varieties to have a political or social dimension as well as a strictly linguistic one.

  27. Mark F. said,

    March 7, 2009 @ 2:47 pm

    Well, I listened to the Rab C. Nesbitt clip, and I couldn't understand him, but I still had the sense that he was talking English with a very thick accent (more accurately, a very different accent than mine). There was almost the sense of things coming in and out of focus, as when he asks "could you see your way clear to lending me…" Do Mandarin speakers listening to Cantonese have that impression, or are there too few points of contact?

  28. Arnold Lambtally said,

    March 7, 2009 @ 3:18 pm

    HP said,

    "In another world, I am a big fan of exploitation and horror films. I was watching a Cat. III Hong Kong film (I cannot remember the title, and in any case I wouldn't recommend it to anyone not already familiar with Cat. III Hong Kong films), and the main character, after committing a heinous murder, escapes to South Africa, where the only work he can find is in a Chinese restaurant run by Mandarin speakers. Much of the "comedy" which follows centers on his inability to understand his employers, and how they fall back on awkward Chinglish as a common tongue in S.A., and how even this communication between the Hong Kong character and his Mandarin-speaking employers falls apart as soon as an Afrikaans, Zulu, or Xhosa speaker is in the same scene.

    Linguistically, it's a really interesting movie, and I now regret that I can't remember the title in any of the languages in which the film was released, and I no longer have a copy."

    Is the film 'Ebola Syndrome' (Yi bo la beng duk, 1996)?

  29. Ellen K. said,

    March 7, 2009 @ 4:05 pm

    I wonder if the perception of Chinese as a single written language has influence here. Seems to me that if people have (or are seen to have) a common written language, then the spoken forms are probably more likely to be seen as dialects, rather than separate languages.

    I'm also thinking about the idea (which has been brought up) for how accent comes into play. When it comes to mutual intelligibility, how important is accent versus grammar and vocabulary, as far as marking language boundaries? (I've no clue at all how much that's a factor in Chinese languages and dialects.)

  30. LoveEncounterFlow said,

    March 7, 2009 @ 4:50 pm

    i would like to throw in that even in botanics, classical discrete taxonomy is seemingly not without its critics even from inside the scientific community. to witness: "… I was much struck how entirely vague and arbitrary is the distinction between species and varieties" Darwin 1859; "No term is more difficult to define than "species," and on no point are zoologists more divided than as to what should be understood by this word". Nicholson (1872); "The species problem is the long-standing failure of biologists to agree on how we should identify species and how we should define the word 'species'." Hey (2001). the list goes on.

    as for the tree model employed for the purpose of classifying languages and dialects, there is a by-now much-discarded model, the so-called wave theory (http://en.wikipedia.org/wiki/Wave_model_(linguistics) ). i do not know whether the wave model can be successfully put to work, but i am very sure that the strict branching-bough model does not apply in linguistics except by historical accident, when two groups of people separate and evolve their disparate ways of speaking, and never meet again. for any two given languages, when put into contact, will start to mutually influence each other.

    most things in the history of languages are simply not reachable for us, being outcomes of events that took place tens of thousands of years ago. the young grammarians were inclined to claim a ‘proper’, ‘pure’, ‘classical’ state of language and disparage anything that they found to be traceable as ‘downfall’ and ‘decline’. fact is that languages are ever-changing phenomena. fact is, there are phenomena like language crossroads (http://en.wikipedia.org/wiki/Sprachbund) and areal features (http://en.wikipedia.org/wiki/Areal_feature_(linguistics) ). but some phenomena in the speech of members of linguistic community X just *must* be disparaged as being ‘merely the result of sustained contact to members of linguistic community Y’, to prevent that beautiful and manageable tree-model from turning into unfathomable morassy molasses.

    don’t get me wrong here, for me as much as for almost anyone else japanese is ‘a different language from german’, while (even) (most of) bavarian spoken today is (just) ‘a dialect of german’. however, in full view of the fact that discrete taxonomy works reasonably well in zoology (because once separated, populations may become infertile to each other), and that there are (apparently) strong links between human genome distribution and linguistic relationships, one should still not give in and insist on using perhaps inappropriate or insufficient terms and concepts when dealing with language.

    i could imagine that the concepts of ‘language’ and ‘dialect’ are simply insufficient when you try to describe the linguistic reality of a community of one billion people that have a historical record spanning over three millennia, who have developed *both* a common written language *and* the tendency to build mutually distinct micro-linguistic groups (à la ‘one dialect per hamlet’). speaking of languages and dialects in the classical sense requires you to identify homogenous regions of speech, label accumulated speech samples as either ‘dialect’ or ‘language’, and possibly describe how groups of those descended from a unique ancestor. this may prove impossible to do *even* if one were in full possession of all pertinent data—simply because people have been communicating across the boundaries of their villages and towns for millennia and millennia.

    i would like to recommend following quick readings (i do not necessarily endorse all that is said in wikipedia, but it’s very accessible and at least these articles give a sketch of the outlines of some problems):

    http://en.wikipedia.org/wiki/Ring_species#Larus_gulls (even in zoology, the concept of a ‘species’, a ‘language’ for us word-watchers, can be hairy: gulls mate with each other in groups around the north pole, but not in europe, where the two ends of an open ring of interconnected habitats meet. in europe, those two ends look like / ‘are’ two species).

    http://de.wikipedia.org/wiki/Familienähnlichkeit (brought up by wittgenstein and tatarkiewicz, the concept of family resemblance maintains that there are some things that defy rigid class-like classifications; ‘games’ are one example in point (think of chess vs rugby vs yoyo)).

    http://en.wikipedia.org/wiki/Rhotic_consonant (family resemblances put to work: ‘‘each member of the class of rhotics [r-like sounds] shares certain properties with other members of the class, but not necessarily the same properties with all; [...] rhotics have a "family resemblance" with each other rather than a strict set of shared properties.’’ — i guess much the same can be said about dialects anywhere in the world).

    http://en.wikipedia.org/wiki/Dialect_continuum (the phenomenon that people from nearby—who understand the people in the next town who understand the people in the City—can be understood whereas i myself cannot understand the people in the City. again a case of family resemblances).

  31. Dan Lufkin said,

    March 7, 2009 @ 4:54 pm

    It's interesting to compare the Chinese situation with the North Germanic family of languages, dialects and slangs. Fifty years ago (before TV) if you traveled leisurely (as I once did) from Hamburg to Dunkerque you passed through a continuum of Low German, Frisian (East & West), Dutch and Flemish and rural people could always understand folks from the two neighboring villages although the newspapers and radio stations provided major anchors for the national languages (the ones with an army & navy).

    Farther north in Scandinavia, the situation is similar, but the languages are more split up by water and mountains. In Denmark, Norway and Sweden, people can usually read all three languages. (Four, really, if you count Bokmål and Nynorsk as two languages, as the Norwegian government does.)
    Swedes and Danes understand spoken Bokmål but have trouble with Nynorsk. Norwegians (who study both B and N in school) understand basic Swedish although it sounds strange because it has more undomesticated German words. Nobody understands modern spoken Danish but Swedes and Norwegians can understand old Danish movies.

    Faeroese and Icelandic are something else again.

  32. Ellen K. said,

    March 7, 2009 @ 5:21 pm

    In short, A and B might be mutually intelligible, and B and C mutually intelligible, but not A and C. Is that a short version of what you are getting at, Dan?

  33. Jonathan said,

    March 7, 2009 @ 6:41 pm

    I wondered about Valencià (or Valencian) in relation to Catalan and learned a new term: "ausbau languages." These are mutually intelligible dialects that have separately definable norms.

  34. Bertilo Wennergren said,

    March 8, 2009 @ 4:16 am

    It's funny how some people desperately want to treat the language-dialect distinction as somehow linguistic, although it obviously is and always has been political-sociological-cultural. If you just drop the whole idea that linguistic criteria should be used, most of the problems vanish.

  35. Michael Tinkler said,

    March 8, 2009 @ 5:10 am

    "…people say such nonsensical things as that 'A language is a dialect with an army and a navy.'"

    A language is a dialect with a Department of Education and firm grasp of the curriculum.

  36. David said,

    March 8, 2009 @ 9:43 am

    Dan: I don't agree that Swedes don't understand any modern Danish. I often converse with Danes in "Scandinavian" (i.e. I speak Swedish clearly and slowly and avoid words/expressions that I know only exist in Swedish, and they do the same, except of course they speak Danish).

  37. Bob Ladd said,

    March 8, 2009 @ 10:51 am

    @Michael Tinkler: Great line about the Department of Education. I definitely plan to use it next time I lecture about the development of standard languages in Western Europe. Thank you.

    @Bertilo Wennergren: Sorry, but it obviously is partly a linguistic question. Gaelic and English are different languages, even though in many ways Gaelic fills the same political-sociological-cultural role in Scotland that "dialect" would fill in, say, Italy. Nobody argues about that, even though there are plenty of people who think bilingual education and efforts to prevent Gaelic from disappearing are a waste of time and money, and even though people regularly used to describe Gaelic in terms that are familiar from the things people say about e.g. "Ebonics". Same applies to Spanish and Basque, French and Breton (or French and Alsatian), Italian and Albanian, and many more. Since there are lots of such cases where everyone agrees that we're dealing with two different languages, the question inevitably arises: where do you draw the line between describing two different ways of speaking as different languages and describing them as variants of the same language? Once you have approximately located that line, then – and only then – do all the political and social and cultural factors come into play. But the approximate location of the line is based on linguistic criteria, and therefore it's legitimate to bring linguistic considerations into play in addressing the language-or-dialect question about Mandarin and Cantonese, Spanish and Catalan, Czech and Slovak, and so on.

  38. Dan Lufkin said,

    March 8, 2009 @ 12:39 pm

    Ellen, as is so often the case where language is involved, alas, there is no short version of what I'm getting at. These situations are what mathematicians call non-transitive, like when you prefer blueberry pie to cherry pie and cherry pie to apple pie but apple pie to blueberry pie.

    David, well, when I say modern Danish I mean Copenhagen Danish. Sure, Swedes understand Danes when the latter make an effort but I challenge a Swede to hang around Christianshavn and carry on a normal conversation.

    True, there exists a Samnordisk that most educated Scandinavians can produce on demand (having studied "neighbor-languages" grannspråk in school) — Swedes avoid German words, Norwegians avoid feminine gender, Danes avoid stødet (glottal-stop phonemes). This kind of Danish even has its own linguistic label: Gøtudansk (street-Danish). Googles well, even in English.

  39. Stephen Jones said,

    March 8, 2009 @ 2:07 pm

    I wondered about Valencià (or Valencian) in relation to Catalan and learned a new term: "ausbau languages." These are mutually intelligible dialects that have separately definable norms

    Valencian is a dialect of the language commonly known as Catalan. Those who claim it is a separate language ('blaveros') are making a political statement of little linguistic validity.

    The big distinction in Catalan is between Eastern Catalan (spoken in the Balearics and the provinces of Tarragona, Barcelona and Gerona) and Western Catalan (spoken in Lleida and Valencia). When Valencia was reconquered many of the mozarabe-speaking inhabitants left and the country was populated by immigrants from the mountainous and infertile parts of Western Catalonia.

    It was common in the sixteenth century for Catalan to be called Valencian reflecting the superior economic and cultural milieu of the area. However you can take texts written in 'Valencian' and texts written in 'Catalan' and only an expert, if that, is able to distinguish between them. The American translator of 'Tirant lo Blanc' used to go around with texts from both dialects and give them to those who claimed they were different languages and ask them to distinguish between them. The normal reply he got from the 'blaveros' was to have eggs and tomatoes thrown at him.

  40. Stephen Jones said,

    March 8, 2009 @ 2:18 pm

    Bob Ladd, the difference between dialect and language obviously depends on how you define language.

    A dialect has to be a dialect of something. It is perfectly reasonable to say that Spanish and Catalan are both dialects of Romance, but few would say that Romance is a language as opposed to a sub-family, even though the differences between its various members are probably smaller than those between the variants of Chinese Victor is talking about.

    Another interesting matter is how far the existence of official institutions can cause a decline in intelligibility. Galician is a dialect of Portuguese, and it is doubtful that it is further away from the Portuguese spoken in Lisbon than the Portuguese of Brazil is. Yet the existence of the Galician Academy will result in a standard version that is different from that of Portuguese, just as an Academy of the Scots Language would probably lead to Gordon Brown being even less intelligible to English speakers than he is now.

  41. Robert said,

    March 8, 2009 @ 3:01 pm

    Is the problem that mutual intelligibility isn't an equivalence relation?

    [(myl) The relation "__ is intelligible to __" isn't symmetric, as has been pointed out in these comments a number of times, nor is it transitive. It's also gradient (you can understand to varying degrees, and your understanding can be variably noise-resistant). And finally, the elements being related are themselves not uniform. When I was in Glasgow last year, I found some of the locals perfectly intelligible, while others might as well have been speaking Tibetan. In some cases, I had both experiences at different times with the same person. And as usual, I found that after a few days, experience with the sound system allowed me to understand more and more of what I heard.

    As far as I know, there isn't even a coherent methodology for determining how intelligible one person is to another in a given context, much less for generalizing this concept to classes of people across contexts. So we start with the core cases -- at one end of the scale, two people who have grown up in the same speech community; at the other end, two people speaking completely unrelated languages, each unknown to the other. And then people argue about how to treat the many steps in between, where a varying percentage of things are understood to a varying degree a varying proportion of the time.

    In my opinion, the notion of intelligibility (mutual or otherwise) is not a very helpful one for linguistic classification. This is partly because it's so badly defined, and partly because it's an effect of so many different causes -- lexical replacement, morphological variation, sound change, grammatical differences, social attitudes and experiences, shared conceptual frameworks, and so on. ]
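
    The formal point in this exchange can be put as a small sketch. Everything in it is invented for illustration: the varieties A, B and C, the scores, and the 0.5 cut-off are hypothetical, not measurements of any real speech community. It treats "x is intelligible to y" as a graded, directed score and then checks whether the resulting yes/no relation is symmetric and transitive, as an equivalence relation would have to be.

```python
from itertools import product

# Invented scores for three hypothetical varieties A, B and C:
# scores[(x, y)] is how well speech in x is understood by a speaker of y (0 to 1).
scores = {
    ("A", "B"): 0.8, ("B", "A"): 0.4,
    ("B", "C"): 0.7, ("C", "B"): 0.7,
    ("A", "C"): 0.2, ("C", "A"): 0.1,
}
VARIETIES = ["A", "B", "C"]
THRESHOLD = 0.5  # arbitrary cut-off; the underlying relation is gradient


def intelligible(x, y):
    """True if x is intelligible to y at or above the cut-off."""
    return x == y or scores[(x, y)] >= THRESHOLD


def is_symmetric():
    # Symmetric means: x intelligible to y exactly when y is intelligible to x.
    return all(intelligible(x, y) == intelligible(y, x)
               for x, y in product(VARIETIES, repeat=2))


def is_transitive():
    # Transitive means: x -> y and y -> z together imply x -> z.
    return all(intelligible(x, z)
               for x, y, z in product(VARIETIES, repeat=3)
               if intelligible(x, y) and intelligible(y, z))


print(is_symmetric(), is_transitive())  # prints: False False
```

    Moving THRESHOLD changes which pairs count as intelligible, which is one way of seeing why a gradient notion resists being turned into a clean classification.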

  42. joseph palmer said,

    March 8, 2009 @ 10:28 pm

    It is usually best not to make assertions too strong when dealing with this topic. There are many examples of this fault above, but the original article's claim that (a rare attempt at) a written version of a Chinese "dialect" is as hard for a speaker of another dialect to understand as the spoken version is perhaps the largest exaggeration.

  43. hsknotes said,

    March 9, 2009 @ 4:40 am

    Dear Joseph,

    As for the claim that the written version of a dialect is as hard to understand as the oral version, I think it's best to be clear on a few things. One, how far apart in terms of sound are the two dialects/languages/topolects, etc.? Two, what sort of writing system are we talking about? With pure romanization, and I think that is what the post is getting at, you are pretty much lost if the dialects are far apart. This is what is meant when he says 'unadulterated':

    Example, written Taiwanese, from wikipedia:

    Siensvy korng, hagsefng tiaxmtiam thviaf. – Any mandarin speakers care to guess?

    But, even with the language being written with at least a certain amount of 'characters', you're very easily still very very lost.

  44. Merri said,

    March 9, 2009 @ 12:49 pm

    The problem with defining dialects of the same language on the basis of mutual intelligibility is that this relationship isn't transitive.

    By trying to apply it repeatedly, you'd end up stating that :
    - Austrian and Bavarian are dialects of a same language
    - Bavarian and Hessian are
    - Hessian and Franconian are
    - Franconian and Limburgian are
    - whence Limburgian and Austrian must be dialects of a same language too.

    Now try speaking Limburgian dialect in Wien ;-)

    Only fuzzy logic can overcome the hurdle, and that means the notion of "dialects of the same language" should be replaced by that of "neighbour dialects" or the like.
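
    The chain argument can be sketched as a graph problem. The pairs below are simply the illustrative links listed in this comment, not measured intelligibility data, and the function names are made up for the sketch. If "dialects of the same language" is taken to mean "connected by any chain of neighbour links", a plain graph search puts Austrian and Limburgian in the same group even though no direct link joins them.

```python
from collections import deque

# Neighbour links taken from the chain in the comment (illustrative only).
neighbour_pairs = [
    ("Austrian", "Bavarian"),
    ("Bavarian", "Hessian"),
    ("Hessian", "Franconian"),
    ("Franconian", "Limburgian"),
]


def build_graph(pairs):
    """Build an undirected adjacency map from a list of linked pairs."""
    graph = {}
    for a, b in pairs:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def reachable(graph, start):
    """Return every variety reachable from start by following links (BFS)."""
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbour in graph[current]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


graph = build_graph(neighbour_pairs)
same_language = reachable(graph, "Austrian")       # chained ("transitive") grouping
directly_linked = "Limburgian" in graph["Austrian"]  # the actual neighbour relation

print("Limburgian" in same_language, directly_linked)  # prints: True False
```

    Keeping the graph itself, rather than collapsing it into groups, is roughly what talking about "neighbour dialects" amounts to.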

  45. Jim said,

    March 9, 2009 @ 1:13 pm

    Something else to consider is the prestige of written language in China over spoken language. People are quite willing to admit that they can't understand people from XY, but insist that they all use the same [written] language. (Even that isn't exactly true, so they may really be thinking of Classical Chinese when they say that.) They can get really ,ummmmmm…forceful in insisting that China has one language, one people, etc. if you press the issue.

  46. Cameron said,

    March 9, 2009 @ 1:21 pm

    Dan Lufkin wrote: "David, well, when I say modern Danish I mean Copenhagen Danish. Sure, Swedes understand Danes when the latter make an effort but I challenge a Swede to hang around Christianshavn and carry on a normal conversation."

    Well, there's Swedes and then there's Swedes. Someone from Stockholm might resort to English to have a conversation in Copenhagen, but I suspect someone from Lund or Malmö wouldn't have a problem.

  47. Stephen Jones said,

    March 9, 2009 @ 4:12 pm

    I think you are exaggerating the difficulties with intelligibility.

    After some time in Glasgow you would find yourself understanding most of what was said to you in Scots. You could spend a lifetime in Wales without understanding what people were saying to you in Welsh.

    [(myl) If this is in reference to my earlier comment, then it's a partial restatement of the point that I intended to make. Sorry for the unclarity.]

  48. David Marjanović said,

    March 9, 2009 @ 8:50 pm

    Throw East Frisian at an unprepared English speaker, however, and all bets are off (butter, bread, and green cheese notwithstanding).

    What is spoken in Ostfriesland is not Frisian, it's Low German (…which is more closely related to Frisian + English than High German is, but still).

    Nobody understands modern spoken Danish

    You know, there are jokes about that being literally true. Several Norwegian and/or Swedish sketches, available on teh intart00bz, pretend that the Danes themselves don't understand each other. [(myl) For example this one.]

    Siensvy korng, hagsefng tiaxmtiam thviaf. – Any mandarin speakers care to guess?

    Keep in mind that several of the "consonants" here just spell out the tones. Though it's still different enough.

    ———————————-

    The Sinitic-speaking Chinese traditionally consider themselves to be a single people, in Mandarin Hàn. It almost logically follows that they consider themselves to speak a single language, Hànyǔ…

    ———————————-

    Yes, "the species problem" is very, very similar to the question of how to define "a language". There are at least 25 different "species concepts" out there, and they have basically nothing in common except the word "species"; depending on the species concept, there are between 101 and 249 endemic bird species in Mexico, I kid you not. Also, wolf, red wolf, and coyote are 1, 2, or 3 species depending on the definition…

    The higher ranks (like genus) are not defined at all, and most (or all) are now gradually being abandoned for being meaningless and sometimes even misleading. It's quite a relief that historical linguists have almost never tried to use terms like "superfamily", "family", "branch", "stock" and the like in any systematic way.

  49. Therese said,

    March 10, 2009 @ 8:26 am

    One of my great joys, as a Mandarin speaker and Cantonese learner, is to read Apple Daily for the random articles written in colloquial Cantonese. Trying to figure out both sound and meaning is great fun, even if some take me well over an hour!

  50. Ken Brown said,

    March 10, 2009 @ 10:19 am

    Is this situation unique to China?

    Doesn't something similar apply to Arabic within some larger nation states as well as between regions? There is apparently one written language (or maybe more accurately two-and-a-half similar written languages) covering a range of mutually unintelligible local forms of speech, but political and religious authorities treat them all as one language.

    Maybe that's not quite the same as local or low-status dialects of, say, English or German because individual speakers of those dialects move freely along the continuum between the dialect and the standard?

    I mean, a Glaswegian or a Jamaican is not only likely to be exposed to standard English enough to understand it, they are likely to be able to produce it when needed. In fact there might be a whole range of "standards" – a Glaswegian might speak a very distinctive local accent to friends and neighbours, a sort of generalised "urban West of Scotland" to others, and Standard English in a Scots accent to foreigners (American TV-watchers might think of Rab C Nesbitt in the first category, Billy Connolly in the second, and Gordon Brown in the third). Similarly a Jamaican might code-switch quite freely between their local dialect and a kind of standard Jamaican English and (in some contexts such as higher education) something like a West Indian version of RP. (A Jamaican in London could probably add at least another two accents to that)

    That is not I think perceived by the speaker as switching between distinct languages but between styles of one language. And it is a continuum, you aren't always clearly speaking one or the other, and people move around in that space for stylistic effect.

    Am I right to think that the difference between Cantonese and Mandarin is not like that? That all speakers will be aware of the distinction between the two, and that they are not usually mixed? Or that if they are then it stands out clearly, like someone speaking Spanish and English together. A few days ago I overheard a Colombian, a Brazilian and a Spanish person in a bar in London talking in what I assume was Spanish. But they frequently used English words and occasionally entire English sentences. I assume that they would always have been able to clearly identify an English phrase as English and a Spanish one as Spanish. Someone moving along a continuum between two dialects of one language probably doesn't do that?

    HP said: "… I was watching a Cat. III Hong Kong film [...] and the main character, after committing a heinous murder, escapes to South Africa, where the only work he can find is in a Chinese restaurant run by Mandarin speakers. Much of the "comedy" which follows centers on his inability to understand his employers, and how they fall back on awkward Chinglish as a common tongue in S.A. …"

    I was once at a Chinese restaurant in London with my then boss, a Hakka speaker. He was pleased to notice that the waiters were talking his mother tongue, and started to order in Hakka. After a while they asked him to speak English apparently because they couldn't understand him.

  51. 28481k said,

    March 10, 2009 @ 12:28 pm

    The Sinitic-speaking Chinese traditionally consider themselves to be a single people, in Mandarin Hàn. It almost logically follows that they consider themselves to speak a single language, Hànyǔ…

    Hang on there, Hàn as an identity actually only came after the Mongols conquered China, Sinitic people had a less strong ethnic identity before then. What they realised was a (sort of) common culture, but not a distinct ethnicity. Cantonese, for example, talk about 唐人 (tángrén), because it was when, mythically, Cantonese ancestors moved towards Guangdong (many clans recognised that they arrived at around Song). Hence Chinatown being known as 唐人街 (tángrén jiē) in Cantonese.

    That is not I think perceived by the speaker as switching between distinct languages but between styles of one language. And it is a continuum, you aren't always clearly speaking one or the other, and people move around in that space for stylistic effect.

    Am I right to think that the difference between Cantonese and Mandarin is not like that? That all speakers will be aware of the distinction between the two, and that they are not usually mixed? Or that if they are then it stands out clearly, like someone speaking Spanish and English together.

    They are different enough to stand out clearly from one another, though of course once you get hold of the basics of the two languages you can use phonemic equivalence to figure out the pronunciations. So they are generally not mixed (though code-switching does occur, it's generally realised as code-switching: people know that you're using two different fangyan), and it is very difficult to fake Mandarin (as I once tried when I was a wee boy) when you only know Cantonese, or vice versa. Unlike speakers of the closer dialects within Mandarin, or even of Shanghainese, Cantonese speakers are very protective of their language. The success of mass media and post-war migration to Hong Kong created a Cantonese-wide standard, which I call Metropolitan Cantonese; before that, if you spoke the standard dialect in places like Siyi, you would probably have been laughed at! Toishanese, for example, has sounds like /ɬ/ in place of /s/!

    I was once at a Chinese restaurant in London with my then boss, a Hakka speaker. He was pleased to notice that the waiters were talking his mother tongue, and started to order in Hakka. After a while they asked him to speak English apparently because they couldn't understand him.

    Different dialects and accents of Hakka tend to obscure intelligibility, and if one isn't used to exposing one's mother tongue to outsiders, the fact that they speak the "same" language would not help them to understand more than an iota. (The Haifeng dialect, for example, has one more tone than the prestige Meixian dialect.) Perhaps your then boss was used to speaking Hakka to a wider audience than the waiters were, which made him attuned to their speech patterns and able to conclude that they were speaking Hakka, whereas it didn't happen vice versa.

  52. 28481k said,

    March 10, 2009 @ 12:39 pm

    Oops, sorry that I repeated the post as I was too eager to see it posted; could someone delete the first one (which is the same as the second one without the pronunciation of 台山話)?

  53. Liz said,

    March 10, 2009 @ 4:49 pm

    Speaking as someone living in an area with both Spanish-speaking and Portuguese-speaking immigrants, while Brazilians can understand Spanish without too much difficulty, Spanish speakers have a much harder time with Portuguese. The result is that the Brazilians end up speaking Portugnol, which is a mishmash of both languages, with a few terms in English for names of things that they learned only in this country.

    Undoubtedly some of this unidirectional intelligibility has to do with the fact that Portuguese has more phonemes than Spanish, making Spanish "easier" than Portuguese to learn. Also, I'm sure there's a class element as well, as more of the Brazilians came from the middle class (for Brazil) and have a correspondingly higher level of education than the Spanish-speakers. That said, I wouldn't call education absolutely necessary, as I know at least a few immigrants with no more than a 3rd grade education who can speak quite serviceable English.

  54. joseph palmer said,

    March 10, 2009 @ 8:52 pm

    I agree with hsknotes that a phonetic written version of a Chinese dialect would be as hard to understand as the spoken, of course. However, it is very unlikely that anyone will attempt that. Chinese people will use Chinese characters, and the main content words will be pretty similar in all cases.

    (Chinese people are generally rather poor at even the widespread pinyin system for Mandarin)

  55. Victor Mair said,

    March 11, 2009 @ 2:56 pm

    @Jim: "They can get really ,ummmmmm…forceful in insisting that China has one language, one people, etc. if you press the issue."

    I know that from personal experience, so I don't press the issue, but instead invent newspeak circumlocutions like "topolect" to deal with the problem. Anyway, when people get "forceful," I know that they've left linguistics far behind.

    @David Marjanović: "The Sinitic-speaking Chinese traditionally consider themselves to be a single people, in Mandarin Hàn. It almost logically follows that they consider themselves to speak a single language, Hànyǔ…."

    But it doesn't really follow logically after all, no more than that all Indic-speaking Indians speak "Indian" or that all Germanic-speaking Americans speak "American" or that all Romance-speaking Italians speak Italian (don't forget about Sardinian). Furthermore, Cantonese know very well that what they speak is Cantonese (jyut6 jyu5) and Taiwanese know very well that what they speak is Hoklo, etc., and they also know that these languages are very different from Mandarin. Just to show you how different Cantonese is from Modern Standard Mandarin (MSM), here is a typical Cantonese question: 係唔係佢哋嘅?(hai6 m4 hai6 keoi5 dei6 ge3?) In MSM that would be 是不是他們的? (Shì bú shì tāmen de?) Both questions mean "Is it theirs?" There are a total of five different characters in the Cantonese question. Of these, four are not used in MSM and three are marked with mouth radicals as being used to record the sounds of Cantonese morphemes for which there are no standard characters.

    I taught in Hong Kong during 2002-2003 and took Cantonese lessons the whole time, but I must confess that at the end I still understood almost nothing of conversational Cantonese spoken by two natives at normal speed. My wife, who has been a speaker and teacher of Mandarin her whole life, didn't learn a single word of Cantonese the whole time she was in Hong Kong with me. I remember going to banquets with her and listening to the Cantonese chatting among themselves and giving speeches; my wife didn't understand even half a sentence of anything that was said.

    To all of those who commented on my remark about the nonsensicality of the widely quoted quip about armies and navies determining the difference between a dialect and a language, just think of the situation in India. Marathi, Oriya, Gujarati, Bengali, Hindi, Tamil, Telugu, etc. are all universally recognized as different languages, but they do not have armies and navies to back them up. Conversely, Australian English, Canadian English, and American English all have armies and navies to back them up, but I don't know anyone who thinks of them as separate languages. The counterexamples to the army and navy witticism that could be adduced are endless. I much prefer Michael Tinkler's astute line, seconded by Bob Ladd, that "A language is a dialect with a Department of Education and firm grasp of the curriculum." In the final analysis, though, I think that, when the two are talked about together, it is both common sense and linguistic wisdom that languages are considered to be larger, higher order entities than dialects. In other words, a language may consist of several dialects, but a single dialect may not consist of several languages.

  56. Nathan Myers said,

    March 11, 2009 @ 5:34 pm

    Thank you, Victor, for your summing-up. I could wish we had more of such erudite punctuation (commas, possibly, if not full stops) on LL.

  57. Stephen Jones said,

    March 11, 2009 @ 9:14 pm

    Marathi, Oriya, Gujarati, Bengali, Hindi, Tamil, Telugu, etc. are all universally recognized as different languages, but they do not have armies and navies to back them up

    I think you'll find most of them did at a prior stage in History.

  58. Franz Bebop said,

    March 12, 2009 @ 1:11 am

    @Victor: I know that from personal experience, so I don't press the issue, but instead invent newspeak circumlocutions like "topolect" to deal with the problem. Anyway, when people get "forceful," I know that they've left linguistics far behind.

    With all respect, Prof. Mair, I think you should press the issue. They haven't left linguistics behind, they've invaded it and want to impose the rules. You should not back down. The definition language vs. dialect is a question for linguists to decide, not for politicians or for hot-headed nationalists, and that means you yourself are one of the experts. Stand your ground.

    Chinese is composed of many languages, not just dialects. If you agree with this statement, then you should say so. Nobody should be forced to state something other than their true beliefs about the facts just because certain people get upset. Academics should not give in to bullying.

    You cannot control whether other people get upset — it's not your problem. But you have absolute and complete control over your own self-censorship. You should state your true opinions and your true understanding of the facts, even to people who don't like hearing it.

    Why would you say "topolect" when you really mean "language"? If you mean language, then say language. Say exactly what you think.

  59. Stephen Jones said,

    March 12, 2009 @ 4:24 am

    The definition language vs. dialect is a question for linguists to decide

    The point is that in many cases linguists can't decide, and when they can do that is only because what is a language has been decided by external factors.

  60. Franz Bebop said,

    March 12, 2009 @ 8:54 am

    @Stephen: The point is that in many cases linguists can't decide, and when they can do that is only because what is a language has been decided by external factors.

    I don't agree. What you say is true, but it's not the central point. Can you tell me which linguists think that "Chinese" is just one single language, rather than a group of related languages? I'm betting it's a small minority of linguists outside of China. It is not an idea championed by linguists, it's an idea championed by Chinese nationalists, who in Prof. Mair's words "leave linguistics behind."

    Prof. Mair's question was, how do we break the impasse? There is no impasse. You already know what to do. Publish exactly what you believe, and if it upsets someone, too bad. Pissing people off is part of the job description of an academic.

  61. Fluxor said,

    March 12, 2009 @ 10:19 am

    Victor Mair writes:

    I taught in Hong Kong during 2002-2003 and took Cantonese lessons the whole time, but I must confess that at the end I still understood almost nothing of conversational Cantonese spoken by two natives at normal speed.

    I've had the very opposite experience learning Cantonese and I didn't even learn it in Hong Kong. I learned it all in Canada and not in a classroom setting. I grew up in Ottawa and in university, befriended many Cantonese speakers. At that time, I already knew a bit of Mandarin and could read many characters. Once I figured out the mapping of sounds from one to the other and the few differences in character usage, it was fairly smooth sailing from there. Watching Cantonese movies/TV with Cantonese-style Chinese subtitles also hammered home the differences. While there is a lot of Cantonese-only vocabulary (邊度, 而家, etc.), it's not much of an effort to map 是 to 係, 不 to 唔, or 他 to 佢. I find that the phonetic mapping from Mandarin to Cantonese is more often than not a 1-to-1 relationship.

    On reading Cantonese writing in characters, it's actually not that onerous for a Mandarin speaking reader. Being informed of the following (係=是, 喺=在, 而家=現在, 嘅=的, 呢=這), any reader of MSM can read the following article from the Cantonese section of Wikipedia (I just picked the story at the top of the page). The first paragraph is:

    香港中國婦女會中學(Hong Kong Chinese Women's Club College,英文簡稱HKCWCC),係喺香港東區西灣河,創校於1978年,而家個校長係黃明孝先生。呢間學校係用英文(EMI)作授教語言,係第一組(Band 1)嘅中學,係政府資助嘅中學之一。創校團體係香港中國婦女會,以「博學篤志」為依間學校嘅校訓。香港商業電台出名DJ少爺占(甄子康)亦曾就讀依間學校。

    In fact, if you leave out all of those five Cantonese-only terms, the paragraph is still highly intelligible to any person that can read Modern Standard Chinese. Thus, I disagree with Victor Mair's assertion that:

    [if one] does contrive to write unadulterated Cantonese or Taiwanese, for example, they will be as hard for a reader of Mandarin to understand as spoken Cantonese or Taiwanese is for a speaker of Mandarin to understand. (emphasis mine)

    Hard, yes, but not nearly as hard.
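
    The five-item gloss above can be applied mechanically, as in the rough sketch below. The mapping is only the list given in this comment, the sample is the opening of the quoted Wikipedia paragraph, and the function name is made up for the sketch. It is a character substitution, not a translation: anything outside the five glossed items (the measure word 個, for instance) is left untouched, so the output is not claimed to be well-formed MSM.

```python
# The five Cantonese-to-MSM glosses listed in the comment above.
GLOSS = {
    "係": "是",
    "喺": "在",
    "而家": "現在",
    "嘅": "的",
    "呢": "這",
}


def apply_gloss(text, gloss=GLOSS):
    """Replace each glossed Cantonese item with its MSM counterpart."""
    # Replace longer items first so multi-character entries such as 而家
    # are handled before any single-character entries.
    for cantonese in sorted(gloss, key=len, reverse=True):
        text = text.replace(cantonese, gloss[cantonese])
    return text


sample = "係喺香港東區西灣河,創校於1978年,而家個校長係黃明孝先生。"
print(apply_gloss(sample))
# prints: 是在香港東區西灣河,創校於1978年,現在個校長是黃明孝先生。
```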

  62. Kevin Iga said,

    March 12, 2009 @ 1:23 pm

    Part of the difficulty in Chinese seeing their own language situation is that the regional spoken language is rarely taught in schools; instead, they usually learn standard Mandarin. Even in Hong Kong, where instruction is given in Cantonese (except for schools where it is given in English), they learn to write Standard Written Chinese which is Mandarin, more or less (with some code-switching to Cantonese in the Hong Kong case). A Cantonese speaker from Hong Kong informed me that "it's easy for someone who speaks Mandarin to learn Cantonese–just speak Mandarin badly." His claim, though clearly wrong (just ask a Mandarin speaker who is trying to learn Cantonese), is based on his experience: when he tried to communicate in Mandarin, one of two things would happen: he would use Cantonese or he would use Mandarin (though probably a mixture was the result). In a classroom setting, he would be corrected away from the Cantonese choice. Thus, he concludes that Cantonese is just incorrect Mandarin. Not unlike many Americans' views of African American Vernacular English.

    About taxonomy: it seems that there are two possible goals here: a historical linguistic study of the phylogeny of the Sinitic languages, versus some measurement of the "distances" between these dialects. The latter would surely involve inventing some measurement of "distance", which would be very difficult.

    It is well-known by comedians that it is possible to make some foreign languages vaguely understandable by careful choices of vocabulary (especially borrowed terms) and intonation. There's the "Alles Lookenspeepers" example which isn't good German, but somewhat makes the point. Even fictitious languages: recall the Swedish Chef on the Muppets. The reverse would be to cherry pick examples that are hard to understand, which happens all the time with slang. But there might be something one could do that relates to more "typical" speech.

  63. Stephen Jones said,

    March 13, 2009 @ 3:30 pm

    Franz Bebop
    What you're saying might be true of Chinese but it is not at all clear for other languages/dialects.

  64. Victor Mair said,

    March 13, 2009 @ 5:19 pm

    @Franz Bebop: I take courage from your firm, encouraging remarks and shall act accordingly.

    @Nathan Myers: I believe in the importance of commas and semi-colons, and thank you for recognizing that.

    @Fluxor: Before I comment at greater length and in finer detail on the seemingly miraculous ease with which you learned Cantonese, in contrast to the rest of us Mandarin-speaking blokes who have struggled hard at it, I need to ask you a few simple questions:

    1. How long have you been studying Mandarin?

    2. How long have you been studying Cantonese?

    3. On a scale of 0-10, from absolutely no ability whatsoever (0) to completely native fluency (10), how would you rate your ability in spoken Cantonese (listening and speaking)?

    4. On a scale of 0-10, from absolutely no ability whatsoever (0) to completely native fluency (10), how would you rate your ability in written Cantonese (reading and writing)?

    5. On a scale of 0-10, from absolutely no ability whatsoever (0) to completely native fluency (10), how would you rate your ability in spoken Mandarin (listening and speaking)?

    6. On a scale of 0-10, from absolutely no ability whatsoever (0) to completely native fluency (10), how would you rate your ability in written Mandarin (reading and writing)?

    Questions 3-4, of course, must be measured according to normal speech speeds for typical native speakers; questions 5-6 should be carried out without recourse to dictionaries or electronic gadgets of any sort.

    If you have difficulty assessing your own levels, I can set up some tests for you in Philadelphia at the University of Pennsylvania or in our Chinatown should you ever happen to pass through our city.

    Meanwhile, I would recommend that you read the following to get a sense of how different full-blown Cantonese is from Modern Standard Mandarin (MSM):

    a. the works of Robert S. Bauer on Cantonese phonology and orthography

    b. the works of Stephen Matthews and Virginia Yip on Cantonese grammar

    c. Don Snow's remarkable Cantonese as Written Language: The Growth of a Written Chinese Vernacular (Hong Kong: Hong Kong University Press, 2003). Appendix 1 gives 14 Cantonese texts, each of which Snow carefully analyzes for the degree to which it adheres to the norms of spoken Cantonese rather than of written MSM. The 14 texts, which cover a wide range of genres, date from around the 17th century to the contemporary period. It is striking that the percentages of overtly marked Cantonese (and Snow is referring here not just to special Cantonese characters) in these 14 texts range from only 3% to 36%: 3, 4, 6, 7, 10, 11, 12, 20, 23, 23, 23, 28, 32, 36, for an average of 17%.
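
    The range and average quoted for Snow's 14 texts can be checked with a few lines of Python. The figures are copied from the list above; the variable name is illustrative only.

```python
from statistics import mean

# Percentages of overtly marked Cantonese in Snow's 14 texts, as listed above
marked_cantonese = [3, 4, 6, 7, 10, 11, 12, 20, 23, 23, 23, 28, 32, 36]

print(len(marked_cantonese))                         # number of texts
print(min(marked_cantonese), max(marked_cantonese))  # lowest and highest share
print(mean(marked_cantonese))                        # average share
```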

    It is one thing to phonetically "map" (as you say) MSM to Cantonese and substitute a few lexical items from one to the other, but it is quite another to produce fully idiomatic Cantonese *on its own terms*. The paragraph from Wikipedia that you quoted is not really fully idiomatic written Cantonese, but basically a Mandarin matrix with a few token Cantonese features sprinkled in. Your Wikipedia paragraph has only 12% of Cantonese features.

    The reason for these low percentages of actual Cantonese language in many so-called "written Cantonese" styles is that the only forms of proper written Chinese that have been allowed are Classical Chinese and Mandarin, and the latter is still very much a Johnny-come-lately. It is against the rules for students to use Cantonese expressions in anything they write for a teacher, as at least one other commenter has pointed out. So there are just no models and standards for writing unadulterated Cantonese. Most so-called "written Cantonese" is perforce heavily adulterated and bastardized with Mandarin and Classical. If you want to find something that approximates real (spoken) Cantonese in writing, you've got to turn to the comics, a few newspaper columns, some unusual short stories and novels, and so forth.

    The good news, however, is that the Internet is increasingly a place for snatches of genuine Cantonese. Young people are working out the models for how to write their language on an ad hoc basis. This is called ciu4 jyu5 潮語 (here this means "trendy expressions," not Chiu-chow / Teochew language), and the teachers are all up in arms because some of it spills over into student papers, and that — real Cantonese — is anathema to the guardians of written Chinese rectitude. Within the next couple of months, there will be two papers on this subject published in Sino-Platonic Papers.

  65. Fluxor said,

    March 13, 2009 @ 10:31 pm

    @Victor Mair: Thanks for your lengthy reply and recommended reading. I think it's best I respond to your questions privately rather than clog up the comment pages.

    However, my final point still stands: written Cantonese is not as hard for a Mandarin speaker to understand as spoken Cantonese.

    The same can be said for Japanese: written Japanese is not as hard for a Chinese speaker/reader as the unintelligible spoken form. The use of Kanji ensures that's the case. I can pick up a Japanese newspaper and understand the gist of many articles. I've never studied Japanese in my life, but just having googled for "Tokyo Daily" brought me to a Japanese online newspaper with the following headline: 海自護衛艦、ソマリアへ向けまもなく出発. I have no idea what more than half of that headline says, but I can still tell it's about a naval defense battleship (海自護衛艦) that is going somewhere (…向…出発).

    There is so much common hanzi vocabulary between Cantonese and Mandarin that I simply cannot fathom that written Cantonese is as hard as spoken Cantonese for a hanzi-reading Mandarin speaker. Have a Cantonese speaker say do1ze6 and the Mandarin speaker may be confused. Have the Cantonese speaker write down 多謝. Do you think the Mandarin speaker would still be confused? Is 多謝 simply an anomaly or do written Cantonese and Mandarin share a vast array of common hanzi vocabulary?

  66. Bill Baxter said,

    March 15, 2009 @ 11:32 am

    I'd like to add a couple of points I don't think have been raised.

    1. Chinese normally don't use the terms "language" and "dialect" at all; they use (in MSM) "yu3yan2" and "fang1yan2". If you look these words up in a Chinese dictionary, I don't think you'll find any mention of the idea that different fang1yan2 belonging to the same yu3yan2 should be mutually intelligible; it will say that a dialect is "yi1ge yu3yan2 de di4fang1 bian4ti3" ("a local variant of a language"), or words to that effect. The discussion might be clarified if we are aware that yu3yan2 and fang1yan2 are not exactly the same as "language" and "dialect" (whatever those mean in English), and if we needn't insist that they should mean the same things.

    2. I don't know the details, but I believe that in the 50s, linguists in China (especially western-educated ones) were taken out behind the woodshed and made to study the linguistic writings of Comrade Stalin, whose early 50s book "Marxism and Problems of Linguistics" was required reading; I think it still exerts considerable influence on Chinese linguistics today. In it Stalin rejects the views of Marr (who had been to linguistics what Lysenko was to biology). I'm told by reliable sources that the book was actually written by Chikobava.

    But the main point here is that, as I understand the Marxist-Leninist view of language put forward in this book, it is a property of a "nation" (Russian nacija, Chinese min2zu2) that it normally has its own language. (This was a widespread idea in the 19th century, not necessarily confined to Marxists.) Neither the USSR nor China were nations in this sense; rather, they were "multi-national states" (duo1 min2zu2 de guo2jia1) in which each nation was supposed to be allowed and encouraged to use its own language.

    If Mandarin and Cantonese were actually different languages, in this sense, then one might easily draw the conclusion that their speakers belonged to different min2zu2. Since China is a "multi-national state", this might not be a problem in theory, but it undermines the unity of the majority Han 'nation', and could easily be used to argue for different political arrangements, including the division of China into different states. I suppose even Fuzhou and Southern Min speakers would belong to different min2zu2. The fact that the question is connected to the political unity of China is, I think, what makes people take the discussion so seriously there.

    3. I'm not sure that linguists really need a precise definition of "language" (I mean the countable noun) and "dialect" anyway, in the sense that physicists need precise definitions of "mass" and "velocity". They are part of the folk terminology of English, which makes them part of the object of our study, but I don't see what would be lost by giving them up as technical terms. We can study the meaning and use of these English words (as we would study any other words), and when we do we will find, as we would expect, that they are loaded with all sorts of social and historical baggage.

    The linguistic facts on the ground can usually be described without them. In fact, the complex issues of what is mutually intelligible with what, under what circumstances, are probably easier to describe without using these loaded terms. If we are doing sociolinguistics, then we get into the speakers' perception and classification of the languages they use, but it's still not clear to me that "language" and "dialect" are the best terms to use in this situation (unless, perhaps, we are describing English).

    4. Even "topolect" runs into the problem that there are speech varieties that can't necessarily be identified with particular places. Many places are multilingual, and it would be a mistake to take monolingual locations as the default case.

  67. dr pepper said,

    March 24, 2009 @ 3:22 pm

    Well if anyone is still reading this topic, here's my definition:

    A dialect is a language that lives on its cousin's couch.

  68. Victor Mair said,

    March 31, 2009 @ 6:56 pm

    Well said, Dr. Pepper!

    @Fluxor The point in your original comment to which I took exception is that you made it seem, in contrast to my experience ("quite the opposite," you said) and the experience of thousands of other Mandarin speakers I know, that learning Cantonese is relatively easy for someone who already knows Mandarin. I still disagree with you on that point.

    As for highly Mandarinized Cantonese writing, I concede that a Mandarin reader can often more or less make out the gist of a passage. However, with full-blown Cantonese (50% or more marked Cantonese elements), Mandarin readers will be hopelessly lost.

    @Baxter The paper I have just finished (entitled "The Classification of Sinitic Languages: What Is 'Chinese'?") will reply to all of the issues you have raised (and more). In fact, several of your points are already covered in the original post and comments above.

  69. lhc said,

    October 17, 2009 @ 6:49 pm

    Kellen's point is really profound:
    "It's not intelligibility. It's their second language."
    When we ask speakers how much they understand of another language, we have to consider the directionalities of -lect exposure. Who's been exposed to how much x (lang/dial/register), in what contexts, and what abilities have they developed as a result of that exposure?
    After all, whether exposure leads to understanding (let alone fluency) depends a lot on the social positions of the languages in question–which is why affluent kids in LA raised by Spanish-speaking nannies forget their Spanish by middle school, and migrant kids in Shenzhen learn to speak Cantonese without an accent.

  70. David Marjanović said,

    October 24, 2009 @ 2:03 pm

    Probably way too late, but…

    @David Marjanović: "The Sinitic-speaking Chinese traditionally consider themselves to be a single people, in Mandarin Hàn. It almost logically follows that they consider themselves to speak a single language, Hànyǔ…."

    But it doesn't really follow logically after all, no more than that all Indic-speaking Indians speak "Indian"

    That's why I wrote "almost logically" (though I obviously didn't express myself clearly enough): it's what's going on in the heads of the people that use terms such as hànyǔ, including but by no means limited to both Chinese governments, because it seems just plain obvious to them. Most grave errors of thinking seem just plain obvious.

    But thanks for another few clear examples of why it's an error…!

    When we ask speakers how much they understand of another language, we have to consider the directionalities of -lect exposure. Who's been exposed to how much x (lang/dial/register), in what contexts, and what abilities have they developed as a result of that exposure?

    Exactly. Is my dialect (from Upper Austria) mutually intelligible with Standard German? Who knows? There are no speakers of my or a similar dialect who don't know Standard German (and are, at least if aged 6 or more, literate in it – the dialect isn't written). Is my dialect mutually intelligible with some dialect from central Germany (…well, not the crazy vowel-shifting ones maybe)? Who knows? Many, probably most, of the features that distinguish the two are the same that distinguish my dialect from Standard German…

    In one direction, these two questions are not answerable (even though they were maybe 100 years ago).

  71. Tom said,

    January 21, 2010 @ 8:00 pm

    It's funny how some people desperately want to treat the language-dialect distinction as somehow linguistic, although it obviously is and always has been political-sociological-cultural. If you just drop the whole idea that linguistic criteria should be used, most of the problems vanish. It's amazing!

  72. sonnymak said,

    March 28, 2012 @ 1:16 am

    I am a native of Malaysia, a polyglot nation, and a native speaker of Cantonese. I grew up listening to Hakka, Hokkien, and Hainanese, and studied Mandarin.

    From a young age I could tell that these were variants of Cantonese and could get the drift of what was being said. This is due to early exposure.

    100 years ago the same speakers started to arrive in Malaysia, and Mandarin education was nonexistent. The early pioneers couldn't understand each other, as they were mostly uneducated peasants. I have listened to various variants of Mandarin and can find some degree of mutual intelligibility with them since I speak Mandarin.

    I could hear Toi San and Sei Wui as variants of Cantonese and could understand them. Hakka is a little more distant but still similar.

    Hokkien and Teochew are far from Cantonese but similar to Hainanese, and I could have some degree of intelligibility as I understand Hainanese.

    Foochow sounds like it is a completely different family of languages to me compared to those I mentioned before.

    Shanghainese is more difficult to understand, but after a while I could probably get 50% of what is said due to some similarities with Standard Mandarin.

    Verdict? If you hear a variant of a language you speak and after 48 hours you can neither understand it nor identify where the words and sound shifts differ from your base language, or cannot understand at least 50% of it after making the adjustment, then it should count as a different language, and forget about the political factors.

  73. sonnymak said,

    March 28, 2012 @ 1:28 am

    Just to add here: if I only spoke Cantonese, Toi San and Sei Wui would be familiar to me, with a high degree of intelligibility.

    Hakka is more distant but shares some of its basic sound system and basic vocabulary with Cantonese. Intelligibility would be 70% for a monolingual speaker of Cantonese after 48 hours of listening.

    Hokkien has a different sound system, though some basic vocabulary may have the same roots: 60% intelligibility with Cantonese after 48 hours of listening to basic conversation.

    Mandarin would be hard going for a Cantonese speaker without prior exposure, who after 48 hours would still be flummoxed: less than 50% intelligibility.

    Foochow and Shanghainese are more distant from Cantonese than any of the above.

  74. Daniel said,

    April 9, 2012 @ 5:42 am

    Chinese “languages,” or “dialects”? I have been stumped by this question for years, but I think I have recently settled on a resolution.

    For the linguist, it would of course be ideal to just apply “purely linguistic” criteria for classifying speech varieties as languages or dialects, and some (like Franz Bebop, in his comments of March 12, 2009, 1:11 am and 8:54 am) are quite uncomfortable with designating languages based on “non-linguistic” criteria like political, socio-cultural, and historical factors (although the idea of these being “non-linguistic” is itself contentious; don’t these factors fall within the purview of sociolinguistics?). Using “purely linguistic” criteria, all the debates about Sinitic, Arabic, Hindi-Urdu, Serbo-Croatian, etc., etc. would theoretically be put to rest.

    The problem is, the major linguistic criterion that is being touted for language-dialect differentiation is the mutual intelligibility criterion, which has a host of inadequacies, including the following theoretical ones:
    (1) mutual intelligibility is not an all-or-none thing, the degree of mutual intelligibility is usually some value between 0% and 100% – What then should be the threshold of mutual intelligibility? 60%?, 70%?, 90%?, 91.5%?
    (2) intelligibility can be asymmetrical between two speech varieties
    (3) no universally accepted method for measuring intelligibility – For example, should it simply be based on the number of words understood? If someone understands 5 out of 6 words in the sentence “I will not sell my horse” that is 83% of the words, but if the word that is not understood is the crucial “not,” then the whole message does not get through.
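
    The "I will not sell my horse" example in point (3) can be made concrete with a small sketch. It assumes the crudest possible measure, the share of words recognized, and a hypothetical set of "critical" words whose loss reverses the message; neither is an established method for measuring intelligibility.

```python
def word_score(sentence, understood):
    """Share of the words in the sentence that the listener recognizes."""
    words = sentence.lower().split()
    return sum(w in understood for w in words) / len(words)


def message_gets_through(sentence, understood, critical=("not",)):
    """Hypothetical check: the message fails if a critical word in the sentence is missed."""
    words = sentence.lower().split()
    return all(w in understood for w in words if w in critical)


sentence = "I will not sell my horse"
understood = {"i", "will", "sell", "my", "horse"}  # every word except "not"

score = word_score(sentence, understood)
ok = message_gets_through(sentence, understood)
print(f"{score:.0%} of words understood; message intact: {ok}")
```

    The word count comes out high while the message check fails, which is the gap between the two notions of "understanding" that the example points to.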

    Aside from the theoretical ones, the mutual intelligibility criterion also has what might be called “practical” or speaker-related problems (as enumerated in http://www.linguasphere.info/spip.php?article171096) which would affect the measurement:
    (1) different linguistic aptitude among individuals (which probably explains why some of the Mandarin-speaking commenters had more difficulty in understanding Cantonese, whereas others had less)
    (2) previous linguistic experience and/or education of the speakers being evaluated (For example, considering that Standard Mandarin is taught in schools, is it really possible to accurately measure the degree of mutual intelligibility between a Mandarin and non-Mandarin topolect? Should we get uneducated speakers to measure mutual intelligibility?…Could it be that the commenters who had less difficulty understanding Cantonese were those who had a longer time of contact with Cantonese speakers?)
    (3) reciprocal feelings among speakers of the different varieties
    (4) subject matter under discussion (As you previously pointed out, “highly Mandarinized” formal written Cantonese may be easier to understand than conversational Cantonese in speech, comics, and short stories; I suspect TV news broadcasts would also be easier to understand than dialogues of TV dramas)
    (5) hearing acuity of the speakers

    And assuming that all these problems could be surmounted, there is another thing which would render the mutual intelligibility criterion useless: the existence of dialect continua (as already pointed out by LoveEncounterFlow, March 7, 2009, 4:50 pm, and Merri, March 9, 12:49 pm). The Sinitic topolects, which form a speech continuum, are often compared to the Romance languages in terms of diversity (“as different as French or Spanish is from Italian”—or some other variant of the statement). However, West Romance dialects constitute a continuum, and the division of these dialects into “Portuguese,” “Castilian,” “Catalan,” “French,” “Walloon,”…is arbitrary. If the division of West Romance or North Germanic into separate languages is considered arbitrary, then—following the suggestion that what applies to other languages must also apply to Chinese—the division of the Sinitic continuum into separate languages (whether 7, 8, 10, 13, or more) would be just as arbitrary. The debates about Jin, Old and New Xiang, and Northern and Southern Wu are perhaps a reflection of the difficulty of placing the dividing lines. The Sinitic continuum is also more likely to be perpetuated, since unlike West Romance or North Germanic with their numerous standard languages, the Sinitic sphere has only one standard language, toward which all the topolects may eventually gravitate. Since any division into languages would be arbitrary, the logical alternatives would be (1) to consider the entire Sinitic continuum as a single language, or (2) to consider each topolect as an individual language (which means there would be thousands of Sinitic “languages” if the counties are used as basis)

    Therefore, the reason for the Sinitic problem does not seem to be Chinese nationalism or official intransigence (though both may be very much present), but rather the fact that there is as yet no satisfactory, “purely linguistic” definition of language. In other words, it is not just a problem of Sinitic, but a problem of West Romance, North Germanic, West Germanic, North Slavic, South Slavic, Indic, Arabic, and Turkic as well. Or to put it in another way, it is not a “problem” of any of these, but rather a problem of the linguists themselves, who have yet to come up with a definition of language which conforms to reality.

    Since there is as yet no satisfactory linguistic definition of language, we will just have to fall back on the status quo and rely on the non-linguistic—political, socio-cultural, historical—definitions of language. It seems we would just have to accept the old familiar languages like Portuguese, Spanish, Norwegian, Danish, Ukrainian, Russian, Kirgiz, Kazakh, Hindi, Urdu, and, with apologies to Franz Bebop, Chinese.

  75. John M. said,

    November 1, 2013 @ 6:13 pm

    Daniel's comment: "However, West Romance dialects constitute a continuum, and the division of these dialects into “Portuguese,” “Castilian,” “Catalan,” “French,” “Walloon,”…is arbitrary"

    This certainly was true in prior centuries, when a clear dialect continuum existed from Italy through France to Iberia, but now that these languages have become more standardized and spread (through both public education and mass communications), the linguistic boundaries have become much more rigid. Whereas people in southeastern France historically spoke Provençal, and people in northwest Italy spoke Piedmontese – two closely related Romance languages/dialects – they now almost all speak French and Italian, respectively – which are considerably more distinct. Mutual comprehension across the border is much lower than it once was. The same goes for the French/Spanish border, the Spanish/Portuguese border and so on. As the regional dialects die out, the continuum has broken down.
