How Many Languages Are There in China?


At least three hundred.

I like the title, not the one on the first panel, but the one at the top of each frame, which I have also given as the title of this post.

You probably don't have time to watch the whole video (13:54), but it's pretty good:

Two suggestions from me:


The speakers are still calling all these languages / varieties "dialects", which means they must be mutually intelligible and / or at least closely related, which is far, far from the truth.

From Middle French dialecte, from Latin dialectos, dialectus, from Ancient Greek διάλεκτος (diálektos, conversation, the language of a country or a place or a nation, the local idiom which derives from a dominant language), from διαλέγομαι (dialégomai, I participate in a dialogue), from διά (diá, inter, through) + λέγω (légō, I speak).

(linguistics, strict sense) A lect (often a regional or minority language) as part of a group or family of languages, especially if they are viewed as a single language, or if contrasted with a standardized idiom that is considered the 'true' form of the language (for example, Cantonese as contrasted with Mandarin Chinese or Bavarian as contrasted with Standard German).

Synonym: (often derogatory) patois

Synonym: vernacular


Why must these local lects be stigmatized?

Let's use the neutral, linguistically exact term "topolect", calqued on Sinitic "fāngyán 方言" (NOT "dialect").

(linguistics) dialect; topolect; regional language variety



Instead of referring to all of the many languages of China as "Chinese", I propose that they be divided into two main groups, "Sinitic" and "non-Sinitic".  Sinitic includes all the languages of China that fall under the designation Hànyǔ 漢語, not just Mandarin, and especially not just Modern Standard Mandarin (MSM) (Pǔtōnghuà 普通話).  Non-Sinitic would include Mongolic, Tungusic, Turkic, and the scores of other languages that are unrelated to Hànyǔ 漢語 ("Sinitic").

China is not a nation of linguistic uniformity, as is often falsely alleged.



  1. Philip Anderson said,

    November 17, 2022 @ 8:21 am

    Is topolect being used as a classification between dialect and language, or as a neutral term to avoid choosing one or the other?
    I agree that describing the non-Sinitic languages spoken in China as Chinese is misleading – Breton is not French, although it is a language of France. In terms of relationship, should Sinitic be considered more of a language family or a branch, like Romance?

  2. Jonathan Smith said,

    November 17, 2022 @ 9:02 am

    Fāngyán 方言, whence topolect, is arguably etymologically neutral (even here I am doubtful re: current and historical neutrality of fāng ~'region'), but has been and is used — cf. patois in France — to marginalize and stigmatize regional Chinese languages. My sense from afar is that the current tendency in Taiwan is to employ this term (also Taiwanese hong-gân) in the manner of Eng. dialect, i.e., to name regionally distinct but mutually intelligible varieties of e.g. Taiwanese but not say Taiwanese generally in contrast to Hakka, Mandarin, etc. (After all, "jiǎng fāngyán" 講方言 not long ago named the crime of using one's home language in e.g. classroom environments.)

    Hànyǔ 漢語, which is used in all kinds of ways, is anyway also not a term that finds much favor among advocates for regional languages given the underlying suggestion of some meaningful ethno-historical "Han" unity.

  3. Victor Mair said,

    November 17, 2022 @ 11:03 am

    @Philip Anderson:

    "In terms of relationship, should Sinitic be considered more of a language family or a branch, like Romance?"

    I think of it more as a language family. If we think of it as a branch, then we have to decide what it's a branch of, and that is hotly contested.

    Thank you for your keen insight.

  4. David Marjanović said,

    November 17, 2022 @ 11:13 am

    What Sinitic is a branch of hasn't been contested since 1999. What has been contested is how to call it: Sino-Tibetan, Tibeto-Burman, or Trans-Himalayan?

    ("Sino-Tibetan" may be taken to imply that Sinitic is one half of the family and "Tibeto-Burman" the other half, which may or may not be misleading. "Tibeto-Burman" is confusing. "Trans-Himalayan" is meant as "across both sides of the Himalaya", but was only coined very recently and is easily misunderstood as implying that India is on "this side".)

  5. Victor Mair said,

    November 17, 2022 @ 11:45 am

    @David Marjanović:

    Being close friends of proponents on all sides of the argument, I can assure you that the issue is still hotly contested.

  6. Jonathan Smith said,

    November 17, 2022 @ 12:13 pm

    Many aspects of initial correspondences across hypothetical STAKATH remain poorly understood, due in large part to the lack of solid comparative conclusions within Chinese. And claims re: morphology can also be problematic since conclusions wrt early Chinese on this front are (1) predicated on the (incomplete) historical phonological picture and (2) subject to interference from our knowledge of TB languages. So agnosticism for the moment seems a good position, granting that STAKATH looks like maybe the most likely end result.

  7. Ted McClure said,

    November 17, 2022 @ 1:22 pm


  8. Jerry Packard said,

    November 17, 2022 @ 2:57 pm

    As I note in Packard 2021 (‘A Social View on the Chinese Language’, Peter Lang Publishers), though it is indeed contested, the predominant assumption in East Asian linguistics is that Sino-Tibetan is at the top of the language family tree, and that the first branching of that tree consists of a Tibeto-Burman branch on one side and a Chinese branch on the other. The contesters generally posit Tibeto-Burman (as David notes) — not Sino-Tibetan — as the parent family, with Chinese belonging to a sub-family located one or more levels below the posited Tibeto-Burman family.

  9. Jim Unger said,

    November 17, 2022 @ 3:02 pm

    Well, it is often said that there are 11 major dialect groups in the Sinitic family: Mandarin, Yue, Wu, Min, Jin, Hakka, Xiang, Gan, Huizhou, Pinghua, and Dungan. Of course, the CCP claims there is precisely one Sinitic language. Assuming that it has counted the N non-Sinitic languages spoken in China accurately, the quick answer to "how many languages are spoken in China?" is at least N+10.

  10. Jonathan Smith said,

    November 17, 2022 @ 4:16 pm

    Stakath = an ad hoc compromise designation for Sino-Tibetan a.k.a. Trans-Himalayan. Doing my part to calm the waters :D

    Re: # of Sinitic languages, drawing boundaries is of course to large degree arbitrary, but one would think there would be an actually realistic approximation floating around… if there is I don't know of it. "Min" HAS to be regarded as 10-20 full-fledged languages no? In "Southern Min" for instance, even placing ~Taiwan/Amoy and ~Tewchew/Swatow together is a massive reach; these must be as different as Spanish and Portuguese if not more.

  11. JPL said,

    November 17, 2022 @ 5:00 pm

    I know next to nothing about Sinitic linguistics or "Chinese" in particular (the little I do know comes mostly from Language Log), but my eyes caught only the "first title" of the video, and as I scrolled away I apparently formulated the thought, yet unexpressed, "how many Chinese languages are there?", on the pattern probably of "how many African languages are there?". We would never say, "how many dialects (or topolects) of African are there", "X is a dialect/topolect of African", or say, "I can speak African"; people who say such things usually get an eye-roll. And in referring to language families and historical relatedness we use terms like "Niger-Kordofanian" or "Nilotic". The terms "China" and "Asia" seem as arbitrary as "Africa", since "African languages" is usually understood as pertaining to "Sub-Saharan Africa". Is that the kind of situation we have here?

  12. Jerry Packard said,

    November 17, 2022 @ 7:21 pm

    Number of languages in China. This is so difficult. Even if you use mutual intelligibility as one of the primary criteria, that concept has to be tweaked to reflect the *degree* of mutual intelligibility — there would have to be some sort of 'cutoff', such as, '2 speech systems are considered mutually intelligible if the 2 systems are more than 50% mutually intelligible' or something like that (cf. Tang and van Heuven 2007, 2009).
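An editorial sketch of the cutoff idea in the comment above, not anything the commenter or the cited papers propose: treat each variety as a node, link two varieties when their intelligibility score exceeds the cutoff, and count the resulting groups. The function name `count_languages`, the variety labels A to D, and every score below are invented for illustration.

```python
from itertools import combinations


def count_languages(varieties, intelligibility, cutoff=0.5):
    """Count groups of varieties linked by intelligibility above `cutoff`.

    `intelligibility` maps an unordered pair (a, b) to a score in [0, 1];
    missing pairs are treated as 0.0. Groups are connected components,
    found with a small union-find structure.
    """
    parent = {v: v for v in varieties}

    def find(v):
        # Follow parent links to the group representative, compressing the path.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for a, b in combinations(varieties, 2):
        score = intelligibility.get((a, b), intelligibility.get((b, a), 0.0))
        if score > cutoff:
            parent[find(a)] = find(b)  # merge the two groups

    return len({find(v) for v in varieties})


# Hypothetical varieties and scores, invented purely for illustration.
varieties = ["A", "B", "C", "D"]
scores = {("A", "B"): 0.8, ("B", "C"): 0.6, ("A", "C"): 0.3, ("C", "D"): 0.1}

for cutoff in (0.5, 0.7, 0.9):
    print(cutoff, count_languages(varieties, scores, cutoff))
```

With these made-up scores the count moves from 2 to 3 to 4 as the cutoff rises, so the answer depends entirely on where the line is drawn. Note also that at the 50% cutoff A and C land in the same group even though they score only 0.3 against each other, because both are linked through B; this is the familiar dialect-continuum problem.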

  13. Jongseong Park said,

    November 17, 2022 @ 9:39 pm

    One concept that made an impression on me when I was young was that of fractal dimension – an expansion of the familiar idea of Euclidean dimension to allow for describing the complexity of fractal patterns that may fall between integers. For example, a Koch snowflake has a fractal dimension of around 1.2619.
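The figure quoted above can be checked with the similarity dimension, which for a shape built from N copies of itself, each shrunk by a factor s, is log N / log s. The Koch curve replaces each segment with 4 copies at one third of the size. The helper name `similarity_dimension` below is an illustrative choice, not something from the comment.

```python
import math


def similarity_dimension(copies, scale_factor):
    """Similarity dimension of a self-similar shape made of `copies`
    pieces, each reduced in size by `scale_factor`."""
    return math.log(copies) / math.log(scale_factor)


# Koch curve: 4 copies, each 1/3 the size of the whole.
koch = similarity_dimension(4, 3)
print(round(koch, 4))

# Sanity checks on ordinary shapes: a line segment splits into 2 halves,
# a square into 4 quarter-squares with half the side length.
print(similarity_dimension(2, 2), similarity_dimension(4, 2))
```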

    This makes me think that rather than trying to enumerate discrete languages and running into all sorts of issues about drawing boundaries, one might apply some statistical analysis on the internal variation of a language family to produce a ratio (not necessarily an integer) roughly equivalent to the number of languages it contains. A collection of two languages that have some degree of mutual intelligibility might count as 1.6 languages in this type of analysis, for example. Similarly, a single dialect continuum might be judged to have enough internal variation roughly equivalent to 1.6 languages.

    I wonder if anyone else has taken up this idea to estimate numbers of languages statistically.
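One way to make the fractional count concrete, offered as an editorial sketch rather than an established method: given a pairwise similarity matrix with values in [0, 1] and 1.0 on the diagonal, let each variety contribute 1 divided by the sum of its row, and add up the contributions. Identical varieties then count as one language, unrelated ones count separately, and partial overlap gives a non-integer. The function name `effective_language_count` and all similarity values are assumptions made for illustration.

```python
def effective_language_count(similarity):
    """Similarity-weighted count of varieties.

    `similarity` is a square, symmetric matrix (list of lists) with
    entries in [0, 1] and 1.0 on the diagonal. Each variety contributes
    1 / (sum of its row), so heavily overlapping varieties share credit.
    """
    return sum(1.0 / sum(row) for row in similarity)


# Two varieties with an invented similarity of 0.25.
two_partly_similar = [
    [1.0, 0.25],
    [0.25, 1.0],
]

# Three varieties with no similarity at all.
three_unrelated = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

# Two varieties that are indistinguishable.
two_identical = [
    [1.0, 1.0],
    [1.0, 1.0],
]

print(effective_language_count(two_partly_similar))
print(effective_language_count(three_unrelated))
print(effective_language_count(two_identical))
```

Under this particular formula a pair with similarity 0.25 comes out at 1.6, the unrelated trio at 3, and the identical pair at 1. The hard part, which the sketch leaves open, is how the similarity values would be measured in the first place.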

  14. Chris Button said,

    November 17, 2022 @ 10:10 pm

    Why it's Sino-Tibetan and Tibeto-Burman but not Sino-Tibeto-Burman is beyond me. Tibeto-Burman is the older of the two, and the name comes from the value attached to the still-extant literary traditions of Tibetan and Burmese.

  15. David Moser said,

    November 17, 2022 @ 11:12 pm


    So true, even the criterion of "mutual comprehensibility" is incredibly blurry. An analogy I've used is "How many shades of blue are there?"

  16. Jerry Packard said,

    November 17, 2022 @ 11:22 pm

    @Jongseong Park I don’t know that this has been done. A practical problem with this approach would be that it would be difficult to define the boundaries of a language family without first defining its members, which kind of begs the final question. You could think of stat methods around this problem (e.g., varimax rotation), but then comes the issue of the final result, which would be the number of languages within a certain defined area (probably not an integer). What would that mean? You wouldn’t be able to identify the individual languages, so it would end up being a measure of linguistic diversity within a certain area. This, however, may be a desirable result!

  17. Jerry Packard said,

    November 17, 2022 @ 11:24 pm

    @David Moser – that is a good analogy.

  18. Jongseong Park said,

    November 18, 2022 @ 12:49 am

    @Jerry Packard Good point about the difficulty of defining language families. But we could sidestep this issue by applying this to any collection of language varieties whether they form a coherent group or not; no need to limit this to language families really if the goal is simply to measure internal diversity.

    In practical terms, the statistical analysis could be done through pairwise comparison of data points to identify clusters. As a trivial example, you could feed in three completely unrelated languages and the analysis would simply find no overlap between them and return three as the diversity index assuming it is not fooled by accidental similarities.

    You're right, this approach wouldn't necessarily tell you much about what the individual languages are, just an idea of the degree of diversity. It could identify coherent clusters among the data points you feed in, but such clusters might correspond to language counts greater than one.

  19. Andreas Johansson said,

    November 18, 2022 @ 2:16 am

    So there's still informed scholars disputing that Sinitic is related to Tibetan and Burmese? You wouldn't guess from any popular source I've seen in the last couple decades at least.

  20. Victor Mair said,

    November 18, 2022 @ 8:15 am

    From Randy LaPolla:

    I agree with you, but 300 is too low. Years ago one of our students (Jamin Pelkey) did mutual intelligibility testing in 45 villages of the Phula people in four counties near the Vietnamese border, and found 24 independent languages, now registered with Ethnologue. This group is considered part of the Yízú 彝族 complex created in the 1950’s, so their mother tongue is considered to be standard Yi. As David Bradley pointed out many years ago, China is a lumper, putting all sorts of people together into a single “minzu”, and the rule is each minzu can only have one official language, so the non-standard varieties are not recognized (see Gerald Roche’s recent papers on this in the Tibetosphere). This is why Mandarin is seen as the mother tongue of all Chinese people by many in China (and Singapore) regardless of what language they actually speak. India, on the other hand, is a splitter, creating so many languages from what are actually mutually intelligible dialects. This gives the false impression that there is great linguistic diversity in India.

    See Poa, Dory & LaPolla, Randy J. 2007. Minority languages of China. In Osahito Miyaoka and Michael E. Krauss (eds.), The Vanishing Languages of the Pacific, 337-354. Oxford: Oxford University Press.

  21. Jerry Packard said,

    November 18, 2022 @ 8:22 am

    @Jongseong Park – Exactly right.

    @Andreas Johansson – No one says they are not related. The issue is the closeness of the relation and their degree of divergence over time. George van Driem has proposed a model in which the languages are represented by unconnected ‘leaves’ (rather than a tree) in which their size represents level of certainty and their relative position represents how closely related they are.

  22. Philip Anderson said,

    November 18, 2022 @ 8:38 am

    @Andreas Johansson
    The impression I got from the comments above is not that the existence of a relationship is in doubt, but the order in which the parent languages diverged. STB might indeed be a neutral term.

    While Africa and Asia are geographical conveniences, China is a country which has a state language, which in other traditions would naturally be called “Chinese”. And leaving aside the minority non-Sinitic languages, there is just one language family. So the linguistic situation is quite different from Africa.
    Your point is more relevant for India, where Hindi doesn’t have the same dominance, and there are a number of high-status Indian/Indic languages, so the concept of an “Indian” language is rejected.

  23. Jongseong Park said,

    November 18, 2022 @ 8:33 pm

    Historian and linguist Christopher I. Beckwith argues in Empires of the Silk Road citing his own research that "the widely believed theory of a genetic relationship between Chinese and Tibeto-Burman—the so-called Sino-Tibetan theory—seems to be based on a shared Indo-European lexical inheritance. Some of this material demonstrably entered Tibeto-Burman as loanwords via Chinese…. The most likely solution is that the Indo-European intrusion produced a creole not only with the pre-Chinese of the Yellow River valley but also with at least some of the pre-Tibeto-Burmans further to the southwest in the presumed home of Proto-Tibeto-Burman."

    Beckwith is evidently quite knowledgeable about many languages and the methods of historical linguistics but also holds an array of idiosyncratic views that run contrary to mainstream opinion. For example, he would also do away with the Indo-Iranian grouping, arguing that "Avestan looks less like an Iranian language than like a phonologically Iranized Indic language."

  24. Doctor Science said,

    November 18, 2022 @ 11:56 pm

    @Victor Mair:

    This gives the false impression that there is great linguistic diversity in India

    AFAIK the minimum, most lumpy estimates still show India has "great linguistic diversity". Do you estimate India's linguistic diversity is less than China's?

  25. Victor Mair said,

    November 19, 2022 @ 8:23 am

    @Doctor Science:

    You're quoting Randy LaPolla, not me.

    I'm pretty sure that Randy would agree with the thrust of your final, rhetorical question. Indeed, that was the point of his carefully worded and documented comment.

  26. Terry K. said,

    November 19, 2022 @ 11:15 am

    The 302 seems way too precise given the difficulty in deciding what are and aren't different languages. Mutual intelligibility isn't always clear cut.

  27. Philip Anderson said,

    November 20, 2022 @ 1:25 pm

    @Terry K
    Someone probably estimated 300, then two more were discovered, or reclassified.

  28. David Morris said,

    November 21, 2022 @ 6:54 am

    Did the idea of 'Chinese is one language' originate with the CCP, or does it date from earlier, e.g. from the Mandarins, who had an interest in being the guardians of the language?
