Extravagant claims for the number of "Chinese" speakers


Journalists keep repeating the same bunkum about "Chinese" having 1.197 or even 1.39 billion or some other ridiculously large number of speakers.  Countering a Washington Post article, I debunked this notion in "Maps and charts of the world's languages" (5/1/15).

Around a month later, the same claim was made in "INFOGRAPHIC: A world of languages – and how many speak them", South China Morning Post (5/27/15; updated 6/4/15).

As my colleague Arif Dirlik quipped, "What is the point of this map? That 'China' has a big population?"

The SCMP graphic was reposted on mental_floss as "Proportional Map of the World's Largest Languages" (6/8/15), so it's getting a lot of exposure.

I will not repeat here what I wrote in my 5/1/15 post about the WP article and in dozens of other Language Log posts over the years, except to say that, on the same SCMP map that lumps all of the Sinitic languages together as "Chinese", which allegedly has 1,197 million speakers, the following languages are listed separately (in millions of speakers):

Persian 57

Italian 63.8

Urdu 64

Russian 166

English 335

Marathi 71.8

Hindi 260

German 78.1

Portuguese 203

Spanish 399

Bengali 189

French 75.9

Lahnda 88.7 (Western Punjabi; the map doesn't mention Eastern Punjabi, nor does it include Greek, Albanian, Armenian, Baltic languages, Assamese, Nepali, Ceylonese, and many other IE languages of the South Asian subcontinent and elsewhere, some of which are of substantial size)

If we count only the IE languages on the map, they add up to 1,962.6 million even without Lahnda (2,051.3 million with it), a total that is larger than the inordinate figures claimed for "Chinese" and linguistically about as coherent as "Chinese".  Even "Mandarin", which supposedly has 848 million speakers, has widely varying degrees of mutual intelligibility among its numerous varieties.
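
The addition is easy to check. The short Python sketch below sums the per-language figures exactly as listed above; the variable names are mine, purely for illustration, and the only data are the numbers already quoted from the map. It uses `Decimal` so that the tenths add up exactly.

```python
from decimal import Decimal

# Speaker counts in millions, copied from the list above (SCMP map figures).
ie_speakers = {
    "Persian": "57",
    "Italian": "63.8",
    "Urdu": "64",
    "Russian": "166",
    "English": "335",
    "Marathi": "71.8",
    "Hindi": "260",
    "German": "78.1",
    "Portuguese": "203",
    "Spanish": "399",
    "Bengali": "189",
    "French": "75.9",
    "Lahnda": "88.7",
}

# The two figures quoted for "Chinese" in the post, in millions.
chinese_claims = [Decimal("1197"), Decimal("1390")]

# Total for every IE language listed, and the same total leaving out Lahnda.
total_all = sum((Decimal(v) for v in ie_speakers.values()), Decimal(0))
total_without_lahnda = total_all - Decimal(ie_speakers["Lahnda"])

print("IE total without Lahnda:", total_without_lahnda, "million")
print("IE total with Lahnda:   ", total_all, "million")
for claim in chinese_claims:
    print(f"Exceeds the {claim} million claim:", total_without_lahnda > claim)
```

Either way the sum is taken, the IE languages on the map outnumber both figures claimed for "Chinese".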

By now, however, it would seem that even the most ardent proponents of "Chinese" as an imagined megalanguage are slowly beginning to realize that it's not a single, monolithic language with a unified vocabulary, phonology, and grammar that is mutually intelligible among all of its speakers.  Consequently, the SCMP article refers to "Chinese" as a "macrolanguage [that] includes different languages and dialects".  But those are weasel words; moreover, they are in fine print.  Most people who walk away from reading the SCMP article are going to think that "Chinese" is a single language with 1,197 million speakers.

[Thanks to Geoff Wade]


  1. Eric S said,

    June 10, 2015 @ 3:17 pm

    My textbook in my intro Linguistics class said that Chinese has 1 billion speakers, and my prof noted that the book meant to say Mandarin Chinese, and that the number was outdated and the true number was now 1.2 billion (the class happened in 2002).

    The book is Language: Its Structure and Use (3rd ed) by Edward Finegan, 1999. The table on page 471 says:

    Chinese 1 billion
    English 350 million
    Spanish 250 million
    Bengali 200 million
    Hindi 200 million

    I have therefore been confidently going about for the last 13 years telling everyone I know that Mandarin Chinese has more than 1 billion speakers.


  2. K. Chang said,

    June 10, 2015 @ 3:47 pm

    I was wondering when you were going to post something about that graph, which was ALL OVER the place today. :D That graph, even to me, as a Chinese speaker, looks odd.

    But I did learn that there were Minbei, Mindong, and Minxi in addition to Minnan. :)

    It's as if they didn't bother figuring out the dividing line between a dialect and a language, and just decided to lump ALL "Chinese dialects" under Chinese.

    Did the media do the same sort of thing to the old Soviet Union, like treating ALL of the satellite republics as "speaking Russian"?

  3. Rubrick said,

    June 10, 2015 @ 4:02 pm

    I'm sure some of the muddled thinking comes from the fact that the situation for written Chinese is quite different from that for spoken "Chinese" — though surely the 1.2 billion figure would be wildly off for the former as well.

  4. Mark Mandel said,

    June 10, 2015 @ 4:51 pm

    Face it, a LOT of folks don't have the foggiest notion about this, and another LOT of folks are used to something like the dialect/language distinction as developed w.r.t. more-or-less alphabetically-written European languages and muddled into popular thought. "They all read the same? Then it's the same language!"

  5. AB said,

    June 10, 2015 @ 4:53 pm

    Delightfully random.

    German here does not include Swiss German. Hindi and Urdu are separate languages. But Mandarin, Wu and Hakka are all the same? If you say so…

  6. Ken said,

    June 10, 2015 @ 7:40 pm

    I suppose the next claim would be that there are seven billion people who speak one of the (not-mutually-intelligible) dialects of Human.

  7. phspaelti said,

    June 10, 2015 @ 8:44 pm

    They're not dialects. They're just notational variants of UG.

  8. Randy McDonald said,

    June 10, 2015 @ 9:57 pm

    The Sinitic languages are as diverse as Indo-European? I'd earlier heard comparisons to the Romance languages.

  9. Neil Dolinger said,

    June 10, 2015 @ 10:13 pm

    K. Chang said, "I did learn that there were Minbei, Mindong, and Minxi in addition to Minnan."

    Would Mindong be the same as Taiwanese/Hoklo?

  10. Doctor Science said,

    June 10, 2015 @ 10:40 pm

    Is there a *good* map or discussion of the different spoken Chinese languages? I know only "Han and Wu", and I'm guessing that's a pretty gross oversimplification.

  11. möngke said,

    June 10, 2015 @ 11:20 pm

    Austria also apparently does not exist.

    It's interesting to look at the Arabic numbers, if only as an indication of how absurdly out of date the Ethnologue numbers are. Gaza Strip alone has 1.8 million inhabitants by now – more than the graphic, following Ethnologue, cites for all of Palestine. Ethnologue also tells me this is based on a study from 1996, which is 19 years ago ("may date back more than 8 years" on the infographic is a bit of an understatement).

  12. Robot Therapist said,

    June 11, 2015 @ 3:44 am

    So following what Rubrick said, can someone tell me what the situation is for written "Chinese"? Are these one billion plus people all mutually intelligible in writing?

  13. Vincent said,

    June 11, 2015 @ 6:45 am

    The written language is all the same, except that a few hundred characters are different between traditional and simplified text. The Chinese govt after the most recent civil war kinda half-assed making the written language easier to learn. So now it's just different rather than any easier.

    In China simplified is used, outside China traditional is more common. This is changing however as new waves of Chinese immigrants flow outward since they've been allowed to leave the country.

  14. languagehat said,

    June 11, 2015 @ 8:11 am

    I'm not sure where the outrage is coming from, since the chart explicitly says Chinese “includes different languages and dialects.” Yes, the treatment of Chinese is inconsistent with that of other language families, but so it is in the world at large, and it's foolish to expect consistency anyway. I think the creator of the chart should be congratulated for acknowledging that Chinese includes different languages rather than rapped over the knuckles for not taking a sufficiently principled stand.

  15. Dan said,

    June 11, 2015 @ 8:57 am

    re: Vincent and Robot Therapist

    "Identifying Written Cantonese"

    Particularly useful are the comments at the end of the post.

  16. Eidolon said,

    June 11, 2015 @ 10:41 am

    Written Chinese between the different provinces of the PRC is not 'all the same' but it is mutually intelligible because everyone, as stated before, learns a form of written Mandarin. That is to say, written Cantonese, Hakka, etc. are not generally taught; rather, students learn standardized Mandarin writing that then allows them to read the standardized Mandarin newspapers that the media puts out, etc.

    One of the reasons why such statements about Chinese are frequently made, even/especially by Western outlets who have no cause to spread linguistic propaganda, is that the linguistic situation outside of major first-world countries is, in fact, not that well understood. First, the mutual intelligibility criterion for dividing dialects from languages is not universally accepted among linguists; e.g., certain linguists believe in continuum intelligibility. Second, the number of people who have native-level fluency in Standard Mandarin is not agreed upon. Third, the same is the case for a lot of the other lingua francas, including English.

    Think about how difficult it is to find out just how many people have native-level fluency in English among the hundreds of countries in the world. The same is the case for Standard Mandarin, though to a lesser degree because its speakers are mainly concentrated in a small subset of countries. Nonetheless, the sort of detailed linguistic studies that have to be done to know how many people speak MSM at a native level in China do not exist. At best, you have unreliable extrapolations that change every few years, though the changes are not necessarily what makes them unreliable, because the linguistic situation in China itself is changing all the time.

  17. Stephen said,

    June 11, 2015 @ 11:19 am

    Well I don't think that this part of the chart is at all clear let alone explicit. It says Chinese in a large size and then in a smaller size says " 'Chinese' as a macrolanguage includes different languages and dialects".

    – Whilst the probable meaning is obvious, I have never come across the word macrolanguage before.
    – The quotes presumably imply that the author is using Chinese to mean something other than what the readers think of when they read the word.
    – Saying a (macro)language includes a number of languages is somewhat counter-intuitive.

    If the headline had been something like 'Chinese language family' then it would have been considerably clearer and a lot more helpful.

  18. languagehat said,

    June 11, 2015 @ 1:29 pm

    Of course what you say is true, but we are not living in a world in which most statements about Chinese are accurate and this guy for some reason is making a mess of it, we are living in a world in which almost all statements about Chinese (outside of places like Language Log) are wildly inaccurate and make me froth at the mouth, and this guy, while not putting it in the ideal fashion, is definitely on the side of the angels, and I don't believe in letting the perfect be the enemy of the good.

  19. Stephen said,

    June 11, 2015 @ 2:11 pm

    I am intrigued by
    "we are living in a world in which almost all statements about Chinese … are wildly inaccurate and make me froth at the mouth"
    as I would have thought that referring to Chinese as a language (rather than as a group of them) is a pretty basic error. As a layman in this area I cannot think of anything more fundamentally wrong.

    Can you give some examples as to what really upsets you?

    "and I don't believe in letting the perfect be the enemy of the good"
    I agree with that idea in principle. However I think that different people will have different ideas as to whether this is 'good' or just 'less bad' than some other examples.

    This is an article aimed at the general public, so I would have thought it easy and clearer (and so better) to say "Chinese language family. Major family members are …" rather than "Chinese. By 'Chinese' we mean a macrolanguage …".

  20. languagehat said,

    June 11, 2015 @ 3:16 pm

    as I would have thought that referring to Chinese as a language (rather than as a group of them) is a pretty basic error

    I would too! Good thing he didn't do that!

  21. Stephen said,

    June 11, 2015 @ 4:37 pm

    You're just being silly. It is perfectly clear that 'Chinese' is listed as a language with 1,197 million speakers.

  22. languagehat said,

    June 11, 2015 @ 4:59 pm

    I'm not being silly, though I am getting a little irritated. As I said in my first comment in this thread, the chart explicitly says Chinese “includes different languages and dialects.” If you choose to ignore that in favor of beating this guy over the head with a stick that doesn't apply to him, I can't stop you, but I'm sure not going to go along with it. And since I also wrote "what you say is true," it should be clear that we don't differ on basics; you simply appear to insist that everyone state things exactly the way you prefer to see them stated. To quote myself yet again, I don't believe in letting the perfect be the enemy of the good.

  23. Jeff W said,

    June 11, 2015 @ 10:55 pm

    I’m with you, languagehat.

    I think a fairer characterization would recognize Mr Alberto Lucas López for stating that “‘Chinese’” (the quotation marks are a good thing here) “includes different dialects and languages”; listing thirteen (!) of them (Gan, Hakka, Huizhou, Jinyu, Mandarin, Min Bei, Min Dong, Min Nan, Min Zhong, Pu-Xian, Wu, Xiang, Cantonese); and further noting that that listing “does not include all Chinese languages or dialects.”

    The way I read the graphic is “here’s how languages around the globe stack up, including an aggregate of those that we call ‘Chinese’.” It’s a bit of a mismatch, obviously, but I don’t think that that puts Mr López quite in the Chinese megalanguage clique—he’s a useful idiot, maybe.

    My impression is that Mr López is taking as a given that people—and Ethnologue, the “comprehensive reference work cataloging all of the world’s known living languages,” where Mr López got the data from—all refer to these languages as “Chinese”—that’s just a fact, without an underlying claim—and creating the infographic for the SCMP with that in mind. Perhaps the SCMP editors who asked for the graphic think their readership has an interest, for whatever reason, in how what people refer to as “Chinese” compares to languages around the world. Maybe there’s a cultural aspect embedded  in the design of what is ostensibly a graphic about languages.

    The “macrolanguage” language is how Ethnologue refers to Chinese. The 13 languages mentioned in the graphic are, again, what’s referred to on Ethnologue. Their inclusion on the graphic, even in fine print, doesn’t strike me as a feint to distract from the true “megalanguage” agenda; it’s just an acknowledgment, with more information, that Chinese comprises those languages (as well as others not listed). (Referring to Ethnologue explains why, on the graphic, in an otherwise alphabetical list, Cantonese appears at the end: it’s listed as “Yue” there.)

    That doesn’t mean that we can’t, and shouldn’t, assess the graphic in light of the never-ending struggle against the proponents of the mythical Chinese megalanguage. It plays into their nefarious hands, obviously, and tends to perpetuate continuing misconceptions about a single monolithic Chinese. It just means that the graphic itself might not have been created with those considerations in mind.

    Can we argue that it should have been designed differently? Sure.

  24. Bart said,

    June 12, 2015 @ 2:44 am

    Randy McDonald raised a very good question a couple of days ago:
    Is the variety within the set of 'Chinese' languages comparable to the variety within the set of 'Romance' languages or to the variety within the whole set of 'Indo-European' languages?

    Variety is a tricky thing to measure of course.

  25. Eidolon said,

    June 12, 2015 @ 10:35 am

    @Bart variety is indeed tricky to measure, and I don't think there's been any formal study of it in Sinitic. But age-wise, Sinitic is a lot younger than Indo-European, and is on the order of the Romance languages in having diverged around the transition between 200 BC and 200 AD. Age is used at times as a proxy for divergence in comparative linguistics, so I think the Romance analogy is apt.

    Sino-Tibetan, on the other hand, is a lot older. I'd put it in the range of Indo-European.

  26. Stephen said,

    June 12, 2015 @ 1:03 pm

    It is perfectly obvious that you are, deliberately or otherwise, misinterpreting what I said.

    You said (http://languagelog.ldc.upenn.edu/nll/?p=19462#comment-1496464) that the author did not call Chinese a language and that is what I called silly.

    Just because you said that the chart is explicit does not make it so and as I said, and you agreed, the chart is not even clear let alone explicit.

    The chart purports to compare languages and then groups together what it refers to as separate languages under the heading of a single language. So when comparing what it calls Chinese with any other language we are clearly comparing apples and oranges, or at least apples & pears.

    I am not insisting on how things are reported, rather I am saying that, especially in a non-specialist article, it would have been just as easy to have presented the information in a clearer and so more helpful manner. One option would have been to use 'Chinese language family' as the header. Another would have been to put the quotes around that header (rather than in the note). I am sure that there are many other options as well. Any of these would have helped to make it clearer that this part of the circle is not quite the same as the other ones.

    Also repeating myself, I agree with the idea of not letting the pursuit of the perfect drive out the good. However you are not the arbiter of what is good.

    Jeff W rightly points out that many people refer to all of these languages as Chinese, which is a gross oversimplification of the situation. IMO for this aspect of the chart to be regarded as good (rather than just repeating a common misconception) I would expect to see that misconception challenged up front, rather than in a note.

    In the comments on the SCMP site a number of other errors are mentioned. IIRC these included the numbers / spread of Portuguese, French & Farsi. Also there were comments about the grouping of all the 'Chinese' languages under Chinese but listing the Indian languages separately. Most clearly an error, and one that has not been corrected, several people pointed out that the German section had 'lost' the 7+ million German speakers in Austria. So I don't think that calling this good is really justified.

    Finally, you said that "almost all statements about Chinese … are wildly inaccurate and make me froth at the mouth" and I asked you to list some examples. If you were interested in a constructive dialogue I really would have expected you to have done that by now.

  27. E-Ping Rau said,

    June 15, 2015 @ 4:47 am

    @K. Chang: actually I don't think there's the Minxi variety (I once made the same mistake too :p).

    @Neil Dolinger:
    Taiwanese/Hokkien language would be the Minnan variety (the preferred name of the language is often disputed by its speakers, but Hoklo, as far as pronunciation goes, is almost certainly not correct – Hohlo would be closer)

    The Mindong variety would be referring to the Fuzhou language (which my mother still speaks to a certain extent, thankfully). Although you can make out some resemblances, it is largely unintelligible to Minnan speakers, and bears some peculiar phonological changes not unlike the Irish lenition/eclipsis.

    @Doctor Science : This Wikipedia entry might be of help? I can't say the map that comes with it is *good*, but at least the main varieties are covered. Incidentally it also claims that "The differences are at least as great as within the Romance languages", although I don't know where the sources for that are……

    And I agree with Eidolon: not even the written forms of the various Chinese languages are that mutually intelligible. The reason we allegedly "comprehend each other" when using written language is simply because we are all taught written Mandarin in school. That doesn't mean written forms of other Chinese languages don't exist: go to any Hong Kong discussion board, and you will most likely encounter many words you didn't even know existed, let alone be able to pronounce or comprehend them. And Taiwan's Ministry of Education has codified an orthography for "Taiwan Minnan language" (the dictionary is here for those interested), which despite bearing some similarities to Mandarin, is most definitely not entirely comprehensible for people who speak no Taiwanese.

  28. Lily C said,

    June 18, 2015 @ 10:11 am

    Perhaps one way to sort this out would be that anybody who attended a school or lives in an area where Putonghua is a recognized language, used in school or government, counts as a member of the Putonghua language community, regardless of how little or much they speak any dialect at home.
    But then shouldn't the map of English be larger–since most people in most areas of Europe, the Middle East, South Asia, and the Pacific Islands learn some kind of English in school and use some kind of English in governance?
