Maps and charts of the world's languages


A week ago on Thursday (4/23/15), the following article appeared in the Washington Post:  "The world’s languages, in 7 maps and charts".

These maps in the WP are thought-provoking and informative, but it is unfortunate that, like many other misguided sources, they lump all the Chinese languages (which they incorrectly call "dialects") into one. That's terribly misleading.  This would be similar to grouping all the Indo-European languages of Europe as "European" or all the Indo-European languages of India as "Indian".

The claim that nearly one and a half billion people in China speak a single language called "Chinese" is unfortunately ubiquitous and keeps getting perpetuated no matter how many times it is debunked, yet it is quite wrong.  After reading the article, I once again mentioned this error to friends and colleagues.

John Hill was prompted to write to the Washington Post editors about this matter.  Since it is highly unlikely that they will publish his remarks in full, if at all, I post them here:

I was fascinated to discover your article, "The world’s languages, in 7 maps and charts" of April 23, while browsing the web.

I was so interested in the many charts and maps that I sent the link to my colleague, Professor Victor Mair, Professor of Chinese Language and Literature, University of Pennsylvania. He replied almost immediately, pointing out an important misleading claim that I, too, should have noticed.

The article says that "Chinese (all dialects)" has 1.39 billion native speakers, making it by far the most spoken language on the planet. However, as Professor Mair correctly pointed out, this is very misleading as many of the so-called "dialects" are mutually incomprehensible, and others can be only mutually understood with great difficulty. In fact, many of these "dialects" are at least as different as, say, Spanish, Italian and French, which are usually regarded as separate "languages" –  not "dialects."

The Wikipedia article on Mandarin Chinese, the most common Chinese "dialect" or language, gives a 2010 estimate of about 960 million native speakers.

Furthermore, it notes that there are various dialects of Mandarin, and speakers of one dialect may have difficulties communicating with speakers of another unless they switch to modern "Standard Chinese", which is now taught throughout China and Taiwan to help overcome exactly this problem. However, a 2013 report by the Ministry of Education claimed that about 400 million Chinese, or about 30 percent of the population, cannot speak Standard Mandarin.

So, to say that there are some 1.39 billion speakers of Chinese is highly deceptive. I think it would be helpful if you could run a notice pointing this out in your next issue.

Even if you consider all the Sinitic languages of China as a single language (which they surely are not), it is dead wrong to state that there are 1.39 billion speakers of "Chinese".  As of July 1, 2014, the total population of China is estimated to be 1,393,783,836.  According to the 2010 census, 91.51% of the population belongs to the Han ethnicity (presumable speakers of "Chinese"), while 8.49% belong to "minority" (non-Sinitic) ethnicities.  No matter how you slice it, it is impossible that all 1.39 billion people in China speak a single language called "Chinese".
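The arithmetic behind the preceding paragraph can be sketched in a few lines of Python. This is only a rough illustration: it applies the 2010 census shares to the 2014 population estimate (the two figures come from different years, so the result is approximate), and it treats ethnicity as a crude stand-in for language, which it is not. The function name `split_population` is made up for this sketch.

```python
# Figures quoted in the paragraph above.
TOTAL_POPULATION = 1_393_783_836  # estimated population of China, July 1, 2014
HAN_SHARE = 0.9151                # share of Han ethnicity, 2010 census


def split_population(total: int, han_share: float) -> tuple[int, int]:
    """Split a population total into Han and non-Han head counts.

    Assumption: the 2010 census share still holds for the 2014 estimate.
    """
    han = round(total * han_share)
    return han, total - han


han, minority = split_population(TOTAL_POPULATION, HAN_SHARE)

print(f"Han (presumable Sinitic speakers): {han:,}")
print(f"Minority ethnicities:              {minority:,}")
print(f"Gap to the claimed 1.39 billion:   {1_390_000_000 - han:,}")
```

Even on the generous assumption that every Han person speaks some Sinitic language, the total falls well over a hundred million short of 1.39 billion, and that is before any question of whether those languages should be counted as one.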

Because of the fallacies concerning the term fāngyán 方言, which has been perennially mistranslated as "dialect" (from Greek dialegesthai ["converse with each other"], from dia- "across, between" + legein "speak"; cf. "dialog", which ultimately comes from the same Greek roots), I created the word "topolect", which means exactly the same thing as fāngyán 方言, viz., "speech form of a place", to defuse the political, cultural, and linguistic confusion that arises as a result of mistranslating fāngyán 方言 as "dialect".  Cantonese and Mandarin, for example, cannot be "dialects" of each other or of some third language because their lexicon, phonology, grammar, etc. are different, not to mention the obvious fact that they are mutually unintelligible.  The same holds true for many other Sinitic languages.

For a few of the many earlier Language Log posts that touch on these vital matters, see:

"The American Heritage Dictionary of the English Language, 5th edition" (11/14/12)

"Is Cantonese a language, or a personification of the devil?" (2/9/14)

"Dialect or Topolect?" (7/1/10)

"Mutual Intelligibility of Sinitic Languages" (3/6/09)

See also here, here, and here, and this chapter on the classification of Sinitic in the Festschrift for Alain Peyraube:  "The Classification of Sinitic Languages: What is 'Chinese'?" by Victor H. Mair (梅 維恒).


  1. Fluxor said,

    May 1, 2015 @ 11:52 am

    The usage of the word "dialect" aside, I think the rest of the article isn't all that misleading. There is a graphic that shows China's linguistic diversity (51) sits somewhere between Canada (60) and the US (33). So it's certainly not trying to say China is a Mandarin monobloc.

    As stated in the article, the 1.39 billion figure comes from "University of Düsseldorf's Ulrich Ammon, who conducted a 15-year-long study." It also says that "Ammon counts both first and second native language speakers." The graphic based on Ammon's work also states that "totals for languages include bilingual speakers."

    So while there certainly are not 1.39 billion native speakers of Mandarin, I would posit that the vast majority do learn Mandarin at school, making Mandarin most likely their second native language. Add to that the bilingual numbers in the Chinese diaspora and that 1.39 billion number seems a lot more plausible. However, the fact that 1.39 billion seems to match the current Chinese population seems curious. But given Ammon is a linguist and these are his numbers, any complaining about these results should perhaps be directed to Ammon rather than WP.

  2. H. Schwartz said,

    May 1, 2015 @ 11:54 am

    Are there any good data regarding what percentage of China speaks Mandarin or a closely related (actual) dialect, what percentage as a language of education, as a second language, and how well, etc.?

  3. H. Schwartz said,

    May 1, 2015 @ 12:00 pm

    Also, it should be mentioned that the situation with Arabic is similar to Chinese, probably worse because a. no one speaks standard Arabic natively (though I suppose some who learn it in school from a very young age may have a similar knowledge as a real first language), and b. many Arabic "dialect" (language) speakers are either uneducated or educated in a European language, not standard Arabic. The article seems to count all of Arabic as homogeneous.

  4. J.W. Brewer said,

    May 1, 2015 @ 1:24 pm

    As with English you will undoubtedly get a much higher number if you add L2 (or L3) speakers of Mandarin, and also probably quite variable numbers depending on what level of fluency is thought sufficient to count. It is worth noting that historically many/most ethnic Chinese outside the PRC spoke non-Mandarin Sinitic languages as their L1 (or possibly a totally different L1, like English, depending on degree of assimilation). In post-1945 Taiwan most would learn pretty good Mandarin as a second language in school. In Singapore I think that has been much more variable although the gov't was supposedly pushing it at some point. Ditto Hong Kong / Macao, at least until they were handed over to the Communists.

    And in non-Chinese majority places like the U.S. I think there was minimal incentive to learn Mandarin at least until quite recently (when new immigration patterns meant more of your fellow Chinese-Americans would have that as their L1). When my now-fifth-grader was in kindergarten, the kids were all supposed to be taught how to count to ten in Mandarin, because of multiculturalism or something, and my daughter's teacher happened to be the only Chinese-American member of the school's faculty. But she had to memorize the relevant lexemes just like the white teachers, because as a child her parents had taught her how to count in Cantonese. Why would they teach a US-born child Mandarin, if indeed they knew it themselves, which should not be assumed?

  5. Eidolon said,

    May 1, 2015 @ 3:30 pm

    Even adding L2 and L3 MSM speakers, I do not think there are >1 billion MSM speakers in the world. Over 650 million people in China still live in rural areas, and in a lot of these areas, topolects/languages still take precedence over the "official" language. Further, there are major cities in China where MSM still isn't spoken by a large segment of the population, though that is changing.

    At the current rate, it is conceivable that there are going to be >1 billion MSM L1+L2 speakers in a few generations. But English is being learned at an even faster rate.

  6. Adrian Morgan said,

    May 1, 2015 @ 6:08 pm

    If I may go off on a tangent — but a linguistic tangent — it caught my eye that Victor uses the word "surely" in the old-fashioned sense of "certainly, assuredly".

    In present-day English, of course, the primary sense of "surely" is something like, "I dare you to disagree with me" or "I would be astonished to be wrong". It always implies some degree of doubt.

    Victor's quaint usage just struck me as a curiosity.

  7. Victor Mair said,

    May 1, 2015 @ 6:24 pm

    Well, Adrian, I was an English major more than half a century ago, and I am so unhip that I had never heard "OMG" pronounced (as "oh em gee") until yesterday.

    "OMG! American English" (5/1/15)

    See especially this comment.

  8. Adrian Morgan said,

    May 1, 2015 @ 7:35 pm

    Oh, there's nothing wrong with quaint usages! I wouldn't be surprised if the older meaning is retained more often in some English dialects than others (Australian, here), and I'd be interested to learn which dialects those are. For me, though, using "surely" to mean "certainly" has the feel of 19th Century poetry or something like that.

    I made an explicit reference to the "slightly uncertain" connotation of "surely" in a story I wrote more than 20 years ago, in which a character begins a thought with "surely there was" then reconsiders and thinks, "no, even that small hint of doubt was too great" followed by a more emphatic paraphrase of what he was about to assert.

  9. Kirk said,

    May 2, 2015 @ 12:07 am

    Has anyone ever worked out what the Sinitic languages actually are? In 2003, Jerry Norman estimated that there were "hundreds" of mutually unintelligible varieties of Chinese, but I could never find out what they were, and have been under the impression that no-one knows. It's rather embarrassing to say that Chinese is a language family, but then to be unable to say what the languages in that family are.

  10. djbcjk said,

    May 2, 2015 @ 1:16 am

    Or to quote the great Leslie Nielsen: 'Don't call me Shirley.'

  11. Peter said,

    May 2, 2015 @ 4:54 am

    @Adrian Morgan: VM's usage of "surely" seems completely unexceptional to me (30ish, British originally, lived 10 years in US): it is signifying "certainly", but in a context where the statement is otherwise in question.

    (The sentence at issue: "Even if you consider all the Sinitic languages of China as a single language (which they surely are not), …")

  12. tsts said,

    May 2, 2015 @ 5:39 am

    While we are complaining about news articles' treatment of this issue, did anyone else do a double take when reading the recent article in the New York Times on Chinese in Brooklyn at ? It contained this gem: "New working-class immigrants, primarily Mandarin speakers from Fujian province, are flooding Sunset Park, Mr. Kwong said …".

    Who are these Mandarin-speaking working-class Fujianese? This is attributed to Peter Kwong from CUNY, who obviously knows his stuff, so I assume they must be misquoting him. (Or is there in fact some region in Fujian where people speak Mandarin?) This is not the first time the Times has been doing this in an article about Mandarin becoming more dominant in New York, basically suggesting that anyone not coming from Guangdong should be considered a Mandarin speaker. (I have also seen this tendency among some Cantonese, sort of like Texans claiming any non-Texan is a Yankee.)

    My understanding is that a large majority of new Chinese immigrants in NYC still come from regions where Mandarin is not the first language of most people. Now, in terms of lingua franca, they might choose Mandarin over Cantonese, though I have also encountered a number of Fujianese who picked up fluent Cantonese within a few years of arriving in NYC.

  13. Matt Anderson said,

    May 2, 2015 @ 2:48 pm


    During the time I lived in NYC (from 2000 to 2009), when I went to Brooklyn Chinatown or to the Fujianese parts of Manhattan Chinatown, the main language I heard people speaking in stores was Mandarin (though I certainly heard what I presume was some kind—or kinds—of Min while walking around and in restaurants in those neighborhoods).

    I spent much more time in Mandarin-speaking Flushing than in either of those areas, so I don't know how typical my experience is, but certainly a lot of Mandarin is spoken there. I don't know anything in particular about the demographics of Fujianese immigration to NY—could it be that (mutually unintelligible) Minnan and Mindong varieties are both represented?

  14. Greg Ralph said,

    May 3, 2015 @ 4:11 am

    And given Indonesian census data reports about 100 million first language speakers of Javanese, where's that? Let alone – given the tallies are supposed to include second language speakers, and all of China has been counted as such – let alone Indonesian/Malay? Another 200 to 300 million there …. Sloppy.

  15. Oona Houlihan said,

    May 3, 2015 @ 7:00 am

    The fascinating thing about the Chinese is that by virtue of a central power and some cross-cultural tradition e.g. of Daoism and Confucianism, despite all their cultural and linguistic differences they developed a common WRITTEN tradition (which even went as far as Japan where there certainly is no linguistic relation, Japanese being part of the Macro-Altaic language group together with the Finnish and Hungarian languages and the Turk dialects). Had Europe had a pictorial "shrift" instead of alphabets which tend to write each word differently even if it had the same, e.g. Roman or Greek, roots, maybe it would have been a politically united landmass too. They also would have a better understanding of language teaching as no one insists on "pronouncing" Chinese and Japanese kanji …

  16. Victor Mair said,

    May 3, 2015 @ 1:35 pm

    From Ulrich Ammon:

    Thank you very much for your and your colleagues' challenging email as to the number of speakers of "Chinese", referring to the publication in the "Washington Post".

    I hope to be able to give a somewhat reasonable, though certainly not really satisfactory answer.
    All the figures relate to the time span between the years 2005 and 2010, because I could not find reasonably comparable figures for the various languages dating from more recent years. The figures for Chinese are certainly especially questionable. After consulting different sources I decided to follow Ethnologue 2009 (though this source proved to be quite unreliable for other languages), which offered 1,213 Mio. first language and 0,178 Mio. second language speakers for "Chinese", subsuming various varieties or languages under this label. These speakers of "Chinese" seem to be roughly the same people for whom Mandarin is (one of) their official language(s) or language(s) of teaching (medium not only subject of teaching) or of the mass media they consume, i.e. a kind of overarching standard variety (in a wide sense). These "speakers" or people comprise the great majority of the inhabitants of the Republic of China but extend also into at least Taiwan, Singapore, Malaysia and various "China towns" around the world.

    It is true that the native "dialects" (an indeed questionable translation of "fangyan") of a considerable part of these people are at least as different as "Spanish, Italian and French", i.e. not mutually comprehensible, but this does not rule out that Mandarin functions, and is accepted by most of them, as the overarching standard variety which combines these "dialects" to something like a single language. Various German dialects, e.g. Low German and Alemannic, are not mutually comprehensible either – but are similar enough to Standard German to be seen as (though remotely) related with it and accepted as their common standard. This acceptance results from or has been stabilized by confirmative language policies, teachings and communicative practices. Mandarin can "roof" (German überdachen, noun Überdachung) linguistically even more diverse varieties since – so to speak – half of its similarity with the roofed varieties is already guaranteed by the common script (writing system). With such a common script, combined with the respective language policy, Spanish, Italian and French could most likely also be overarched by a common standard variety and thus be bracketed together to a single sort of language. To insist on mutual comprehensibility for creating a common language, rather than to make do with recognizable similarity, entails the danger of conceiving impractically restricted and small languages (or even "glottotomy"), which has been the typical language policy in the wake of colonialism and has left Africa, for example, as a continent linguistically cut to pieces.

    I have suggested the core of a general theory of how language standardization and creating comprehensive standard languages could be conceived and which social forces are usually involved in such a process. See, e.g., chapter B.2 of my recent book on the status and function of German in (today's) world. I attach this chapter and also the table of contents of the book to this mail (bibliographical details: Ulrich Ammon (2015) Die Stellung der deutschen Sprache in der Welt. Berlin/ Boston/ München: W. de Gruyter). A link to flyer and table of contents is: The book is, however, written in German.

    I am aware of the extreme roughness of my answer to your question but hope to have at least opened up some paths for understanding. It seems to me that a similar book as my own on German could, and perhaps should, be done on the status and function of Chinese in today's world or, for that matter, on any of the eight to ten second rank international languages (French, Spanish, Russian etc.) – those behind or below English as today's prominent international or, in some sense, even world language.

  17. Bart said,

    May 6, 2015 @ 3:22 am

    The Washington Post article is open to a number of criticisms. In a spirit of enquiry I wonder whether anybody can help me with just one particular point: Do linguists have a generally accepted definition of the term ‘native language’?

    The Washington Post article is based on the following simple model: everybody has either one native language or two native languages. (A complication is that the article makes a distinction between a person’s ‘first native language’ and a possible ‘second native language’; however, this point is not developed.)

    This raises the questions: What does it mean to say that a certain language counts as a ‘native language’ for a certain person? Suppose a person speaks a number of languages; what criteria determine which of those languages count as a ‘native language’ for that person and which ones do not?

    Suppose a Belgian grows up speaking Flemish in Antwerp, his family moves to Liege, and he spends all his adult life there speaking French almost all the time; what counts as his native language(s)? What about his cousin who stays speaking Flemish in Antwerp, runs a shop, and can just about struggle by if a customer insists on speaking French?

    In Indonesia many tens of millions of people are brought up speaking a regional language such as Javanese or Sundanese, but as young children, under the influence of school and tv, they soon become fluent in Indonesian, the lingua franca of the whole country. Do all these people have one or two ‘native languages’?

    Much more generally: Is a person who possesses a certain ‘native language’ necessarily a competent speaker of that language? Does a person who is a competent speaker of a certain language necessarily possess that language as a ‘native language’?

    To resolve questions like these we need some clarity on what the concept ‘native language’ means. I’m not an academic linguist, so I ask the question: Does any consensus on this exist?

  18. Jarkko Hietaniemi said,

    May 6, 2015 @ 2:47 pm

    I can't help but think that the case of Sinitic speakers sharing a common written language despite their spoken languages being unintelligible to various degrees is somewhat analogous with the European use of Latin. It was nobody's native language after the early medieval period, but educated people could speak and write it fluently.
