
Jichang Lulu briefly alluded to work on languages of Italy in the dialectometry thread (here; the whole comment is well worth reading, as are the comments on that post by Jonathan Smith [here (this one on an earlier thread), here, here, and here]). He also thought that Language Log readers might find some comments in this paper by Mauro Tosco of interest.

"Measuring languageness: Fact-checking and debunking a few common myths", DIVE-IN

“Interestingly, the more traditional classifications are marred by purely sociolinguistic analyses – and quite often their accompanying political and ideological underpinnings – the more they are proven wrong when dialectometry is applied.”

(Tosco: homepage; International Research Group on Contested Languages)

The article critically discusses a few common objections to an intrinsic, language-internal definition of what constitutes a ‘language’ (and, conversely, a ‘dialect’). It argues that, contra postmodernism (1.), languages do exist, they can be counted and languageness can be measured independently and even notwithstanding the speakers’ beliefs and ideologies (2.). It further refutes as unsound all the common criticisms of intelligibility as a tool in assessing languageness: while deviations from common-sense assessments may be expected, they are not really of concern to science (3.1.), and intelligibility asymmetries (3.2.), apparent infinite graduality (3.3.) and dialect chains (3.4.) are only partial problems to be solved empirically. On the contrary, intelligibility can be and is routinely measured in different sciences, and, when applied to language, it tends to dovetail with other criteria, such as dialectometry and the counting of isoglosses (4.).
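The abstract's points about intelligibility asymmetries (3.2.) and dialect chains (3.4.) can be made concrete with a small Python sketch. It assumes, following the paper as Jonathan Smith reads it in the comments, that the mutual intelligibility of two varieties is the lower of the two directional scores. It then counts a chain as the smallest number of contiguous segments whose members are all pairwise intelligible above a threshold, in the spirit of Hammarström (2008) though not necessarily his exact procedure. The chain, the scores and the 0.7 threshold are all invented for illustration.

```python
from itertools import combinations

# Hypothetical five-variety dialect chain, listed in geographic order.
CHAIN = ["A", "B", "C", "D", "E"]

# Invented scores: intelligibility falls with distance along the chain.
BASE = {1: 0.9, 2: 0.75, 3: 0.5, 4: 0.3}

# SCORE[(a, b)] = how well listeners of a understand b. To model asymmetry,
# listeners understand varieties to their right 0.1 better than the reverse
# (capped at 1.0).
SCORE = {}
for i, a in enumerate(CHAIN):
    for j, b in enumerate(CHAIN):
        if i != j:
            bonus = 0.1 if i < j else 0.0
            SCORE[(a, b)] = min(1.0, BASE[abs(i - j)] + bonus)


def mutual(score, a, b):
    """Mutual intelligibility, taken as the lower of the two directions."""
    return min(score[(a, b)], score[(b, a)])


def acceptable(score, segment, threshold):
    """True if every pair of varieties in the segment clears the threshold."""
    return all(mutual(score, a, b) >= threshold
               for a, b in combinations(segment, 2))


def min_partition(chain, score, threshold):
    """Split the chain into the fewest contiguous 'languages'.

    Greedy is optimal here: any stretch inside an acceptable segment is
    itself acceptable, so extending each segment as far as possible
    never forces an extra cut later on.
    """
    parts, current = [], []
    for variety in chain:
        if acceptable(score, current + [variety], threshold):
            current.append(variety)
        else:
            parts.append(current)
            current = [variety]
    if current:
        parts.append(current)
    return parts


# Cut from the left end, then from the right end of the same chain.
left = min_partition(CHAIN, SCORE, 0.7)
right = [seg[::-1] for seg in min_partition(CHAIN[::-1], SCORE, 0.7)][::-1]
print(left)   # [['A', 'B', 'C'], ['D', 'E']]
print(right)  # [['A', 'B'], ['C', 'D', 'E']]
```

Both cuts need the same minimum of two "languages", yet variety C lands in a different one each time, which is the kind of equally valid but inconsistent partition J.W. Brewer asks about in the comments. Moving the threshold changes the count itself.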

Envoi (instead of a conclusion)
Just as measuring the intelligibility between, say, English and Mandarin makes little sense, a dialectometric approach to these languages would also be a colossal waste of time, because the result would be zero or a figure close to it. Crucially, dialectometry, as its very name implies, is a tool to measure dialectal difference: it is feasible up to a certain limit, but when whole phonemes (and all the phonemes in a string) are different it becomes impracticable. This does not detract from its usefulness: it is exactly the intricacy of multilingual situations across the globe among a multiplicity of minorities (their ‘messiness’, for the unfortunate monolinguals of many Western countries who for generations have been the victims of the aggressive linguistic policies of the modern state) that calls for painstaking measurement.

Is this “superdiversity” (Blommaert & Rampton 2012)? Maybe. Certainly, it is the only sensible approach to an assessment of language diversity, which, in its turn, is a prerequisite to salvaging what of it is salvageable (Tosco 2017).

For the time being, we can be content with reiterating that:
• languages do exist. Beyond the veil of political and ideological narratives, languages exist because communication exists; different languages are the result of different and mutually unintelligible solutions to the communication problem.

• languageness is measurable because intelligibility is measurable.

• while Ausbau-ization (Tosco 2008) involves the use of linguistic tools with a view to increasing the distance of a language (its Abstand level) vis-à-vis its neighboring competitors, in the end it is Abstand languages that general linguistics deals with.
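The Envoi's remark that dialectometry becomes impracticable once whole phonemes differ can be illustrated with one widely used dialectometric measure: the Levenshtein (edit) distance between two pronunciations of the same item, normalised by word length and averaged over a word list. This Python sketch assumes that measure and a plain character-by-character comparison; the post does not say which variant the cited studies use, and the three "word lists" are made-up strings, not real transcriptions.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute (free if equal)
        prev = cur
    return prev[-1]


def normalized(a: str, b: str) -> float:
    """Edit distance scaled to 0..1 by the longer string's length."""
    return levenshtein(a, b) / max(len(a), len(b), 1)


def aggregate_distance(list1, list2):
    """Mean normalized distance over aligned items of two word lists."""
    pairs = list(zip(list1, list2))
    return sum(normalized(x, y) for x, y in pairs) / len(pairs)


# Made-up word lists: B differs from A by single segments, C shares nothing.
VARIETY_A = ["pata", "kelo", "sumi"]
VARIETY_B = ["pada", "kelu", "sumi"]
VARIETY_C = ["wroz", "trin", "glod"]

print(round(aggregate_distance(VARIETY_A, VARIETY_B), 3))  # 0.167
print(aggregate_distance(VARIETY_A, VARIETY_C))            # 1.0
```

The second comparison hits the ceiling of 1.0, and any other list with no segments in common would score exactly the same, so the measure stops discriminating: a rough picture of why applying it to a pair like English and Mandarin would tell us little.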

A place with recent views and references on these language classification issues — notably the contribution of dialectometry — is Lissander Brasca's 2023 doctoral thesis.

(Brasca and his supervisor Marco Tamburelli have an earlier paper on Gallo-Italic classification based on dialectometry, referenced both in the thesis and in the Tosco paper above.)


Selected readings

  • Hammarström, Harald. 2008. “Counting Languages in Dialect Continua Using the Criterion of Mutual Intelligibility.” Journal of Quantitative Linguistics 15(1). 36-45.
  • Kloss, Heinz. 1967. “‘Abstand languages’ and ‘ausbau languages’.” Anthropological Linguistics 9(7). 29-41.
  • Tamburelli, Marco & Brasca, Lissander. 2017. “Revisiting the classification of Gallo-Italic: a dialectometric approach.” Digital Scholarship in the Humanities 33(2). 442-455.
  • Tang, Chaoju & van Heuven, Vincent J. 2009. “Mutual Intelligibility of Chinese Dialects Experimentally Tested.” Lingua 119(5). 709-732.


  1. .mau. said,

    May 1, 2024 @ 6:37 am

    Intelligibility was never mutual: a Portuguese speaker may understand Italian, but Italians find it difficult to understand spoken Portuguese. As for Lombard and Piedmontese, I believe that the latter has more forms taken from Occitan; but keep in mind that in some Piedmontese valleys the dialect is actually a form of Franco-Provençal, which is Gallo-Romance but different.

  2. J.W. Brewer said,

    May 1, 2024 @ 7:01 am

    This is an interesting paper and I appreciate the link. I enjoyed the unusually-terse-for-scholarly-writing "So what?" as a response to a certain sort of objection. I don't quite know what to do with the dialect-chain analysis where the claim is that both a) to the extent the ends of the chain aren't mutually intelligible, you can figure out mathematically the minimum (and thus maximum?) number of separate "languages" the chain needs to be partitioned into, but b) there may in a given case be multiple, inconsistent partitions that will all work mathematically, thus making the choice among them (and which particular "dialect" is then assigned to which particular "language") arbitrary.

  3. Chris Button said,

    May 1, 2024 @ 7:10 am

    languages do exist, they can be counted and languageness can be measured independently and even notwithstanding the speakers’ beliefs and ideologies

    I mentioned this on the other post too.

    I personally once helped analyze the phonetics of a "Naga" language from India (albeit from spectrograms rather than fieldwork). I put Naga in quotes here because the language was actually Kuki, but the speakers identified ethnically as Kuki.

  4. Chris Button said,

    May 1, 2024 @ 7:11 am

    Sorry, I mean "identified ethnically as Naga."

  5. Jarek Weckwerth said,

    May 1, 2024 @ 7:25 am

    I second J. W. Brewer in recommending the piece. It's just 7 pages. And it's intriguing.

    I need to be cautious, so I'm going to limit myself (as someone brought up in the tradition of Labovian and Trudgillian sociolinguistics) to the following thought in reaction to Tosco's claim that "languageness is measurable because intelligibility is measurable":

    (1) Age is measurable. (2) Adultness (e.g. legal voting and drinking age) is 18 in my country. (3) Is adultness measurable using age in a fully objective and reliable way?

  6. Mark Liberman said,

    May 1, 2024 @ 8:04 am

    "Languageness is measurable because intelligibility is measurable"?

    But is intelligibility measurable in a reproducible and consistent way, across variation in speakers, listeners, topics, styles, acoustic conditions, etc. etc.?


    Or rather, every single one of those variables, and every one of their interactions, has a significant effect on any relevant measure of what we might choose to call "intelligibility".

    And as noted, intelligibility is also typically asymmetrical.

    This doesn't mean that there's no coherent statistical concept of "languageness".

    But it does mean that any attempt to define "languageness" in a coherent way will look a lot like a perception-focused version of dialectometry.

  7. Cervantes said,

    May 1, 2024 @ 9:02 am

    I'm not really getting the point of this. Yes, there is pretty much complete mutual unintelligibility between any version of English and any version of Mandarin, so those are separate languages. Obviously. But then it goes on to say that mutual intelligibility is a continuum and a matter of degree. It seems to me, ergo, that defining languages as separate requires some arbitrary cutoff, and furthermore that it is impossible to count the number of languages in the world or in any large enough part of it. The piece seems self-refuting.

  8. Jarek Weckwerth said,

    May 1, 2024 @ 9:19 am

    @ Cervantes: That's the usual thing to happen when you try to impose discrete categories on a continuous variable (i.e. one that is not very evidently bi- or multimodal).

  9. J.W. Brewer said,

    May 1, 2024 @ 10:15 am

    To Jarek's age analogy, one might note that in the U.S. the line is drawn differently for different purposes, e.g. 16 to get a driver's license, 18 to vote, 21 to legally purchase alcoholic beverages. In some states 17, rather than one of the other options, is the age at which you can have sex with a substantially older person without that person being at risk of being charged with a crime. Sometimes there are choices: if you want to put money aside for the benefit of a child that someone else will control until the child is an "adult," in some states you can choose, when you set the account up, whether the child will get control at age 18 or not until age 21. By analogy, this perhaps implies that "are these different dialects of the same language or two different-but-closely-related languages" may have different answers depending on the context of the question and what you intend to do with the answer. That doesn't necessarily mean we're in post-modern total-subjectivity and nothing is real.

  10. Scott P. said,

    May 1, 2024 @ 10:41 am

    "Languageness is measurable because intelligibility is measurable"?

    But is intelligibility measurable in a reproducible and consistent way, across variation in speakers, listeners, topics, styles, acoustic conditions, etc. etc.?

    That would suggest that measurement is imprecise — different measurements under different conditions produce different results — but that's true of all measurements. Measure the brightness of a star, the atomic weight of helium, the efficiency of photosynthesis, and you'll get different results from different measurements. That just means the measurements have an error bar, and we know how to deal with error bars.

  11. Mark Liberman said,

    May 1, 2024 @ 10:54 am

    @ Scott P.: "That just means the measurements have an error bar, and we know how to deal with error bars."


    When you have a high-dimensional space of relevant variables, and the measurement outcome depends in a non-linear way on their higher-order interactions, "error bars" are not the answer.

  12. AntC said,

    May 1, 2024 @ 6:33 pm

    I'm not convinced intelligibility is the only tool we have to assess relatedness between languages/dialects. Identifying the Indo-European family and its myriad dialects didn't rely on (say) Sanskrit being intelligible to Greek speakers. Nor on there being a chain of intermediate (stages of) languages, each pairwise mutually intelligible. We'd also use non-linguistic evidence to assess the depth of contact between speaker communities. That evidence (the lack of historical contact) rules out any relatedness between English and Mandarin.

  13. AntC said,

    May 1, 2024 @ 8:13 pm

    (so that nowadays linguists talk of the Chinese languages rather than the Chinese dialects, given the mutual unintelligibility of, for instance, Mandarin and Cantonese), … [quoting Comrie 1987]

    Ah, I suspected so: there's an agenda here. What's the "nowadays" saying there? Mandarin and Cantonese have always been mutually unintelligible, irrespective of what (people who might have claimed to be) linguists ever talked of. Over the history of the Chinese empire there's been a political drive to assert the oneness of the Sinitic peoples and downplay apparent linguistic/social differences. What was talked of was termed fāngyán 方言, which corresponds neither to English 'language' nor to 'dialect', as Prof Mair is at frequent pains to point out.

    I enjoyed the unusually-terse-for-scholarly-writing "So what?"

    I find this Tosco paper to be glib and superficial. The "unusually-terse" is of a piece with a lack of scholarly thoroughness and with rhetoric-in-place-of-reason.

    @myl But is intelligibility measurable in a reproducible and consistent way, across variation in speakers, listeners, topics, styles, acoustic conditions, etc. etc.?

    Good question. The only actual experimental result Tosco brings forward [Tamburelli (2014)] was a _written_ exercise. Does Tosco know even the first thing about languages? That's a serious question: he's at the 'Department of Human Studies'. Perhaps he's a failed anthropologist. Given it's a short paper, why devote so much space [section 3.4] to lecturing linguists on how to suck eggs? Contra its last para, he's debunked nothing. (He could usefully study Wittgenstein on 'Family resemblance', then on the 'Private language argument'.)

    I do see it being open to perversion if this nonsense gets into the wrong hands: the CCP will claim that because all PRC citizens can understand Putonghua after a fashion (they have to!), they must all speak the same language. Therefore there is One China.

  14. Jonathan Smith said,

    May 1, 2024 @ 8:31 pm

    Good read, esp. on asymmetric intelligibility, where I agree both that the issue is basically down to (some measure of) bilingualism and that one must take the lower of the two (asymmetric) readings, because *mutual* intelligibility.

    So re: Sinitic, one can imagine e.g. simply trying to measure the intelligibility of various regional languages to typical lifelong denizens of Beijing — thus sidestepping the tricky exposure~bilingualism problem.

    Of course this would turn out to be as pointless as testing Mandarin vs. English for the southern "dialects" — we observe every day that the group above understands next to nothing of spoken Hong Kong Cantonese, Amoy, Hokchiu, Moiyan, Wenzhou, etc., etc. (And if responses to that earlier post on Guizhou Putonghua are to be believed, intelligibility is not exactly sky-high for regional Putonghuas, let alone other Mandarins :/ )

    So studies like Tang & van Heuven (2009) are interesting but funky in many, many ways… it is very hard to know what is going on here without seeing at least the full prompts prepared for the languages involved; it probably goes without saying that these were influenced by the Mandarin versions from which they were translated (which were in many cases in turn weirdly influenced by the original English…)

    (I'm sorry but I must LOL at "ta1 men zai4 wan3 mao1 zuo1 lao3 su3 de you2 xi4" among others… I know this is down to proofreading but 'Standard Mandarin' teachers would have a red-pen field day here haha)

  15. Chris Button said,

    May 3, 2024 @ 4:16 pm

    To my point above about speakers of a particular Kuki language identifying as Naga, I meant to quote this point about issues with languages often having "accompanying political and ideological underpinnings."

    I didn't mean to endorse or refute suggestions on how you define and separate out different languages.
