The burgeoning of Indo-European and the withering of many other languages


"How the World's Largest Language Family Spread – and Why Others Go Extinct." Robinson, Andrew. Nature 641, no. 8061 (April 28, 2025): 31-33.

This is a review of the following three books:

Proto: How One Ancient Language Went Global
Laura Spinney (William Collins 2025)

The Indo-Europeans Rediscovered: How a Scientific Revolution is Rewriting their Story
J. P. Mallory (Thames & Hudson 2025)

Rare Tongues: The Secret Stories of Hidden Languages
Lorna Gibb (Atlantic 2025)

Robinson begins:

A key human characteristic is our complex languages — about 7,000 of which are spoken around the world today. Understanding the origin and development of past and present languages can help researchers to understand human evolution.

Although today’s languages group into about 140 families, only 5 of these families are widely used: Indo-European, Sino-Tibetan, Niger–Congo, Afro-Asiatic and Austronesian. Indo-European languages form the largest family, if those who speak them as a second language are included — with 12 main branches ranging historically from northwestern China to western Europe. “Almost every second person on Earth speaks Indo-European”, notes science writer Laura Spinney in Proto, one of a trio of intriguing books exploring the history of languages, common and rare. 

Both Spinney’s lively book for non-specialists, and The Indo-Europeans Rediscovered — an academic study with broad appeal by archaeologist James Mallory — focus on the origins of this vast language family. By contrast, extinct and endangered languages are the preoccupation of Rare Tongues, a quirky study by linguist Lorna Gibb, aimed at all audiences.

Not all Indo-European (IE) languages that have ever existed have survived until today.  Two that are very consequential, because they are the first two known by name to have emerged from the IE progenitor — Hittite and Tocharian — went extinct millennia ago.

The Hittites are mentioned in the Bible, particularly in the Old Testament, and their language lasted from approximately 1700 BC to 1180 BC.  It was a written language, inscribed on an estimated 30,000 cuneiform tablets.  Among the Hittite texts was a horse-training manual associated with the people of the land of Mitanni that included techniques and knowledge from Old Indo-Aryan.  Whenever you see a pair of lion sculptures standing outside a bank, a government office, or other important building, bear in mind that this art-architectural feature is carrying on a tradition from the Hittite capital of Hattusa.

As for Tocharian, although attested by only a small number of mostly fragmentary manuscripts, it has especially great significance for me personally, since evidence for its existence comes from archeological sites in the Tarim Basin (East Central Asia) where I worked for decades.  We have had many posts about Tocharian people, language, and culture on Language Log and articles in Sino-Platonic Papers.  One of the most important ("The Problem of Tocharian Origins: An Archaeological Perspective", SPP, 259 [Nov. 2015]) is by J. P. Mallory, who also co-wrote (with VHM) another book from Thames and Hudson, The Tarim Mummies: Ancient China and the Mystery of the Earliest Peoples from the West (2000), and in whose honor this festschrift recently appeared:  Victor H. Mair, ed., Tocharica et archaeologica (Journal of Indo-European Studies Monograph Series No. 69 [2024]).

Following Robinson through his entertaining, edifying survey of the trio of tomes he has chosen to review, we come upon a host of fascinating topics. Take Mallory's appendix, for instance, which lists 176 individuals who, between 1686 and 2024, each proposed homelands for Indo-European — “as far north as the polar regions and as far south as Antarctica, from the Atlantic to the Pacific”.  Next comes the well-known story of William "Oriental" Jones' semi-official discovery of the Indo-European language family.  Then it's off to the races, following the traces of the Indo-Europeans every which direction.

Along the way, we learn that it was physicist Thomas Young who proposed the name "Indo-European" for the emerging family, while helping to decipher the Rosetta Stone.  The Indus Valley civilization, which we have discussed for its mysterious, undeciphered script on several occasions here at Language Log recently, was a specious factor in the naming of the IE family.

The trail of the search for the homeland then shifts north to the Pontic-Caspian steppe. Beginning around 2015, the Yamnaya Culture (ca. 5300-4600 years ago) of that region, which had already been proposed as the IE homeland on archeological grounds, received support from genetic studies.  Synthesizing the research of Spinney and Mallory, Robinson concludes that "language played a more influential part in the evolution of human societies than did nationalism, empires and wars."

The reviewer next turns his attention to the opposite side of the coin, the reasons why some languages become extinct while others thrive, which is the focus of Gibb's Rare Tongues.  She tells stories of how political decisions made in Namibia, for example, proved inimical to the native language Oshiwambo, which is spoken by half of the population, in favor of English, which has official status though it is spoken by less than 5% of the population.

The same kinds of problems endanger languages in many other parts of the world:

In Australia, for example, 93% of Indigenous languages are extinct or soon might be. And in India, 600 languages are endangered, partly because English has become a dominant language; 14 years ago, only 196 were, according to the UN cultural organization UNESCO.

Gibb then moves on to the subject of whistled languages:

Amazingly, these exist on all inhabited continents. They arise from the fact, Gibb notes, that “full sentences in whistled speech are intelligible over distances ten times greater than if you were shouting”. This is possible because, unlike shouting, whistling does not strain the vocal cords and permits powerful volumes over a narrow range of frequencies.

Another subject that is dear to my heart is that of Manchu:

Some languages nearing extinction have been revived. Manchu, an imperial language in China from 1644 to 1912, is now taught at universities across the country. Māori was made an official language of New Zealand in 1987 and is taught in schools. And Gaelic, promoted to an official language in Scotland in 2005, now appears alongside English on Scottish road signs.

Languages can be preserved, but you have to work at it.  Languages do not preserve themselves.  The people who speak them keep languages alive.

Together, these books capture the amazing complexity of languages worldwide, from many contrasting perspectives — including linguistics, archaeology, genetics and anthropology. However, readers must not expect to obtain a settled answer to the long-debated origin of Indo-European languages. As a frustrated Mallory jokingly warns: “solving the homeland problem is the academic equivalent of herding cats”.

Typical J. P. Mallory humor.  Superlative!

 

Selected reading

[h.t. Ted McClure]



18 Comments »

  1. Melanesian priest said,

    April 29, 2025 @ 10:44 pm

    @Victor Mair do you read about non-IA languages in South Asia? Why can people still not explain the pronominalisation of verbs in (1) Kiranti-Magar-Khamic languages (2) Munda Kherwarian-Sora languages (3) Indo-Aryan languages spoken in or around Munda areas? Maspero suggested that a putative Indo-Aryan substratum/superstratum in these non-Indo-Aryan languages caused them to become polypersonal as they are today. However, (1) the Kiranti polypersonal agreement system is extremely complex and it seems that Sino-Tibetan/Tibeto-Burman have evidence of a kind of native verbal agreement system similar to Kiranti such as those present in Nungish and Gyalrongic languages; (2) the Munda Kherwarian polypersonal agreement systems are also complex in the manner of exceedingly overmarking of arguments (Kherwarian languages like Santali and Ho allow triple agreement!), and the Sora verbal system is something entirely unusual, completely different to anything we have been talking about IE linguistics: noun incorporation. The Sora verbs allow incorporation of pronominal direct object arguments (I), direct objects (II), indirect objects (III), locative-goals (IV), instruments (V), and transitive subjects (VI). Among (3) Indo-Aryan languages, only IA dialects (Māgadhan languages) that are spoken near Munda areas and have intense contact with Munda speakers exhibit verbal polypersonal agreement akin to Munda and Kiranti. To what certain extent could linguists offer explanation for the extreme polarization between the isolating Vietnamese verbs and the polysynthetic Sora verbs that may shed light on the prehistory of the Austroasiatic family, which might have dispersed through maritime routes to India around 1,500 BCE instead of assumed land routes in MSEA according to Sidwell & Blench (2025)?

  2. Martin Schwartz said,

    April 30, 2025 @ 12:13 am

    Against the assumption that the Hittites of the Bible are the
    Anatolian Indo-European speakers we now call Hittites,
    see Wikipedia "Biblical Hittites".
    Martin Schwartz

  3. Robin said,

    April 30, 2025 @ 2:52 am

    Is it possible that Chinese guardian lions also descend from the Hattusa lion gate?

  4. Victor Mair said,

    April 30, 2025 @ 8:40 am

    Good question, Robin. I've often wondered about that too, especially since the famous Chinese lion dance came from the west.

    "The foreign origins of the lion dance and words for 'lion' in Sinitic" (1/14/22)

  5. Jerry Packard said,

    April 30, 2025 @ 7:15 am

    “… To what certain extent could linguists offer explanation for the extreme polarization between the isolating Vietnamese verbs and the polysynthetic Sora verbs…”

    As languages evolve, they go through cycles of concatenating their morphemes, neutralizing them into affixes, and then putting them back together again into words (Sapir). So Melanesian priest’s point may be taken to ask how the isolated-polysynthetic cline reflects language evolution, or whether it is chicken-egg.

    Linguistic Darwinism, anyone?

  6. Victor Mair said,

    April 30, 2025 @ 8:36 am

    Michael Carasik did a podcast about the biblical Hittites a few years back. The audio can be found at this link.

    A pdf of the handout is available upon request.

  7. Chris Button said,

    April 30, 2025 @ 1:41 pm

    However, (1) the Kiranti polypersonal agreement system is extremely complex and it seems that Sino-Tibetan/Tibeto-Burman have evidence of a kind of native verbal agreement system similar to Kiranti such as those present in Nungish and Gyalrongic languages;

    I believe what happens in Kuki-Chin languages might be connected too. I didn't look into it when studying them though.

    My focus instead was on the complex verbal inflections that affected the syllable rules by causing length, tone, and coda alternations. Ultimately those inflections all seemed to come down to different reflexes of an earlier suffixal -s in a host of different conditioning environments. You'd be amazed at what a lowly -s suffix can do!

  8. Chris Button said,

    April 30, 2025 @ 3:33 pm

    Apologies, I meant to quote accordingly:

    However, (1) the Kiranti polypersonal agreement system is extremely complex and it seems that Sino-Tibetan/Tibeto-Burman have evidence of a kind of native verbal agreement system similar to Kiranti such as those present in Nungish and Gyalrongic languages

    I believe what happens in Kuki-Chin languages might be connected too. I didn't look into it when studying them though.

    My focus instead was on the complex verbal inflections that affected the syllable rules by causing length, tone, and coda alternations. Ultimately those inflections all seemed to come down to different reflexes of an earlier suffixal -s in a host of different conditioning environments. You'd be amazed at what a lowly -s suffix can do!

  9. Chris Button said,

    April 30, 2025 @ 3:36 pm

    "… affected the syllable rules…" should say "…affected the syllable rimes…"

  10. Melanesian priest said,

    April 30, 2025 @ 10:32 pm

    @Jerry Packard if you look into it you may learn that the Munda pronominalisation area overlaps with the Kiranti-Magar-Khamic pronominalisation area; Santali and Limbu are spoken within a few miles of each other in Nepal's Province no. 1. To say it is a coincidence that both language groups became pronominalised and highly synthetic, but not the other languages in their respective families, is doubtful. Still this issue is very understudied.

    The debate on proto-Austroasiatic morphology is still going on; some argue that proto-Vietnamese was highly synthetic like Munda, others say no.

  11. Yves Rehbein said,

    May 1, 2025 @ 7:27 pm

    Jerry said that it is a "cline" – not that it is "coincidence". That is just the null hypothesis, which you intuitively assume. I am reminded of a notice from my inbox: "Gradient in grammatical structure of indigenous languages reflects pathway of human expansion in the Americas" (Urban & Naranjo, in: Scientific Reports 15, 2025). Didn't read it, not interested in developing typology.

    From the fact that WALS does not mention "polypersonal", e.g. Santali has currently no data on agreement, it seems that this is underresearched indeed. Although, search suggests e.g. Inflectional Synthesis of the Verb, wherein Kiranti has a notable mention https://wals.info/chapter/22

    I could find copious references to Michael Witzel's works (NB: Sino-Platonic papers vol. 129) referring to putative loanword strata in Sanskrit from Himalayish and Munda (viz. "Para-Munda"), very fragile indeed. Lubotsky 2023 is noncommittal, ch. 15 in The Indo-European Puzzle Revisited. More research is needed, but the Austroasiaticists are dismissive, Anderson 2020, ch. 6 in Austroasiatic Syntax in Areal and Diachronic Perspective.

    Since phono-semantic matches are virtually guaranteed, I have tried to find lexical evidence. In particular at the notion of "iron" (11/5/20 above), compare Viet ~ Yue OC /*[ɢ]ʷat/ (NB: *wat, Michel Ferlus 2003, 2009, 2011) because the Sinograph interpreted as an ax is congruent with the Linear B sign AES as an ax after Lejeune (probably "bronze", cf. Michailidou 2008, Bronze age economy, in Pasiphae vol. II, with images). Surprisingly, the shape of AES looks similar to 耳 ěr "ear" from silk script onwards and it coincides with ear < PIE *h₂ṓws ~ *h₂éws- – since I briefly researched Persian xar goš on the occasion of Easter, cf. eas(t) < **h₂ews-ós – vis-à-vis *h₂wes-éh₂: Proto-Tocharian *wyäsā as the supposed source of Proto-Uralic *waśke (Mallory & Adams 2006: "metals"). Conversely, Manchu aisin (s.v. WT: ) and ancun 'earring' are linked via Proto-Turkic to 銅 tóng 'copper' (cf. Turkish altın), which appears to have the same coda as 東 dōng 'east' in Baxter and Sagart's Old Chinese reconstruction, /*[l]ˤoŋ/ "bronze, copper" vs. /*tˤoŋ/ "east". Any similarity to 耳 /*C.nəʔ/ is not obvious; fortunately there is 聰 /*s-l̥ˤoŋ/ “hear well, intelligent”.

    Today I had occasion to read The Bronze Age and Early Iron Age Peoples of Eastern Central Asia vol. 2 (Mair et al. 1998) and found probable confirmation, but this comment is already long enough (weapons: 577f. fig. 9, 578, 586 fig 3 no. 2; earrings: 748; lugs and bronze vessels 606ff.; 590f.; 731, 628, 636). NB: the short sword, "e.g. the Japanese tantō or wakizashi", compared to swords in Skythian fashion (p. 749).

  12. Chris Button said,

    May 1, 2025 @ 9:11 pm

    @ Yves Rehbein

    For 戉 ʁàt, compare also 戌 χə̀t, which is surely related and supports the uvular onset.

  13. Melanesian priest said,

    May 2, 2025 @ 11:58 am

    @Yves Rehbein a language that is spoken by ~10 million people (including the current Indian president) that WALS does not mention "polypersonal", e.g. Santali has currently no data on agreement is indeed understudied and also unfairly treated at the same time. However many grammars of Santali didn't use the WALS/Wikipedia terminologies (like "polypersonal", "inflectional synthesis") at all, they employed other terms like "object agreement", "possession raising", and "multiple referent indexations" (cf. Anderson, Sidwell, Neukom, Zide,…) which WALS might be not familiar with. The heck, South Asia has 400+ languages but WALS map of verbal person marking only shows 18 [https://wals.info/feature/102A#4/21.41/91.16] including Mundari, a sister language of Santali.

    I believe colonialism and pop linguistics substantially contributed to the marginalisation of Austroasiatic languages after all. The Santhals and other Austroasiatic tribes traditionally endure in the lowest positions of the Indian caste system. Their presence on popular platforms is virtually invisible, unlike that of fancy languages like English and Japanese and others.

  14. Yves Rehbein said,

    May 2, 2025 @ 2:52 pm

    In view of the fact that the author of TFA, Andrew Robinson, has written about the Indus Script, which is sufficiently rare, which also adorns the cover of Mallory's book and was recently discussed here with reference to Santali, I hope this is not off-topic.

    @ Chris Button, I do not understand the calendrical signs, and I actually got confused about this as I pondered possible comparisons to the Indus Script, part 1.

    The gist of it is that the Indus "fish sign" allows multiple, arbitrary readings, which make sense as rebuses. Now I am reminded of it at the sight of earrings among the earliest gold and silver artifacts found in present-day China (Emma C. Bunker, "Cultural Diversity in the Tarim Basin Vicinity", ed. Mair (1998), op. cit., cf. pages 606 ff.).

    It is a simple, inconsequential fact that the cusp shape of those rings resembles the fish sign as well as the Egyptian biliterals ⟨šs⟩, ⟨šn⟩. I wonder if this would coincide with Egyptian nzw commonly spelled with the sedge hieroglyph biliteral ⟨sw⟩, which may be considered a prototype of Phoenician tsade, which you have compared to 戌, the eleventh earthly branch. This part of the argument is telic.

    The confusing part: I would originally compare 子 to the fish. Smith 2010 apud Wiktionary has 子 *tseʔ as the original sixth stem. Please correct me if I'm wrong, it seems that you have OBI 甾 *c > MC 子 ts- on first and OBI 子 *ɣ ~ ɟ > MC 巳 (?) on sixth. Moreover, you offer competing takes on zayin and ayin.

  15. Chris Button said,

    May 2, 2025 @ 3:09 pm

    @ Yves Rehbein

    甲 k, 乙 Ɂ, 丙 p, 丁 t, 戊 b, 己 ɣ, 庚 ᵏl, 辛 s, 壬 n, 癸 q, 子 ʦ, 丑 x, 寅 l, 卯 ʁ, 辰 d, 巳 ʣ, 午 ŋ, 未 m, 申 ɬ, 酉 r, 戌 χ, 亥 g

  16. Jonathan Smith said,

    May 2, 2025 @ 8:18 pm

    @ Yves Rehbein
    Re: OBI-era Branch #6, yes it is antecedent to modern "子" which now writes zi3 'offspring' etc.; see wherever you like for proposed ST associations re: this word. Re: OBI-era Branch #1, we simply don't know (the modern relatives of) this graph / term. I proposed that the graph might be related to modern "甾" and that the term at issue was 'darkened' (cf. zi1 緇 'dark') in an Early China paper from 2010-11 — don't know if this is what you are referencing. In a chapter from 2019 I suggested instead a relation to modern shuo4 朔 'go back to meet' given possible cognacy with Branch #7 wu3 午 'be facing' (context being the cross-pair matching that occurs all around the Branches.) But both guesses could well be wrong (or right?). Important things in this area of inquiry (among others) are simply to assume real words, use vetted/published/minimally-speculative "OC" forms (no "original scholarship" :D ), consider actually attested semantics (including of course actual words of actual modern languages), + other normal stuff.

  17. Chris Button said,

    May 2, 2025 @ 8:56 pm

    OBI 甾 *c > MC 子

    I think there is some confusion here. The ganzhi sign that was replaced by 子 is a historically discontinuous form that is wholly unrelated to 甾.

  18. David Marjanović said,

    May 5, 2025 @ 8:02 am

    something entirely unusual, completely different to anything we have been talking about IE linguistics: noun incorporation.

    Currently spreading in English. Examples are such verbs as bartend or course-correct.

    As languages evolve, they go through cycles of concatenating their morphemes, neutralizing them into affixes, and then putting them back together again into words (Sapir).

    It's not that simple – the Great Wheel of Morphology runs erratically forwards and backwards all the time. New affixes can and often do develop before the old ones have worn away… affixes can even be reinterpreted as independent words.

    Like evolution (which means "descent with heritable modification", not "progress").

    Proto-Uralic *waśke

    This is actually a Wanderwort within Uralic which hints at all sorts of fascinating things; trying to reconstruct it for Proto-Uralic leads to contradictions.
