The origins and affinities of Tocharian


I asked several IEist colleagues:

Of all the IE languages, which one is Tocharian closest to?



Answers received:

James P. Mallory:

I can't answer that but since almost all the phylogenies have it leaving after Anatolian one could argue that it does not actually have any close relatives, i.e., it was never really paired with any other branch before it left. This would make sense if you go with the Yamnaya > Afanasievo > Shamirshak route.

Don Ringe:

None.  In other words, if you ask the question that way, you can't get a coherent answer.  The situation seems to be the following. 

The Anatolian subgroup is clearly half of the IE family; *all*the*other*subgroups*together* are the other half, so you can't say that Anatolian is closer to any particular one. 

The non-Anatolian half of the family is increasingly being called "Nuclear IE".  A consensus is developing that Tocharian is half of Nuclear IE; all the other subgroups together are the other half, so you can't say that Tocharian is closer to any particular one. 

The internal structure of the remaining group, which we're beginning to call "Core IE", is more complex and less clear, but that has nothing to do with the position of Tocharian. 

As for where the (pre-)Tocharians came from, I'm sure David Anthony is right:  they moved east from Ukraine abruptly around 3500 BCE, when the speakers of Core IE were still in contact with each other and probably constituted a dialect continuum.  I'm attaching our 2015 paper, in which David outlines the evidence.

As for the evidence and reasoning behind the tree structure, below is the barest minimum presentation of conclusions now widespread among IEists.

Indo-Europeanists do not usually propose cladistic trees based only on comparative wordlists, for a simple reason: lexical items are not the most reliable indicator of linguistic ancestry. On the one hand, it can be difficult to distinguish innovations (which demonstrate shared history) from retentions (which do not); on the other hand, it is too easy for closely related languages to borrow words from each other, and it is often difficult to detect such borrowings.

Instead we rely most on shared sound changes and shared innovations in inflectional morphology—two types of innovations which are learned in native language acquisition and which strongly resist modification later in an individual’s life. We use lexical items only as an aid to fleshing out a tree drawn from phonological and morphological characters. What is said below rests on these arguments.

Almost all Indo-Europeanists now agree that the Anatolian subgroup is one half of the family; all the other subgroups together, sometimes called “Nuclear IE”, are the other half, because they share inflectional innovations not found in Anatolian. Probably there was a single Proto-Nuclear IE language for some centuries. It follows that Anatolian is not more closely related to any particular subgroup of the family than to any other.

Within Nuclear IE, it is beginning to look like Tocharian is one half and all the other subgroups together are the other half, sometimes called “Core IE”. There is plausible archaeological evidence that Tocharian split off from Core IE around 3300 BCE, because at that date a culture obviously derived from the Ukrainian steppes appears suddenly near the Altai mountains (see the discussion of Anthony and Ringe in the Annual Review of Linguistics, Vol. 1, 2015). In this case too it follows that Tocharian is, so to speak, equidistant from the remaining Nuclear IE subgroups.

What about the diversification of Core IE? That’s not so clear. The western branches, Celtic and Italic, share a few innovations not shared by the other subgroups; the same can be said of the eastern branches, namely Germanic, Balto-Slavic, Indo-Iranian, Greek, and Armenian. (Where Albanian fits is not discoverable, because by the time it began to be written down, all the diagnostic inflectional evidence had been lost.) But lexemes are shared in various patterns across the whole Core, which suggests that the ancestors of these branches were in contact for a long time, at least pairwise, and we might be looking at the wreckage of a dialect continuum for which no clean tree can be drawn.
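[Editorial illustration, not part of Ringe's reply.] The branching order described above can be written down as a small tree, which makes the "equidistant" claim concrete: the last node shared by Tocharian and any Core IE subgroup is always the same one. The sketch below is only an illustration under stated assumptions. The nested-tuple layout and the helper names `path_to` and `last_common_ancestor` are invented here, Core IE is left as an unresolved flat group because the text says its internal structure is unclear, and Albanian is omitted because its position is described as not discoverable.

```python
# Illustrative encoding of the branching order described in the text.
# Assumptions of this sketch: the (name, children) tuple layout, the helper
# names, and the flat (unresolved) treatment of Core IE.
CORE_BRANCHES = [
    "Celtic", "Italic", "Germanic", "Balto-Slavic",
    "Indo-Iranian", "Greek", "Armenian",
]

TREE = ("IE", [
    ("Anatolian", []),
    ("Nuclear IE", [
        ("Tocharian", []),
        ("Core IE", [(branch, []) for branch in CORE_BRANCHES]),
    ]),
])


def path_to(node, target, trail=()):
    """Return the tuple of node names from the root down to target, or None."""
    name, children = node
    trail = trail + (name,)
    if name == target:
        return trail
    for child in children:
        found = path_to(child, target, trail)
        if found is not None:
            return found
    return None


def last_common_ancestor(tree, a, b):
    """Return the name of the deepest node that lies above both a and b."""
    path_a, path_b = path_to(tree, a), path_to(tree, b)
    if path_a is None or path_b is None:
        raise ValueError("unknown subgroup")
    common = None
    for x, y in zip(path_a, path_b):
        if x != y:
            break
        common = x
    return common


if __name__ == "__main__":
    for branch in CORE_BRANCHES:
        print("Tocharian /", branch, "->",
              last_common_ancestor(TREE, "Tocharian", branch))
```

Run as a script, every line reports "Nuclear IE" as the shared node, and pairing Anatolian with any other subgroup gives "IE". In this encoding, "half" means only "one of the two daughters of a node"; nothing about size or speaker numbers is implied.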

Douglas Adams:

Of the two groups you mention, the unquestionable "winner" is Germanic.  Indeed, I would state that Germanic is the closest Indo-European relative of Tocharian, though the relationship is not terribly strong.  Second in line might be Slavic.  Which would put the "original" location of the pre-Tocharians somewhere on the "northern frontier" of the oldest reconstructible Indo-European world.  There are clear Iranian influences on Tocharian, but those influences are late (as explained in the Mallory festschrift article [VHM:  forthcoming]).  The earlier connections are fleshed out a bit in my contribution to the second edition of The Indo-European Languages (ed. Kapović).  If pressed I would be strongly tempted to put Celtic dead last as a relative of Tocharian (despite all those plaids associated with the Tarim mummies).

With regard to Celtic, it's the absence of evidence which I think significant.  But proving a negative is notoriously difficult.  People may be able to adduce some real evidence that I'm not aware of.  Despite Hamp's tutelage I'm not as up on Celtica as I might be.  And my calculations take no account of Albanian and Armenian, again for lack of data.  But for those two it's different: it's a lack of data pure and simple, in Celtic's case it's a lack of confirming data.

I should emphasize that my notions of IE internal relationships as they pertain to Tocharian are quite impressionistic: I have not ever tried to quantify things.  I feel secure about the positive relationships with Germanic and Slavic but much less so about the "negative relationships."

As a dirt archeologist and historian of languages, peoples, and cultures, I still have a strong intuition that the Tocharians — though far older than the Celts — are somehow related to them, perhaps through some missing intermediaries.

10-20 years ago, Julie Wei did extensive research on Celtic-Sinitic comparative linguistics.  For the Celtic side, she relied heavily on the best Welsh lexicographical tools.  In one of the volumes on the Yih (Changes) that Denis Mair and I are nearing completion of, I will provide information on the methods and results of Julie Wei.  For now, I will just mention two of my favorite tools for studying old Celtic:

(GPC) Geiriadur Prifysgol Cymru (A Dictionary of the Welsh Language)
Y geiriadur hanesyddol safonol Cymraeg (The standard historical Welsh dictionary)

Etymological Dictionary of Proto-Celtic by Ranko Matasović.  Leiden:  Brill, 2009.


Note from Julie Wei (6/20/23):

Geiriadur Prifysgol Cymru: A Dictionary of the Welsh Language,
4 vols, First Edition, 1997-2002, University of Wales. (see Wiki.)

I used this first edition for my 2005 papers.

There is a second edition, consisting of Parts 1 to 12, 2003-2013
(from google "A Dictionary of the Welsh language").


Selected readings on Tocharian

The language, the people, and their history

Archeology and language

The origin of the Tocharians and their relationship to the Yuezhi (月氏) have been debated for more than a century, since the discovery of the Tocharian language. This debate has led to progress on both the scope and depth of our knowledge about the origin of the Indo-European language family and of the Indo-Europeans. Archaeological evidence supporting these theories, however, has until now sadly been lacking.

There are many other Language Log posts that mention Tocharian and the Tocharians.  Readers are encouraged to search for them in various contexts.

Selected readings on PIE

Two by Hamp

  • Eric P. Hamp, with annotations and comments by Douglas Q. Adams.  "The Expansion of the Indo-European Languages: An Indo-Europeanist’s Evolving View".  Sino-Platonic Papers, 239 (August, 2013), 1-14.
  • Hamp, E. P. (1998). “Whose were the Tocharians?: Linguistic subgrouping and diagnostic idiosyncrasy,” in The Bronze Age and Early Iron Age Peoples of Eastern Central Asia, ed. V. H. Mair, 1: 307–346.  Washington and Philadelphia: Institute for the Study of Man and the University of Pennsylvania Museum.


  1. David Marjanović said,

    August 20, 2023 @ 10:40 am

    Very interesting in the Tocharian context: this open-access paper on five thousand years of genetic history of Xinjiang. Afanasyevo/Tocharian and Andronovo/(Indo-)Iranian ancestry can be told apart by the lack of Early European Farmer ancestry within the former.

    Among the non-lexical arguments for the majority view that Tocharian is the sister-group of what Ringe calls Core IE* is that, in the latter, the single largest group of verbs are the "simple thematic verbs" every introduction into comparative IEistics will show you: the root is in *e-grade and stressed, then follows a lone vowel *e or *o, then the person+number endings, no zero-grades anywhere – so the whole phenomenon must be dated after the period when most unstressed vowels were automatically deleted. In Tocharian, "simple thematic verbs" are exceedingly rare (and in Anatolian they're altogether absent). There are more arguments, but I'll have to look them up properly.

    * Personally, I think Ringe's IE, Nuclear IE and Core IE should be called Indo-Anatolian, Indo-Tocharian and Indo-Actually-European. That would be much clearer than trying to make non-synonyms out of "nucleus" and "core".

  2. Andreas Johansson said,

    August 20, 2023 @ 2:14 pm

    I think we're basically stuck with "Indo-European" as including Anatolian, so how about Indo-European, Indo-Tocharian, and Crown Indo-European?

  3. David Marjanović said,

    August 20, 2023 @ 3:10 pm

    "Indo-Anatolian", with quotation marks, does give 4220 ghits.

  4. Andy said,

    August 20, 2023 @ 5:03 pm

    So I'm not that up to date with the latest developments - is this grouping of 'Nuclear IE' vs 'Anatolian' ('Indo-Anatolian' per DM's comment, above) the same as Proto-Indo-Hittite, or is there an implied difference in chronology or organisation?

  5. gds555 said,

    August 20, 2023 @ 5:07 pm

    It's a shame that the great Indo-Europeanist Franz Bopp (1791-1867) didn't live long enough to take a leading role in early Nuclear IE studies and in the partially consequent explosion of laryngeal theory. He could've been the field's Boppenheimer!

  6. gds555 said,

    August 20, 2023 @ 5:32 pm

    Oh, wait a minute–I was so enthusiastic about the potential pun that I rushed my writing and mixed up the classifications (which I'd never heard of until about an hour ago), and wrote as if the Anatolian languages were part of Nuclear IE, which they precisely aren't. My post should've read something like "Given that Franz Bopp (1791-1869) was a major pioneer in the study of what's now being called Nuclear IE, lighting the fuse for the explosion of Indo-European studies that followed, he could perhaps be called the field's Boppenheimer". What a way to ruin a joke. And to think that decades ago I got an A in one of Jochem Schindler's (1944-1994) classes, and now I disgrace myself like this. I'm sorry, Jochem. If any moderator wants to erase this sorry mess that my two posts constitute, please go right ahead.

  7. gds555 said,

    August 20, 2023 @ 5:35 pm

    And that's (1791-1867) for Bopp's dates in my second post.

  8. AntC said,

    August 21, 2023 @ 12:29 am

    When Don Ringe says

    The Anatolian subgroup is clearly half of the IE family; *all*the*other*subgroups*together* are the other half …
    Tocharian is half of Nuclear IE; all the other subgroups together are the other half

    What sense of 'half' is this? It was spoken by half the (proto-)IE population at the time of the split? It contains an equal amount of linguistic innovation since the split as compared to the other 'half'?

    And later Doug Adams: I have not ever tried to quantify things.

    How do you 'weigh' or 'scale' languages? Even those known to be tightly related? Quantify what things?

    'Half' means merely equal in importance in the view of IE-ists? Importance for what?

    As a modern-day comparison: is English 'half' of something? Half of Germanic? Are Germanic/Celtic/Romance equal thirds of Western European IE?

    Sorry, but those comments conveyed nothing to me.

  9. Andreas Johansson said,

    August 21, 2023 @ 12:48 am


    When Ringe says that Anatolian is "half" the family, he means only that Anatolian alone makes up one of the first two branches the family split into. There's nothing actually quantitative about it.

  10. AntC said,

    August 21, 2023 @ 3:20 am

    one of the first two branches the family split into

    Thanks Andreas. So did (P)IE branches always split in two each time? Isn't that rather simplistic? Was there no multi-directional splitting, with different mixing of languages at different migration routes? Was there no back-migration with an admixed branch of IE coming back to previous homelands/mixing with those who'd never left?

    I ask because in the case of East Polynesian languages, there's no assumption that splits were binary each time. And indeed there was continued contact between the branches to/from Samoa for maybe a century after initial dispersal eastwards. So multiple languages evolved away from a common form in multiple directions simultaneously.

  11. Andreas Johansson said,

    August 21, 2023 @ 3:40 am


    I, obviously, do not know. But acc'd to Ringe's scenario, no, IE splits were not always simple binary ones. However, the first two splits, between Anatolian and Indo-Tocharian and between Tocharian and the rest, happen to be such, or as close to it that we can't tell the difference a few millennia later.

  12. David Marjanović said,

    August 21, 2023 @ 5:49 am

    (I'm about to look things up…)

    So I'm not that up to date with the latest developments - is this grouping of 'Nuclear IE' vs 'Anatolian' ('Indo-Anatolian' per DM's comment, above) the same as Proto-Indo-Hittite, or is there an implied difference in chronology or organisation?

    Yes and no. (Jein. Ni oui, ni non, bien au contraire !)

    In one sense it's the same as the original Indo-Hittite hypothesis. But most of the evidence that was used to support the latter hasn't held up, so the idea was largely abandoned, and when it came back, it was largely based on new evidence.

    is English 'half' of something?

    Yes, of Anglo-Frisian (as long as you include Scots in English).

  13. David Marjanović said,

    August 21, 2023 @ 11:40 am


    – "Simple thematic" verbs are the largest single group of verbs in IAE (Indo-Actually-European), very rare in Tocharian (and absent in Anatolian).
    – Those few that do exist in Tocharian stop being thematic in the optative, i.e., where IAE has *-o-jh₁-, Tocharian has just *-ih₁-. This would seem to make sense especially if the "simple thematic" verbs are reinterpreted subjunctives: an optative of a subjunctive must have seemed absurd or awkward for a while. (The reinterpretation of further subjunctives as "simple thematic" indicatives continued later, e.g. there's a bunch unique to Germanic, IIRC.)
    – It is not self-evident whether the comparative (as a category, and with its suffix *-is-/*-jes-/*-jos-) was completely lost in Anatolian and Tocharian or is an innovation of IAE. However, in the known cases of loss of the comparative (e.g. Romance or Bulgarian), a few relict forms are left behind, and none have been identified in Anatolian or Tocharian so far. The superlative is pretty clearly a set of parallel innovations within PIAE in any case.

    Arguably lexical:

    – Tocharian and Anatolian had two stems for interrogative pronouns, *m- and *kʷ-. *m- is lost in IAE (perhaps surviving in a few conjunctions, but not in interrogative pronouns).


    – PIA seems to have had a telic and an atelic verb for "drink" (as for many other things): *poh₃- and *h₁egʷʰ-. The latter is a verb meaning "drink" in Anatolian and Tocharian, but in IAE it is lost as such, found only in adjectives like Latin ēbrius "drunk" or Greek νήφων "sober" ( < *n-h₁gʷʰ-on-s or suchlike, *"one who is not drunk").
    – In IAE, *wiHrós is one of those patriarchal "man ~ warrior ~ hero" words. In Tocharian (A at least), it (wir) is an adjective meaning "young". The latter being older makes more sense.
    – *jebʰ- is the root of the verb "enter" in Tocharian. In IAE, *jebʰ- is only used in a very metaphorical sense connected to, let's say, fertility.

    Further reading, including the sources for most of the above and long discussions of less clear cases: Kim 2018, Peyrot 2022.

  14. DCBob said,

    August 21, 2023 @ 12:31 pm

    Dr. Ringe's observation that "we might be looking at the wreckage of a dialect continuum for which no clean tree can be drawn" seems very consistent with Andrew Garrett's suggestion that the IE sub-groups may have resulted from gradual convergence of neighboring dialects combined with the 'pruning' of intermediate dialects, so that the tree-like appearance of the family is somewhat artificial.

  15. David Marjanović said,

    August 21, 2023 @ 3:11 pm

    Garrett overdid it, though. He found that several of the supposedly diagnostic innovations of Greek actually had a different distribution or history, and jumped to the conclusion that Greek isn't a clade. There are diagnostic innovations that he overlooked.

  16. Chris Button said,

    August 22, 2023 @ 11:07 am

    As a dirt archeologist and historian of languages, peoples, and cultures, I still have a strong intuition that the Tocharians — though far older than the Celts — are somehow related to them, perhaps through some missing intermediaries.

    I’ve always felt studies of textiles can really help with linguistic taxonomy.

    When I was doing my studies on the phonology and morphology of some Northern Kuki-Chin languages on the Burma side of the India border, I learned of some of the nuances and details in how textile traditions can distinguish closely related groups. I would have loved to have partnered with a Kuki-Chin textile expert to see how their textile comparisons could complement (or even challenge) my linguistic comparisons.

  17. David Marjanović said,

    September 5, 2023 @ 4:08 pm

    This would seem to make sense especially if the "simple thematic" verbs are reinterpreted subjunctives

    …which only seems to work for a small portion of them. This paper argues that "simple thematic" verbs as a group began as verbs derived from adjectives, and a few reinterpreted subjunctives (as well as a few reinterpreted other things) joined later. However, it points out that the subjunctive is absent in Anatolian and Tocharian, although without completely excluding the possibility that it was originally present and then lost in these two branches.
