The geographical, archeological, genetic, and linguistic origins of Tocharian


[The following is a guest post by Douglas Adams.]

Key words: Eastern Central Asia (ECA); Tarim Basin; Dzungarian Basin; Bactria–Margiana Archaeological Complex (BMAC); Anatolian; Proto-Indo-European; “standard average Indo-European” (“SAIE”); Hittite; Iranian; Sogdian; Khotanese; Bactrian; Avestan; Saka; Indo-Aryan; Mitanni; Assyrian; Indo-Hittite; Fertile Crescent; Yamnaya; Sintashta; Andronovo; Afanasievo; Minusinsk Basin; Qäwrighul; genetics; Yanqi Basin; Ili Valley; Yuezhi; Xiongnu; Turfan Basin; stockbreeding; barley cultivation; millet; irrigation technology; donkey; camel; brick; arrow; Russian; Kazakhstan; Indo-Iranian; Sanskrit; Massagetae

———-

Below is a host of questions, implied questions, and questionable statements. I’m trying to get my head around the prehistoric interrelations of pre-Proto-Iranians and pre-Proto-Tocharians based on different “age-levels” of linguistic borrowing and match them with some plausible geographical / archaeological contexts. There are some conundrums here: (1) how did early borrowings from the Bactria–Margiana Archaeological Complex (BMAC) folks get so quickly, by so roundabout a way, into Tocharian, and (2) why does Tocharian B have an irrigation vocabulary so reminiscent of Central Iranian languages (Sogdian/Avestan; not Saka), borrowed (on phonological grounds) a thousand years (at least) after Tocharians were already knowledgeable about irrigation?

We start out with some “consensus dates”:

Yamnaya, between Dnieper and Ural

                3300 – 2600 BC

                Proto-(non-Anatolian-)Indo-European [“Restindoeuropäisch”]

Sintashta

                2200 – 1800 BC

                Proto-Indo-Iranians

Andronovo cultural complex

                2000 – 900 BC  

Proto-Indo-Aryans to the south (arriving on the edge of the Fertile Crescent [Mitanni] by 1400 BC)

Proto-Iranians to the north (arriving on the edge of the Fertile Crescent [Assyrian records] by 1100 BC)

Anatolians (i.e., Hittites particularly) first recorded in history in early years of second millennium BC (Assyrian commercial records of Assyrian trading colonies in eastern Anatolia). By 1600 BC we have the formation of the Hittite Empire. The consensus opinion is that Proto-Anatolian separated from the rest of the Proto-Indo-European speech community about 3500 BC, a date that antedates the formation of the Yamnaya culture. Linguistically, Anatolian shows more than enough “special features” (both retained archaisms and innovations) to support some form of the “Indo-Hittite” hypothesis.

Afanasievo, Minusinsk Basin in far-off Siberia

                3300 – 2500 BC

Universally presumed to be a cultural sibling of some sort with the Yamnaya culture and universally presumed to speak a language related to that of the Yamnaya culture. The latter is a reasonable presumption, but it is a presumption: there is NO linguistic evidence one way or the other. It is particularly important to note that the dating makes whatever language spoken by the Afanasievo people not a descendant of the Proto-Indo-European of the Yamnaya people. If it was related at all, it was rather a sister of the Proto-Indo-European in the same way that Anatolian was a sister of the Proto-Indo-European rather than a descendant. Any descendant of the Afanasievo language, if Indo-European at all, would be as different from the “standard average Indo-European” (“SAIE”) languages descended from the Yamnaya language as Anatolian is different from “SAIE.” Since Tocharian is clearly “SAIE,” it can’t be a descendant of the Afanasievo language (again, presuming the latter was related to Proto-Indo-European at all).

So also, on genetic evidence, Hollard, Clémence; et al. (2018), "New genetic evidence of affinities and discontinuities between bronze age Siberian populations," Am J Phys Anthropol. 167 (1): 97–107, discount any relationship between the people of the Afanasievo culture and the later inhabitants of the Tarim Basin. They say, “[o]ur results support the hypothesis of a genetic link between Afanasievo and Yamnaya (in western Eurasia), as suggested by previous studies of other markers. However, we found no Y-chromosome lineage evidence of a possible Afanasievo migration to the Tarim Basin.”

The ancestors of the Tocharians (the “pre-Tocharians”), generally (though not universally) assumed to be the people of the Qäwrighul culture (2000 – 1500 BC), of the lower Tarim and Kongqi rivers, and their antecedents, who were growing grain in the Yanqi Basin by 2200 BC, entered the Tarim Basin from the northeast and not from the western portion of the Tarim Basin, which remained empty until the middle part of the first millennium BC when the Sakas entered the basin.

So the pre-Tocharians must have come immediately from the northeast, ultimately from the (north-)west, but not from the Afanasievo culture area. So where exactly did they enter Xinjiang? It seems that there are three possibilities: (1) through the Ili valley, the same route the Yuezhi, arguably their descendants, later took going west when driven out of their earlier western Gansu homes by the Xiongnu in 176 BC; (2) through the ever-windy Dzungarian Gate,[1] the southwest entrance into the Dzungarian Basin (note the Andronovo-type settlement in Adunqiaolu in Wenquan County [exact dates?], Jia, P., Betts, A., Cong, D., Jia, X., & Dupuy, P. (2017). “Adunqiaolu: New evidence for the Andronovo in Xinjiang, China.” Antiquity, 91 (357), 621-639. doi:10.15184/aqy.2017.67); (3) through the passes from Kazakhstan into the northwest corner of the Dzungarian Basin. All three of these routes, if followed long enough, would lead to the Turfan Basin, and thence to the Yanqi and ultimately the Tarim Basins. All three of these entrance ways have their western portals in the steppes of eastern Kazakhstan.

At the latest, the pre-Tocharians would have had to have left Kazakhstan in the last few centuries of the third millennium BC, at a time roughly contemporary with the early part of the Sintashta culture.

Both linguistics and archaeology say these pre-Tocharians were primarily stockbreeders, but they also practiced some cereal agriculture, witnessed by the Proto-Indo-European origins of yap ‘barley’ (< *yebhom, by manner dissimilation from *yewom), proksa [pl. (tant.?)] ‘grain’ (cf. Slavic, e.g., Russian próso ‘millet’ [more, Adams, 2013:454]), tāno ‘grain’ (< *dhoh3nah2-), and the Tocharian A word for ‘plow,’ āre (< *h2ar-).

Eastern Kazakhstan, indeed all of Kazakhstan, would seem to be archaeologically a terra incognita in the third millennium BC, i.e., before the rise of the Andronovo cultural complex.

Either rudimentary irrigation agriculture had reached eastern Kazakhstan from the BMAC by the middle of the third millennium BC and was carried thence to inner Xinjiang, or it was invented de novo by the late third millennium inhabitants of the Turfan and Yanqi Basins.

The earliest localizable linguistic position of pre-Tocharian was near Proto-Indo-Iranian (Sintashta), from which it borrowed kercapo ‘donkey’ (compare Sanskrit gardabhá- ‘donkey’ < pre-Indo-Iranian *gordebhó-) before the falling together of PIE *e, *a, and *o into *a in Proto-Indo-Iranian (ca. 2500 BC?).

Later, but still early (ca. 1500 BC?), when pre-Tocharians are in the eastern Tarim Basin (as the Qäwrighul culture?), there are exclusively Iranian loanwords (i.e., not Indo-Aryan), but showing presumed BMAC influence (iścem ‘tile/brick’ [< BMAC?], wastre ‘camel’ [< BMAC?], etswe ‘donkey,’ tsain ‘arrow’). Is the vector here the nomadic Iranians of the Ili valley, one of whose cemeteries was recently discovered? In the preservation of Proto-Iranian *ts/dz as affricates, this variety of early Iranian is more archaic than any other known variety. That may be simply a matter of its age. In the preservation of the cluster *-tsw-, however, it appears to be the ancestor of no surviving variety of Iranian. The variety of Iranian spoken by the Massagetae?

Still later (ca. 1000 BC?) is the irrigation technology vocabulary shown by Tocharian B (murye, newiya). Are these words from the in-coming Saka in the first half of the last millennium BC? Tocharians (future Tocharian B’s) moving up the Tarim meet Iranians near Tumshuq and borrow Iranian irrigation terminology? [This is the big conundrum. Why did Tocharian B need new irrigation technology? Why was it borrowed from a central Iranian variety like Sogdian, rather than from Saka?]

And finally (500 BC – 500 AD) are borrowings during the “Silk Road Era” from “co-territorial” Khotanese, Sogdian, and Bactrian.

—–

[1] “The Dzungarian Gate has been noted in modern history as the most convenient pass for horseback riders between the western Eurasian steppe and lands further east, and for its fierce and almost constant winds. The area has also become known for its gold deposits and for producing prodigious numbers of dinosaur fossils, especially Protoceratops. Given that Herodotus relates a story of a traveler to the East who visited a land where griffins guard gold and east of which live the Hyperboreans, modern scholars have theorized that the Dzungarian Gate may be the real-world location of the home of Boreas, the North Wind of Greek Legend” (Wikipedia, s.v. Dzungarian Gate, accessed July 4th, 2020). Did the Hyperboreans speak Tocharian?

 

Selected readings

The language, the people, and their history

Archeology and language

The origin of the Tocharians and their relationship to the Yuezhi (月氏) have been debated for more than a century, since the discovery of the Tocharian language. This debate has led to progress on both the scope and depth of our knowledge about the origin of the Indo-European language family and of the Indo-Europeans. Archaeological evidence supporting these theories, however, has until now sadly been lacking.



15 Comments

  1. Victor Mair said,

    July 14, 2020 @ 12:43 pm

    I have always felt that the earliest wave of Tarim mummy people were pre-Proto-, Proto-, or early post-Proto-Tocharian speakers. I spelt those ideas out in various places, most notably in the "Introduction" ("Priorities") and "Conclusion" ("Die Sprachamöbe: An archeolinguistic parable") in The Bronze Age and Early Iron Age Peoples of Eastern Central Asia, 2 vols. (Washington, D.C.: The Institute for the Study of Man; Philadelphia: The University of Pennsylvania Museum, 1998). Not only that, the latest archeological findings show that the earliest movement of Bronze Age peoples into the Tarim Basin came from the northeast, around the eastern edges of the Tängri Tagh / Tianshan / Heavenly Mountains, and surveys done within the last ten years or so show that there are hundreds of Bronze Age sites precisely in the area around Lop Nor.

  2. Tom Dawkes said,

    July 14, 2020 @ 2:31 pm

    There is another article by Michaël Peyrot in Indo-European Linguistics 7 (2019) 72–121 [brill.com/ieul]
    “The deviant typological profile of the Tocharian branch of Indo-European may be due to Uralic substrate influence”.

  3. Chris Button said,

    July 14, 2020 @ 3:30 pm

    The word tsain ‘arrow’ has been convincingly posited as the source of 箭 in Chinese

  4. Chris Button said,

    July 14, 2020 @ 5:53 pm

    The word tsain ‘arrow’ has been convincingly posited as the source of 箭 in Chinese

    I thought we'd discussed that on LLog before:

    https://languagelog.ldc.upenn.edu/nll/?p=24918

  5. Doug Hitch said,

    July 14, 2020 @ 8:32 pm

    That's a nice summary.

    Things could travel enormous distances in prehistoric times. Long before Magellan the banana went from New Guinea to Africa, and the sweet potato spread from South America throughout Polynesia. Lapis lazuli, only mined in Afghanistan, is found in 4th millennium BC Egyptian jewelry. [Possibly not the same kind of thing, but the BMAC word for 'brick' is found in Modern Khmer ឥដ្ឋ, pronounced ʔət, but if you use Indic transliteration, īṭṭha, you see the Pali intermediary.]

    The exploding Yamnaya population probably took their ox carts in all directions and filled every niche they could, plausibly up to the Yellow River in the east. (In the west they made it to Newfoundland around 1K AD.) I suspect they would have used every imaginable point of entry into the Tarim Basin.

    I wonder if the Iranian irrigation terms in Kuchean are used in legal contexts and reflect the period of Kushan administration. Property limits were often defined by topographic features like canals. Many legal terms in current English come from the Norman French administrative period.

    When thinking about ancient settlement patterns it is important to know what the climate was like at the time. Bronze Age Lop Nor may have been a huge lake, surrounded by forest, and very amenable to settlement. There may have been a large Yamnaya descendant population.

  6. Barbara Phillips Long said,

    July 14, 2020 @ 11:10 pm

    Snicker. Sorry, but I spent a few seconds trying to figure out who “Magellan the banana” was.

    I have often wondered if some things in prehistory, such as taming horses to ride, came about because adolescents wanted to prove themselves. (I picture some elder grousing about how “kids today” don’t want to keep the birds away from the grain fields, they just want to tease those animals.) Plus, I imagine some movements of small groups being due to either the desire to leave the home group or eviction from the home group, and perhaps effecting linguistic changes.

  7. David Marjanović said,

    July 15, 2020 @ 10:50 am

    There is another article by Michaël Peyrot in Indo-European Linguistics 7 (2019) 72–121 [brill.com/ieul]
    “The deviant typological profile of the Tocharian branch of Indo-European may be due to Uralic substrate influence”.

    The paper is here, and its abstract reads:

    Tocharian agglutinative case inflexion as well as its single series of voiceless stops, the two most striking typological deviations from Proto-Indo-European, can be explained through influence from Uralic. A number of other typological features of Tocharian may likewise be interpreted as due to contact with a Uralic language. The supposed contacts are likely to be associated with the Afanas’evo Culture of South Siberia. This Indo-European culture probably represents an intermediate phase in the movement of speakers of early Tocharian from the Proto-Indo-European homeland in the Eastern European steppe to the Tarim Basin in Northwest China. At the same time, the Proto-Samoyedic homeland must have been in or close to the Afanas’evo area. A close match between the Pre-Proto-Tocharian and Pre-Proto-Samoyedic vowel systems is a strong indication that the Uralic contact language was an early form of Samoyedic.

    While some of what it says about Uralic in general and Samoyedic in particular is outdated, the paper still makes a good case that a certain pre-Tocharian stage must have been spoken in southern Siberia. Some of the necessary corrections would in fact bolster that conclusion further.

    ~~~~~~~~~~~~~~~

    Minor points from the OP:

    Anatolians (i.e., Hittites particularly) first recorded in history in early years of second millennium BC (Assyrian commercial records of Assyrian trading colonies in eastern Anatolia).

    Unmistakably Anatolian personal names show up in the archives of Ebla in large numbers as early as 2500 BC. That discovery was hidden in this supplement to this open-access blockbuster archeogenetics paper that came out in Science in 2018.

    Any descendant of the Afanasievo language, if Indo-European at all, would be as different from the “standard average Indo-European” (“SAIE”) languages descended from the Yamnaya language as Anatolian is different from “SAIE.”

    Why "as different" and not much closer?

    Since Tocharian is clearly “SAIE,” it can’t be a descendant of the Afanasievo language (again, presuming the latter was related to Proto-Indo-European at all).

    Tocharian is clearly much closer to SAIE than Anatolian is; but, almost as clearly, it's not inside. For instance, the "simple thematic" verbs, which are so characteristic of PIE but absent from Anatolian, are a small, incipient category in Tocharian; less impressively, the words for "die" aren't *mer- (instead Tocharian A uses *wel- like Anatolian), and *yebʰ- is a perfectly innocent verb meaning "enter".

    So, while the Tocharians are not (Hollard et al. 2018) descendants of the Afanasievo people in the direct male line, at least not in the statistical majority, their languages may well have come from there, and have a common ancestor with SAIE in a language spoken in the Khvalynsk culture.

  8. Francesco Brighenti said,

    July 17, 2020 @ 3:18 am

    David Marjanović wrote:

    “Unmistakably Anatolian personal names show up in the archives of Ebla in large numbers as early as 2500 BC. That discovery was hidden in this supplement to this open-access blockbuster archeogenetics paper that came out in Science in 2018.”

    “Unmistakably” Anatolian?

    In the paper entitled “Linguistic supplement to Damgaard et al. 2018: Early Indo-European Languages, Anatolian, Tocharian and Indo-Iranian,” available at

    https://zenodo.org/record/1240524#.WvVrcfSwfcu ,

    historical linguists Guus Kroonen, Gojko Barjamovic and Michaël Peyrot claim that in the palatial archives of the city of Ebla in Syria, dated to the 25-24th centuries BCE, “a small group of ca. twenty names connected to Armi [a state probably located on the highlands of southeastern Anatolia – FB] build on what appear to be well-known Anatolian roots and endings, such as -(w)anda/u, -(w)aššu, -tala, and -ili/u” (p. 6).

    In the press release on the work from the University of Copenhagen at http://tinyurl.com/yax9zs2x , Barjamovic says in an interview:

    “The Indo-European languages are usually said to emerge in Anatolia in the 2nd millennium BCE. However, we use evidence from the palatial archives of the ancient city of Ebla in Syria to argue that Indo-European was already spoken in modern-day Turkey in the 25th century BCE. This means that the speakers of these language must have arrived there prior to any Yamnaya expansions.”

    Thus, based on this ancient non-Semitic onomastic evidence, the authors of the linguistic supplement to Damgaard et al. 2018 claim it may be “argued”, or it “appears” that Anatolian Indo-European languages were spoken in modern-day Turkey in the 25th century BCE – that is, at least 500 years before the current “majority consensus” dating of the coming of Anatolian speakers to modern-day Turkey.

    It must be pointed out that Profs. Alfonso Archi and Marco Bonechi, the two Italian ANE scholars cited in Kroonen, Barjamovic & Peyrot’s paper to back their novel claim, just state that some of the roots and endings contained in the aforesaid personal names recorded in texts from the Ebla palatial archives “remind” of Anatolian suffixes (see Archi 2011, in English, at https://tinyurl.com/y7n7k7cn , p. 465), or that the names themselves “look” (It. “sembrano”) Anatolian (see Bonechi 1990, in Italian, at https://tinyurl.com/yd3mvocq , p. 34). The claim that these names “*clearly* [emphasis mine – FB] fall within the Anatolian Indo-European family” is obviously an inference of Kroonen, Barjamovic & Peyrot.

    I think much more research is needed to support such a grandiose claim, i.e. that Anatolian speakers were already present in modern-day Turkey in the first half of the 3rd millennium BCE, prior to any Yamnaya expansions. As far as I know, no such research, comparable to the one made to establish, for instance, the Indo-Aryan origin of tens and tens of personal names recorded in cuneiform texts from Bronze Age Syria/Palestine and northern Iraq (Hurrian/Mittani onomastics, etc.), is in sight at present. *If* future research firmly establishes the onomastics cited by Kroonen, Barjamovic & Peyrot is really Anatolian, then the Indo-Hittite hypothesis (an alternate name for Indo-European, called so by some historical linguists who believe that Anatolian languages broke off well before there were any further divisions in Proto-Indo-European) will be vindicated and the following claim by Kroonen, Barjamovic & Peyrot will prove true:

    “[S]ince the onomastic evidence from Armi is contemporaneous with the Yamnaya culture (3000-2400 BCE), a scenario in which the Anatolian Indo-European language was linguistically derived from Indo-European speakers originating in this culture can be rejected. This important result offers new support for the Indo-Hittite hypothesis (see above) and strengthens the case for an Indo-Hittite speaking ancestral population from which both Proto-Anatolian and residual Proto-Indo-European split off no later than the 4th millennium BCE” (p. 7).

  9. Chris Button said,

    July 17, 2020 @ 6:22 am

    @ David Marjanović

    You brought up *yebʰ- being "a perfectly innocent verb meaning 'enter'," elsewhere on LLog. A commenter pointed out that a semantic connection between "enter" and "intercourse" occurs elsewhere (e.g., Chinese). Why should such a standard semantic shift be cause for any discussion around the relatedness between languages?

  10. David Marjanović said,

    July 17, 2020 @ 9:15 am

    The claim that these names “*clearly* [emphasis mine – FB] fall within the Anatolian Indo-European family” is obviously an inference of Kroonen, Barjamovic & Peyrot.

    Kroonen and Peyrot are IEists who are clearly qualified to make that inference. Barjamović is an archeologist.

    I think much more research is needed to support such a grandiose claim, i.e. that Anatolian speakers were already present in modern-day Turkey in the first half of the 3rd millennium BCE, prior to any Yamnaya expansions.

    Why is that a grandiose claim? What little genetic data we have from people buried by Hittites is conspicuous in its lack of "steppe ancestry". Let me present a few quotes from Damgaard et al. 2018:

    "In Anatolia, Bronze Age samples, including from Hittite speaking settlements associated with the first written evidence of IE languages, show genetic continuity with preceding Anatolian Copper Age (CA) samples and have substantial Caucasian hunter-gatherer (CHG)–related ancestry but no evidence of direct steppe admixture."

    "Finally, the lack of steppe ancestry in samples from Anatolia indicates that the spread of the earliest branch of IE languages into that region was not associated with a major population migration from the steppe."

    "We find no evidence of steppe ancestry in Bronze Age Anatolia from when Indo-European languages are attested there. Thus, in contrast to Europe, Early Bronze Age Yamnaya-related migrations had limited direct genetic impact in Asia."

    …and then there's a whole section titled "Lack of steppe genetic impact in Anatolians". As mentioned above, the paper is in open access; just read it.

    then the Indo-Hittite hypothesis […] will be vindicated

    It's already pretty much obvious on purely linguistic grounds. Like, I disagree with half of this list, but the other half is still enough to make a strong argument.

    Why should such a standard semantic shift be cause for any discussion around the relatedness between languages?

    Parsimony: all else being equal, it's better to assume that the semantic shift happened only once.

    I wouldn't bring it up if it were the whole extent of the evidence, but it isn't. The thematic verbs are much stronger evidence. *yebʰ- is just noteworthy for fitting that picture.

  11. Chris Button said,

    July 17, 2020 @ 12:04 pm

    Parsimony: all else being equal, it's better to assume that the semantic shift happened only once.

    I don't understand why. Comparative semantics should be treated like comparative phonology. Evidence that a supposed shift is valid comes from similar instances elsewhere. Otherwise, and particularly in the case of semantics, it risks being mere speculation.

  12. David Marjanović said,

    July 18, 2020 @ 7:15 am

    Maximum parsimony is basic science theory. What else do you suggest, maximum munificence?

    But, as I said, feel free to ignore the semantic shifts and go with the thematic verbs instead…

  13. Chris Button said,

    July 18, 2020 @ 8:25 am

    The problem is that you are misapplying it. Chinese and English share many similar semantic shifts. Would it be parsimonious to assume that Chinese and English are therefore related languages and the shifts only happened once? That also assumes that a semantic shift is a single discrete one-off process.

  14. David Marjanović said,

    July 19, 2020 @ 6:27 am

    Chinese and English share many similar semantic shifts.

    …but not, obviously, applied to cognate words.

  15. Chris Button said,

    July 19, 2020 @ 6:49 am

    Cognate or not, the point remains the same. Words with similar meanings, cognate or otherwise, unsurprisingly show similar semantic shifts across the world's languages. One isolated case here is not a valid basis for an argument.
