Tocharian, Turkic, and Old Sinitic "ten thousand"

Serious problem here.

Clauson, An Etymological Dictionary of Pre-Thirteenth-Century Turkish, p. 507b:

F tümen properly ‘ten thousand’, but often used for ‘an indefinitely large number’; immediately borrowed from Tokharian, where the forms are A tmān; B tmane, tumane, but Prof. Pulleyblank has told me orally that he thinks this word may have been borrowed in its turn fr. a Proto-Chinese form *tman, or the like, of wan ‘ten thousand’ (Giles 12,486).

Source (pdf)

[VHM:  the "F" at the beginning of the entry means "Foreign loanword"]

Many years ago, I studied Tocharian with Donald Ringe here at Penn, and once read through the whole of Clauson looking at all the "F" words, so I have been aware of this alleged connection regarding 萬 for about a quarter of a century.

This gives rise to two questions:

  1. Do Toch. A tmān; B tmane, tumane have an etymology within IE?
  2. Is *tman "or the like" a legitimate reconstruction for Sinitic 萬 (MSM wàn)?  Cf. Schuessler /*mans/, Baxter-Sagart /*[n]-s/, Zhengzhang /*mlans/.

Question 1 is for IE specialists to work on; question 2 is for Sinologists.

The Chinese side of things is very tricky.  I follow Schuessler, not Baxter and Sagart, in not seeing any evidence for that initial "t".  But let's see what the experts say.

South Coblin:

If P said this to Clauson face to face, that must have been back in Pleistocene times. God knows what he would say today if he were still alive.


  1. “Scorpion”: See (OC *m̥ʰraːds);
  2. “Religious dance; sorcery”: Perhaps from Proto-Sino-Tibetan *s-man (“medicine”). Compare Tibetan སྨན (sman, “medicine; she-demons worshipped by common folk”), Burmese မန်း (man:, “utter mystic words to heal or ward off evil”);
  3. “Myriad” (10000): Schuessler (2007) considers the etymology of this sense Sino-Tibetan, and compares it with Tibetan འབུམ ('bum, “hundred thousand; complete; entire; multifarious”). Similar words are found in branches of Altaic and Tocharian; here they are treated as very old loanwords from Chinese, per Pulleyblank (apud Clauson, 1972), Beckwith (2009), Adams (2013) and Tremblay (2005).

It seems to me that Schuessler's Tibetan འབུམ ('bum, “hundred thousand; complete; entire; multifarious”) is much more convincing and less speculative.  Ever the cautious historical linguist that he is, Schuessler himself says (in a personal communication) that "The vowels do not agree completely, though. In short, the etymology of wàn is uncertain."

Alexander Lubotsky and Sergei Starostin:

Toch. A tmäṃ, Toch. B t(u)mäne 'ten thousand, a myriad' < PToch. *t(ə)mäne :: Proto-Turkic *Tümen 'ten thousand; very many' (OUygh. tümen, Turkm. tümen) < Proto-Altaic *či̯ùmi 'a large number' (e.g. Proto-Korean *čímɨín 'thousand').

Tocharian may have borrowed this Turkic word through a Middle Iranian intermediary (cf. Modern Persian tumän 'ten thousand'), which would better account for the vocalism.

Source:  from p. 4 (261), no. 7 of "Turkic and Chinese loan words in Tocharian", in Brigitte L.M. Bauer and Georges-Jean Pinault, eds., Language in time and space: A Festschrift for Werner Winter on the occasion of his 80th birthday (Berlin / New York:  De Gruyter, 2003), pp. 257-269.

Donald Ringe:

No IE etymology for the Tocharian words.  We can't even securely reconstruct 'thousand' all the way back:  Greek and Indo-Iranian clearly share a word, and Latin *might* share it (there are problems); Germanic and Balto-Slavic share a word, but it's a different word; that's all.

Michael Weiss:

I don't think there is a good IE etymology for the Tocharian forms.  Bailey said that it was a loan from a Middle Iranian source going back to an old Iranian *tu-ma:na-  'great-measure', but the u in TB is rare (2x vs. 14 for tm-) and late and probably an epenthetic vowel (so Winter in IE Numerals). tm- is not otherwise attested as a word-initial onset in Tocharian B.  The word is all over Central Asia: Pers. tuma:n '10 rials' is borrowed from Turkish. Cf. Uigh. tümäne, Tungus tuman, Mongol tuma:n. Further, Bailey's etymology is just bad for a lot of reasons.

So without a lot of investigation I'd say it's not implausible that the Tocharian words could have been borrowed from Chinese and passed on to Turkic, or that the word could have been borrowed into both independently.  The integration into the Tocharian -e class is not surprising since 100 has the same inflection.

Hannes Fellner:

I do think that Old Chinese should be reconstructed as *[n]-s. And I believe that Proto-Tocharian borrowed this word from Old Chinese. The Tocharian B forms with –u– are very rare and cannot reflect something original. So, there is no relation to words for 'thousand' in Balto-Slavic and Germanic (and also very likely none to an Iranian form akin to Old Persian tauman- 'strength').

Doug Adams:

My first response is, no, the Tch words do not have a great PIE etymology.  Some, myself included at times, have thought the t(u)- part might have some connection with the thou- of English thousand.  But that seems pretty unlikely now as any etymological equivalence with Germanic thousand (and its Baltic cognates) should show up with Tch **tuu- (long vowel) or **to-.  By any account –maan/mane does not match Germanic –and (or its Baltic equivalent), which, disguised, is the word for '100' (*'great 100' or the like).  And, of course, thousand is 1,000, not 10,000.

If the Chinese etymology is supportable (and that's a thicket mere mortals, let alone angels, fear to tread), then it would look very attractive to me.  Of course if *tman or the like has no Sino-Tibetan support outside Chinese itself, we might have to look westward again.

Ron Kim:

Tocharian '10,000' not only does not have an IE etymology, but its shape makes a native origin unlikely.  The same goes for Old Russian тьма, universally considered a borrowing from Mongolian (cf. тумен for an army unit, presumably in origin of 10,000 men).

By sheer coincidence, Prof. Baxter dropped by ECIEC last June when it was at U. of Michigan and was chatting with me and Tao Pan (a student of Tocharian and Buddhist philology now at LMU München) about Old Chinese '10,000'.  As I recall, he was arguing that the Vietnamese evidence in particular suggests specifically an initial /t-/ for the Old Chinese form.  [VHM:  I'd love to know what that Vietnamese evidence is.]   That reminds me: you might be interested in a recent paper by Michaël Peyrot and Kristin Meier that argues for the traditional derivation of Old Chinese *mit ["honey"] from (pre-)Proto-Tocharian, based in part on the Sino-Vietnamese evidence.

Georges Pinault:

I would state that Clauson's notice is obsolete. Toch. B tumâne, A tmân are most probably borrowed from Chinese. The same word has been also independently borrowed in Jurchen, Old Turkic, etc. In any case, the Old Turkic word cannot be directly borrowed from Tocharian, because of the vocalism; I may add that Toch. B tumâne, A tmân does not have any meaningful IE etymology. The final -e of Toch. B is analogical from kante 'hundred' and yaltse 'thousand'.

Gerd Carling:

This is complex. As far as I understand, the explanation by Adams (2013:318), apud Winter 1991, that the Tocharian word is ultimately borrowed from Middle Iranian is possibly problematic, because the word is attested in Modern Iranian only. That indicates that the Iranian words may be borrowed from Turkic, which in turn is likely borrowed from Chinese (or, alternatively, from Tocharian, which possibly borrowed from Chinese).

I think the safest is to say that this is an early migration word, which possibly has Chinese as its ultimate origin. There is no reliable IE etymology for the Tocharian word.

Tsu-Lin Mei:

There is no merit to *tman.  Schuessler "mans" is what I would reconstruct.

Baxter-Sagart and Zhengzhang are totally speculative.  B & S *C.mans has a *C, which is a bad idea Baxter inherited from Bodman.  Bodman has C floating around in his OC “reconstruction”.  It is a place-holder.  But neither Bodman nor Baxter ever tell us what this “C” is.

  1. In philosophy of science, we know if a hypothesis is not falsifiable, then it is not much of a theory.  We had caloric substance which explains heat and ether which is supposed to be the medium through which electro-magnetic waves travel (Nature abhors a vacuum). The B & S system with its brackets and tightly connected prefix and loosely connected pre-consonants is an unfalsifiable system, and therefore not a serious theory. B & S try to cover all the bases and produced a totally unmanageable and unintelligible system.
  2. What is the origin of the ubiquitous C. in B & S?  Jerry Norman in the 70s proposed Proto-Min, and the initial consonant system of the Proto-Min (PM) reconstructed comprises six manner groups, i.e., voiceless unaspirated (p-), voiceless aspirated (ph-), voiced plain (b-), voiced aspirated (bh-), voiceless softened (-p- ), and voiced softened (-b-).  It was on the basis of Norman’s PM that B & S developed their complicated system of pre-consonants.  But Jerry Norman in the last decade of his life totally changed his view.  South Coblin 2018 “Convergence as a Factor in the Formation of a controversial Common Min phonological configuration” reports at length on Norman’s unpublished papers and his later view.  In Common Min, there are only three manner distinctions: voiceless unaspirated (p-), voiceless aspirated (ph-) and voiced plain (b-), just like Middle Chinese and Karlgren’s reconstruction of Old Chinese.  So neither B nor S did any field work on Min; they did not follow the development in Proto-Min reconstruction, and they picked up an outdated version of Proto-Min reconstruction as the basis of their grand theory.
  3. J. Tharsen came to the Cornell Classical Chinese seminar in Sept. 2017 to talk about his thesis on 梁其钟. All went well until it came to Tharsen’s Old Chinese transcription into the B & S system.  The graduate students and faculty members were naturally interested in how these OC transcriptions sounded, and Tharsen had no answer.  He was persuaded by me to adopt instead the Baxter 1992 Handbook system.  Schuessler in his review of B & S came to the same conclusion.

B & S thought this C could be N, m, p, t, k, G etc. Name your pick.

VHM conclusion:

I do believe that Old Turkic tümen ("ten thousand", but often used for "an indefinitely large number"), Tocharian A tmān; B tmane, tumane ("ten thousand"), and Sinitic 萬 (MSM wàn; Old Sinitic Schuessler /*mans/, Baxter-Sagart /*[n]-s/, Zhengzhang /*mlans/) ("ten thousand") are somehow related, but it is not clear to me what that relationship is.  Above all, I do not find any evidence within Sinitic or hypothetical Sino-Tibetan of an initial "t" at any stage of their development.


[Thanks to Chris Button]


  1. Chris Button said,

    April 23, 2019 @ 10:52 am

    I too don't believe there is any evidence for a *t- prefix in OC here (or indeed elsewhere).

    It might also be noted that 萬 *máns (the OC *-s suffix would have already debuccalized by the time of any possible loan into Tocharian) is surely etymologically related to 曼 *máns. Since the character 萬 is being used as a jiajie (loan character) with the gloss of "myriad", its nasal *-n coda under the effect of suffixal *-s is not untoward in spite of the *-t coda elsewhere in the 萬 xiesheng series (e.g. 邁 *mráts); one might compare how "tents" and "tense" are often homophonous in English.

  2. Jonathan Smith said,

    April 23, 2019 @ 2:31 pm

    No internal Chinese evidence that I know of for *t-; Schuessler (2007: 507) cites Jingpho lə31-mun31 'ten thousand', which I know nothing about.

    Re: "Jerry Norman in the 70s proposed Proto-Min, and the initial consonant system of the Proto-Min (PM) reconstructed comprises six manner groups, i.e., voiceless unaspirated (p-), voiceless aspirated (ph-), voiced plain (b-), voiced aspirated (bh-), voiceless softened (-p- ), and voiced softened (-b-). [But in Norman's more recent] Common Min, there are only three manner distinctions; voiceless unaspirated (p-), voiceless aspirated (ph-) and voiced plain (b-), just like Middle Chinese and Karlgren’s reconstruction of Old Chinese":

    This is misleading because Norman's newer "Common Min" onset manners do not correspond to those of MC: Norman (and earlier Handel) was not disavowing "softened" onsets by adjusting Proto-Min or some such stage to *P-/*PH-/*B-; rather, earlier "softened" "*-P-" and "*-B-" across four tones were reconstrued as simply *B- across eight tones, etc.

    But it is true that "[i]t was on the basis of Norman’s PM that B & S developed their complicated system of pre-consonants." Schuessler's review designates that system "NOC" for "New Old Chinese"; in key respects and rather ironically it is also "Norman's Old Chinese", for the defining innovations — "pharyngealization" and "presyllables" — both have their origins in Norman's work (the presyllables in particular proposing to account via Norman's PMin for both Northern Min voiced onsets and the general Min mixture of [voiceless] aspirated and unaspirated analogues of the MC voiced series.) Here suffice to say I agree with Mei Tsu-Lin that this fails to reflect up-to-date understandings of the Min situation… Zev Handel and AKITANI Hiroyuki are key contributors in this area though naturally much remains debatable.

  3. Victor Mair said,

    April 23, 2019 @ 2:41 pm

    From Sasha Lubotsky:

    Starostin et al. in their Altaic dictionary reconstruct Proto-Altaic *či̯ùmi 'thousand', connecting Turkic *Tümen with the Korean (*čɨ́mɨ́n) and Japanese (*ti) words for 'thousand'. If this is correct, the Turkic origin of these words is then assured.

    [Starostin S.A., Dybo A.V., Mudrak O.A. An Etymological Dictionary of Altaic Languages, Leiden 2003, p. 403f.]

    The full text of this lemma is:

    ­čùmi thousand: Turk. *Tümen; Jpn. *ti; Kor. *čmn .

    PTurk. *Tümen ten thousand; very much (десять тысяч; очень много): OTurk. tümen (Orkh., OUygh.); Karakh. tümen (MK, KB), (Kypch. 14 cent.) dümen; Tur. tümen; Turkm. tümen (arch.); MTurk. tümen (Abush., Sangl.); Uzb. tuman; Uygh. tümän; Krm. tümen, kimen, timen; Tat. tömɛn; Kirgh. tümön; SUygh. tmen (ЯЖУ); Oyr. tümen; Tv. tümen; Yak. tümän.

    ◊ VEWT 504, EDT 507‑508, Лексика 574‑575. In general we agree with Doerfer’s arguments (TMN 2, 632‑642: the Turkic word is the source of Persian tūmān ‘10000’, not vice versa, although in some cases the word was borrowed back into Turkic (in particular: Az. tümän, Khal. timän ‘a Persian coin’, KBalk., Kum. tümen ‘10 roubles’); the Tokharian word, whose IE source is highly dubious, is most probably < Turkic; a Chinese source is extremely dubious). Turk. > Mong. tümen (see TMN 2, 641, Щербак 1997, 160), whence Evk. tumen etc., see Doerfer MT 78. Weak evidence of initial voice (*d‑ should be expected in PT) may be due to later cultural interborrowing.

    PJpn. *ti thousand (тысяча): OJpn. ti; MJpn. ti; Tok. chi.

    ◊ JLTT 546.

    PKor. *čmn thousand (ты­ся­ча): MKor. čmn.

    ◊ Nam 437.

    ‖ SKE 38. Despite TMN 2, 641 the Turk.‑Kor. parallel seems quite acceptable. Jpn. *ti reflects a suffixed form *čum(i)‑gV.

  4. Victor Mair said,

    April 23, 2019 @ 3:35 pm

    From Brian Spooner:

    Iranian currency was always written in riyals, but the sums were always talked about in tumans (a tuman was 10 riyals, but the riyal had originally been a thousand of something else). Back in the old days, before the revolution, no one knew why, or when this started, or what the origin of tuman could have been. Not sure whether this has changed since the revolution, but when I was there in 98 and 99 it seemed to be the same as what I remembered from earlier times.

  5. Victor Mair said,

    April 23, 2019 @ 6:11 pm

    Touman or Teoman (Mongolian: Tümen), or T'u-man, is the earliest named Xiongnu* chanyu** (匈奴單于), reigning from c. 220–209 BCE.

    The name Touman is likely related to a word meaning '10,000, a myriad', which was widely borrowed between language families in, most plausibly, the order indicated by the following representative list of its forms: Modern Persian (which includes the Tajik and Dari dialects of it) tōmān ~ tūmān, Mongolian tümen, Old Turkic tümän, East Tocharian tmāṃ, West Tocharian t(u)māne, which possibly even includes Old Chinese and later 萬, whose pronunciation can be reconstructed as for instance an early Middle Chinese *muanʰ. Note however that our only certain evidence this number-word already existed around and before Touman's lifetime would be the Chinese; not until many centuries after he lived are the other languages with this word in them first attested.



    *On the ethnolinguistic identity of the Xiongnu: the first syllable 匈 is γwn (xwn) in Old Sinitic, making it comparable to "Hun".

    The application of the name "Hun" to the Xiongnu is also attested in the Sogdian Ancient Letters, dating to around 312 (-314) A.D.

    **Chanyu (Chinese: 單于; Chinese: 单于; pinyin: Chányú; short form for Chengli Gutu Chanyu (Chinese: 撐犁孤塗單于; pinyin: Chēnglí Gūtu Chányú)) was the title used by the nomadic supreme rulers of Inner Asia for eight centuries and was superseded by the title "Khagan" in 402 CE. The title was used by the ruling Luandi clan of the Xiongnu during the Qin dynasty (221-206 BCE) and Han dynasty (206 BCE–220 CE).


  6. Chris Button said,

    April 23, 2019 @ 9:20 pm

    From B&S p.92:

    "Our interpretation is that denasalizations *m > /b/, *n > /l/-, and *ŋ > /g/ affected nasals that were word-initial, with no presyllable, but that this development was blocked in onsets like *C.m⁽ˤ⁾-, where a preinitial consonant was present"

    Unless I'm missing something, given that 萬 apparently has *b- as the onset in Minnan, doesn't that contradict their reconstruction of *[n]-s ?

    More broadly, regarding my doubts concerning their reconstruction of a t- prefix in OC in general, I have commented on a couple of their proposals on LLog previously:

    齒 *kɬə̀ɣʔ "tooth" (B&S *t-[kʰ]ə(ŋ)ʔ or *t-ŋ̊əʔ) is etymologically related to 杵 *kɬàɣʔ "pestle" (B&S t.qʰaʔ) via the ə/a ablaut — compare "molar" and "mortar" in English from the same PIE root.

    肘 *trə̀wʔ (earlier *kʷrə̀ɣʔ) "elbow" (B&S t-[k]uʔ) has 九 *kə̀wʔ (earlier *kʷə̀ɣʔ) "nine" (B&S *[k]uʔ) as the original phonetic; a shift of *kr- to *tr- (the labialization regularly shifted to the coda) is not standard in OC but is phonologically entirely reasonable, and it is not the only case either (compare the association of 猪 *tràɣ "pig" with 豭 *kráɣ "boar", for example).

  7. Bina Tiferet said,

    April 24, 2019 @ 4:54 am

    No connection with Russian тьма?

    Overlapping meaning, with myriad/mist/fog?
    – tuman: "Etymologically identical to тума́н 'ten thousand'."
    – t'ma: "which has been explained partly from Avestan dunman- 'mist', partly from Tocharian tumane, tumāṃ 'ten thousand'"


  8. David Marjanović said,

    April 24, 2019 @ 9:27 am

    All went well until it came to Tharsen’s Old Chinese transcription into the B & S system. The graduate students and faculty members were naturally interested in how these OC transcriptions sounded, and Tharsen had no answer.

    B & S were simply honest enough to spell all their uncertainty out. That's what the capital letters, parentheses and brackets mean.

    Their predecessors, it seems, often went with one option they considered most likely (for reasons they didn't always spell out) and left the others out of their notations altogether. This is by no means limited to Sinitic within historical linguistics, of course. I've seen IEists do this explicitly.

  9. cameron said,

    April 24, 2019 @ 9:44 am

    Adding to the point made by Brian Spooner above: the modern Persian "toman" is ostensibly 10,000 dinars. But the dinar is not a unit that has been used in living memory. The word is still known, of course.

    I remember hearing reports a few years ago that they were discussing a currency reform that would have made the colloquial toman an official unit. But I don't know if anything ever came of that.

  10. Chris Button said,

    April 24, 2019 @ 10:11 pm

    The graduate students and faculty members were naturally interested in how these OC transcriptions sounded, and Tharsen had no answer.

    Maybe the pharyngealized uvulars were a struggle? It's ironic that an underlying phonological ə/a vowel system like that in a language like Ubykh can be dismissed as being typologically unlikely in OC and yet one has to turn to a language like Ubykh for pharyngealized uvulars.

    B & S were simply honest enough to spell all their uncertainty out. That's what the capital letters, parentheses and brackets mean.

    B&S are caveating their own system. I would suggest that it is not so much a question of honesty as one of making things fit the confines of a rigid system.

  11. Chris Button said,

    April 25, 2019 @ 9:51 pm

    Given that no explanation for the t- seems to have been found elsewhere, I thought I might offer up some idle speculation on my part…

    I wonder if the t- might be a vestige of Proto-Sino-Tibetan *tjək ~ *tjak "one". The *tjək form is reflected in Old Burmese *tɐc whose modern pronunciation of /tɪʔ/ is notably reduced to /tə/ in compounds like one hundred, one thousand, etc. This would give us a hypothetical Old Chinese *tə̀c.máns reduced to *tə.máns informally and ending up as *tə.mánʰ after the -s debuccalized. (Incidentally the PST ablaut variant *tjak is attested relatively late in Old Chinese as 隻 *tàc, although the character used is a loan from its original sense of 獲 "catch").

