Horse and wheel in the early history of Indo-European

In response to Don Ringe's recent post on "The Linguistic Diversity of Aboriginal Europe", David Marjanović asked

… is there a way to estimate how much time was available between the initial breakup of PIE and the establishment of sound changes that would make a Wanderwort traceable? I'd expect words like "horse" and "wheel" to potentially spread very quickly; indeed, there have been attempts to connect the East Asian Wanderwort for "horse" to the IE word (via Tocharian of course), similar attempts for Sino-Tibetan words for "cart/wheel", and others have found forms similar to the PIE */kʷekʷlo/- in both Northwest and Northeast Caucasian languages.

I forwarded this question to Don, who quickly answered:

Here are two documents toward a reply to the question you forwarded.  The first is a short exploration of the principles involved and a sketch of what the methodology has to look like.  It promises further postings that go into detail about IE words of interest.  The longer post is installment one of that, digging into 'wheel' and 'horse'.  I don't know whether it's suitable for the blog; it's long and technical, and unfortunately it can't be cogent *without* being long and technical.

If you're interested in the methods of historical-comparative reconstruction and their application to the relative and absolute chronology of the Indo-European languages, I believe that Don's answer will be well worth reading.  Much of the information in it is the fruit of recent research (as you can see from the references), and most of the rest is not available in one place, organized so as to address the sort of question that David asked. If these things don't interest you, you're welcome to pass on to some of our other fine posts — and of course, our famous double-your-money-back guarantee continues to apply.

I've done my best to turn Don's responses into html. I wouldn't be surprised to find that I screwed up some characters or some formatting, especially in the more technical explanation, so I've linked a .pdf form of his second document here as a back-up.


Inheritance vs. borrowing of reconstructed vocabulary.

David Marjanović has asked an interesting and highly pertinent question:  “is there a way to estimate how much time was available between the initial breakup of PIE and the establishment of sound changes that would make a Wanderwort traceable?”

The short answer is that there is no general rule; each case has to be considered separately.  The main reason is that regular sound change, like every other kind of linguistic change, does not proceed at a uniform pace.  In the very long run the fluctuations probably cancel each other out, but that won’t help us if we want to figure out what happened within, say, one specific thousand-year window.

Moreover, each regular sound change affects a single class of sounds — or, in some cases, a single sound — in the language in which it occurs, and each line of linguistic descent is characterized by a unique sequence of regular sound changes.  So it makes all the difference in the world which specific sounds occur in the word we’re interested in and whether they underwent any characteristic changes at a relevant time in the languages we’re interested in.
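To make the point concrete, here is a minimal sketch in Python.  It treats a line of descent as an ordered list of regular sound changes and asks whether a word that entered the language partway through that sequence would end up looking any different from an inherited one.  The three rules and the word shapes are invented for illustration; they are not real Indo-European changes, and the names `LINEAGE`, `develop`, and `borrowing_detectable` are hypothetical.

```python
# Toy model: a line of descent is an ordered list of regular sound changes.
# The rules and the word shapes are invented for illustration only.
LINEAGE = [
    ("p > f", lambda w: w.replace("p", "f")),
    ("e > a", lambda w: w.replace("e", "a")),
    ("final s lost", lambda w: w[:-1] if w.endswith("s") else w),
]


def develop(word, entered_before=0):
    """Apply every change from index `entered_before` onward, in order.

    entered_before=0 models an inherited word; a larger value models a
    word borrowed after the earlier changes had run their course.
    """
    for _name, change in LINEAGE[entered_before:]:
        word = change(word)
    return word


def borrowing_detectable(word, entered_before):
    """True if the late arrival ends up different from an inherited word."""
    return develop(word) != develop(word, entered_before)


print(develop("pekos"))                  # fako  (inherited)
print(develop("pekos", 1))               # pako  (borrowed after p > f)
print(borrowing_detectable("pekos", 1))  # True
print(borrowing_detectable("tekos", 1))  # False: the word has no p to betray it
```

The last line is the case that matters here: if a word happens to contain none of the sounds affected by the early changes, inheritance and borrowing give the same outcome, and the sound changes alone cannot tell them apart.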

In trying to work out whether a given word could conceivably have been borrowed between related languages (instead of inherited from their last common ancestor), we need to take linguistic details, the probable cladistic tree, and real-world considerations into account.  We want to know:

1) what the form of the word reconstructable for the protolanguage is;

2) whether there are any puzzling irregularities in its apparent reflexes in the daughter languages that can’t be explained by other changes of known types;

3) what sound changes occurred in each of the relevant daughter languages, and the relative chronology of those changes, to the extent that it can be recovered;

4) whether the separation of the daughters was abrupt or gradual (i.e., with continuing contact as they diverged), if that can be recovered;

5) whether the relevant languages can be situated in space and time, e.g. by archaeological evidence.

The last point is important for the simplest reason of all:  languages borrow words only from languages with which they’re in some kind of contact; if it’s impossible or highly improbable that two languages were in contact at an appropriate time, then shared words which appear to be cognates must really be cognates.

There is one other consideration which should be discussed.  Nonspecialists sometimes think of languages, including reconstructed languages, as sets of words; but that’s somewhat less than half true.  Yes, every language does have a distinctive lexicon, but the structure of the language is even more distinctive; you can replace a large proportion of the lexicon with words borrowed from other languages without any significant effect on the language’s structure.  (Modern English is an obvious example.)  Historical linguists reconstruct a protolanguage’s system of sounds and system of inflectional morphology as well as its lexicon.  In some cases the sound system and inflectional system turn out to be complex and intricate, and PIE happens to be one of those cases.  Moreover, because we reconstruct protolanguages by exploiting the regularity of sound change, competent reconstructions are mathematically precise.  Under those circumstances, when we reconstruct a word which fits perfectly into the sound system and inflectional system, with no hint that there is anything out of line, the default hypothesis has to be that it’s an inherited word, simply because the odds that a word borrowed from some other language would fit in well are significantly lower.  Of course we’d still like to know whether it could conceivably be a loanword, just to have all the bases covered; that’s why Mr. Marjanović’s question is apt.  But unless there’s positive evidence that it is a loanword, linguists will regard that possibility as something of a long shot.

Because I’ve done a lot of work with the relative chronology of sound changes in various Indo-European languages, I can work through a couple of relevant examples in light of Mr. Marjanović’s question; the full story is below.  But be warned that the explanation is technical and detailed, because that’s the only way to be cogent.

Inheritance vs. lexical borrowing:  some Indo-European cases.

The difference between regular sound changes and other types of changes (most of which are motivated by morphology) is so important that historical linguists symbolize them differently in summaries of changes.  In what follows, “>” indicates one or more regular sound changes, while “→” indicates changes of other kinds, sometimes lumped together as “analogical” changes.  The most common of the latter is levelling. If one sound occurs in some forms of an inflectional paradigm and another occurs in the same position in other forms of the same pattern, one or the other can be generalized, or “levelled”, through the entire paradigm.  (This is not regular sound change because it’s conditioned by morphology.)  In the same way, if a word is accented on the root in some parts of its paradigm but on the suffix in others, one accent or the other can be generalized; if a noun belongs to one gender in the singular but to another in the plural, one or the other can be generalized to both numbers; and so on.  Most of the analogical changes posited below are levellings or other minor adjustments of well-known types.  The major exception is the Greek word for ‘horse’, on which see further below.

1.  The non-Anatolian word for ‘wheel’.

Reconstructable form:  PIE *kwékwlo-s (masc.), collective *kwekwlé-h2 (→ neut. pl.).

Analysis:  derived from *kwel- ‘turn’; pattern of derivation (reduplication + zero-grade root + thematic vowel) is unique (archaic?), so this word is overwhelmingly unlikely to have been formed more than once.

Development of attested forms in the daughter languages:

*kwékwlos > *kwékwlë  (Ringe 1996:74-5, 88, 90-1) > *kwyékwlë (ibid. pp. 102-4) > *kwyə́kwlë (ibid. pp. 124-8, 139) → Proto-Tocharian *kwə́kwlë ‘chariot, wagon’ (with adjustment of palatalization in a reduplicated form, ibid. pp. 143-4; or is this just straightforward assimilation?);

> *kŭkl ~ *kŭkla- > *kukäl ~ kukla- → Tocharian A kukäl ~ kukla- (Ringe 1998; see Kim 1999 for an important revision);

> *kwəkwə́lë (Ringe 1987) > Tocharian B kokale.

*kwékwlos ~ *kwekwléh2 > *kéklos ~ *keklā́ > Proto-Indo-Iranian *čáklas ~ *čaklā́;

>→ Vedic masc. cakrás (occasionally attested in the Rigveda), neut. pl. cakrā́(ṇi) → neut. cakrám, pl. cakrā́(ṇi);

> Proto-Iranian *čaxrah > Avestan čaxrō (no pl. attested).

*kwékwlos ~ *kwekwléh2 >→ Homeric Greek masc. κύκλος /kúklos/, neut. pl. κύκλα /kúkla/;

*kwékwlos ~ *kwekwléh2 > *hwéhwloz ~ *hwegwlā́ > Proto-Germanic masc. *hwehwlaz, neut. pl. *hweulō (Ringe 2008:72-3, 94-6, 102-3, 108, 146-8);

>→ Old Norse hvél and hjól (both neut.);

> Proto-West Germanic *hwehl (*hwehul?), *hweul- >→ Old English hwēol, hweowol, hweogol (all neut.), with substitution of the productive alternation *h ~ *g for anomalous *h ~ *w (Ringe 2008:108) and various levellings of alternations.

Discussion. Any of the sound changes peculiar to the first-order daughters would have made undetected borrowing impossible.  These include the Tocharian merger of short *i, *e, and *u as *ə; the Indo-Iranian palatalization of the initial velar and the subsequent merger of nonhigh short vowels as a; the Greek rounding of the first vowel to (*o and then) u, and the consequent unrounding of the labiovelar; and Grimm’s and Verner’s Laws in Germanic, which radically reshaped the system of obstruents.
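The role of relative chronology can be sketched for the Indo-Iranian development given above (*kwékwlos > *kéklos > *čáklas).  The Python below is an illustration under simplifying assumptions, not a full account of Indo-Iranian phonology: the labiovelar is written as the digraph kw, palatalization is assumed to be triggered only by an immediately following e, and only the short vowels e and o are merged.  The function names are made up for this example.

```python
import re
import unicodedata


def nfc(text):
    """Normalize so that each accented letter is a single code point."""
    return unicodedata.normalize("NFC", text)


def merge_labiovelars(word):
    # The labiovelar (written kw here) merges with the plain velar k.
    return word.replace("kw", "k")


def palatalize(word):
    # Assumption of this sketch: k becomes č only directly before e.
    return re.sub(nfc("k(?=[eé])"), nfc("č"), word)


def merge_short_vowels(word):
    # Nonhigh short vowels e and o merge as a; the accent mark is kept.
    return word.translate(str.maketrans(nfc("eéoó"), nfc("aáaá")))


RECONSTRUCTED_ORDER = [merge_labiovelars, palatalize, merge_short_vowels]
WRONG_ORDER = [merge_labiovelars, merge_short_vowels, palatalize]


def derive(word, changes):
    """Run a form through the given changes, in the given order."""
    word = nfc(word)
    for change in changes:
        word = change(word)
    return word


print(derive("kwékwlos", RECONSTRUCTED_ORDER))  # čáklas
print(derive("kwékwlos", WRONG_ORDER))          # káklas
```

If the vowel merger is applied first, no front vowel is left to trigger palatalization and the sketch produces *káklas rather than the form actually reconstructed.  That is the sense in which the order of changes is recoverable from the outcome.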

But the Tocharian vowel merger must have occurred far down in the independent prehistory of that subgroup, since it was preceded by more than a dozen other regular sound changes (see the chart at Ringe 1996:139).  The Indo-Iranian chronology of sound changes is not much more promising:  the palatalization of velars has to have been preceded not only by the merger of velars and labiovelars, but also by the resolution of *R̥H-sequences (which sometimes yielded palatalizing front vowels) and the affrication of inherited palatals (since they did not merge with palatalized velars).  Greek and Germanic seem at first more promising, since Grimm’s Law was a comparatively early Germanic sound change (see the chart at Ringe 2008:152) and the Greek vowel rounding could have occurred very early (note that the unrounding of labiovelars next to u-vowels has already occurred in the Linear B documents).  But a glance at any probable cladistic tree of the Indo-European family (e.g. the first tree on p. 397 of Nakhleh et al. 2005, or any of the alternative trees in that article) will show that the divergence of Germanic, Greek, and Indo-Iranian from one another (and from Armenian and Balto-Slavic) probably occurred fairly late in the initial diversification of the family, so being able to say that borrowing could not have occurred after “early” changes in any of those languages is less useful than it might be.  (It’s true that the divergence of Greek and/or Germanic from the rest of the family might be as early as 3000 BCE, if the estimated dates of internal nodes in these trees are in the right ballpark; but that would still be a good 500 years after the probable divergence of Tocharian from the rest of the non-Anatolian branches, and a whole millennium after the likely date of PIE.)  So it looks like the recoverable relative chronology of sound changes is not going to be very helpful in this case.

On the other hand, the pattern of shared and unique linguistic changes and the findings of archaeology turn out to be very helpful.  One of the striking things about Proto-Tocharian is that none of the linguistic changes that characterize it can be shown to be historically shared with any other subgroup of Indo-European; either they’re “natural”, easily repeatable changes which could have occurred independently any number of times (like the merger of palatals and velars, or the raising of word-final long *ō to *ū; see Ringe 1990:59-105 and 1996 passim) or they’re unique within the family (like the loss of *bh immediately following *m, or the complex pattern of Tocharian vowel mergers).  It appears that the separation of Tocharian from the rest of the family was sharp, and that it did not again come into contact with other IE languages (specifically, Iranian languages) for many centuries.  (The attempt to connect Tocharian B tek- ‘touch’ with Gothic tekan in Ringe 1990:105-15 is tantalizing but inconclusive; there is too much likelihood that the words resemble one another by sheer chance.  The fact that the similar Romance words (Italian toccare, French toucher, etc.) clearly do resemble the Germanic and Tocharian words by chance (see Meyer-Lübke 1911:664) adds weight to that point.)

Moreover, there seems to be only one archaeological culture that could reflect the pre-Tocharians, namely the Afanasievo culture.  This culture, associated with horses (see below!), appeared abruptly in the Altai around 3500 BCE and appears to represent a migration from the lower Volga area some 2000 miles to the west.  It’s hard to resist the conclusion that the Afanasievo migration represents the separation of pre-Tocharian from the rest of the family (Anthony 2007:264-5); and if that’s true, then the odd reduplicated word for ‘wheel’ must already have been in existence, and have been inherited by or borrowed into (or out of) pre-Tocharian, before 3500 BCE.  That’s later than any date that most of us would assign to PIE, but not much later, and for the purposes of reconstructing palaeocultures it’s not significantly different.  The fact that the Tocharian word refers to a wheeled vehicle rather than a wheel is not problematic; words shift their meanings all the time, and this particular shift is not surprising.

So the non-Anatolian word for ‘wheel’ was either inherited from the last common ancestor of the non-Anatolian branches, or else it was borrowed into or from pre-Tocharian before 3500 BCE.  In terms of time depth there’s not much difference between those alternatives.

2.  The Proto-Indo-European word for ‘horse’.

Reconstructable form:  PIE *éḱwos (masc.).

Analysis:  apparently unanalyzable.

Development of attested forms in the daughter languages:

*éḱwos > Proto-Anatolian *áḱḱwos (Melchert 1994:62-3, 74-5) > Proto-Luvian *áttswos (Melchert 1987, 1994:251-2);

> Cuneiform Luvian azzuwas, Hieroglyphic Luvian á-zú-wa-;

> *asbe > Lycian esbe (Melchert 1994:302, 310-1).

*éḱwos > *ékwë (Ringe 1996, as above) > Proto-Tocharian *yə́kwë (pace Kim 1999:158, 163, 167);

> *yəkw > *yŭk > Tocharian A yuk;

> Tocharian B yakwe.

*éḱwos > Proto-Indo-Iranian *áćwas;

> Vedic áśvas;

> Proto-Iranian *atswah > Avestan aspō, Old Persian asa.

*éḱwos >→ Greek *íkwkwos (cf. Mycenaean i-qo; but why *i-?? contamination with some other word?) > *íppos (cf. compound names like Ἄλκιππος /Álk-ippos/, with no aspiration) → ἵππος /híppos/ (again, where does the /h-/ come from?); problematic cognate.

*éḱwos > Proto-Italic *ékwos > Latin equos.

*éḱwos > Proto-Celtic *ekwos > Gaulish Epo- (in names), Old Irish ech.

*éḱwos > Proto-Germanic *ehwaz > Old Norse jór, Old English eoh; cf. also Old Saxon ehuskalk ‘mounted retainer’, Gothic aiƕatundi ‘thornbush’ (*‘horse-tooth’).

*éḱwos > Proto-Balto-Slavic *éšwas; derived fem. *ešwā́ > Lithuanian ašvà ‘mare’.

*éḱwos > *eš > Armenian êš ‘donkey’.

Discussion. The consonant cluster *ḱw is rare, and we might have hoped that it would develop in some unusual way in many first-order daughters.  But once again the daughters in which it underwent changes that should make loanwords detectable diverged from the rest of the family fairly late (to judge from the trees in Nakhleh et al. 2005).  In Tocharian and Italo-Celtic it merely underwent the merger of palatals and velars (followed, in Celtic, by a merger with the voiceless labiovelar).  The initial vowel, too, survived without change for a long time in most daughters.

But the Anatolian reflex is very distinctive, because of a bizarre sound change that replaced word-initial accented *é plus a single consonant (followed by a vowel or semivowel) with accented *á plus a geminate consonant (“limited Čop’s Law”; see Melchert 1994, as above).  That sound change can be shown to have occurred after another Anatolian sound change (loss of word-initial *h1) but before a third (loss of word-initial *y when followed by an e-vowel), so it was neither among the first nor among the last pre-PA changes (Melchert 1994:90).  We should be able to argue that after the limited Čop’s Law change an undetectable borrowing of ‘horse’ into or out of Anatolian would have been impossible.
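The limited Čop’s Law change as just described can be written as a single rewrite rule.  The sketch below is only a rough illustration in Python: the consonant and vowel inventories are simplified assumptions, the form "éstos" is a made-up shape used to show the rule failing to apply, and the function name is hypothetical.  Melchert 1994 should be consulted for the real conditions.

```python
import re
import unicodedata


def nfc(text):
    """Normalize so that each accented letter is a single code point."""
    return unicodedata.normalize("NFC", text)


# Simplified inventories, assumed for this sketch only.
CONSONANTS = nfc("ptkḱbdgmnrlsh")
VOWELS_AND_SEMIVOWELS = nfc("aeiouáéíóúwy")

# Word-initial accented é + exactly one consonant + vowel or semivowel.
LIMITED_COP = re.compile(
    "^" + nfc("é") + "([" + CONSONANTS + "])(?=[" + VOWELS_AND_SEMIVOWELS + "])"
)


def limited_cops_law(word):
    """Replace initial é + C with á + CC when the environment is met."""
    return LIMITED_COP.sub(lambda m: nfc("á") + m.group(1) * 2, nfc(word))


print(limited_cops_law("éḱwos"))  # áḱḱwos
print(limited_cops_law("éstos"))  # éstos (two consonants follow, so no change)
```

On this reading a form borrowed after the change had stopped operating would keep its *é and its single consonant, which is why the change looks at first like a usable diagnostic.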

Unfortunately there is a further complication that undermines any such argument.  If we could adduce a Hittite cognate “akkuwas”, the argument would be ironclad.  But though horses are often referred to in Hittite documents, the scribes never spell the word out (!); instead they use a logogram (word-sign), one of many adopted as part of the cuneiform writing system, which is usually transliterated with the Sumerian phrase ANŠE.KUR.RA ‘donkey of the mountains’.  All the actually attested Anatolian words for ‘horse’ are from languages of the Luvian subgroup; and in that subgroup the initial vowels of all the relevant words are etymologically ambiguous! Cuneiform and Hieroglyphic Luvian a- could in principle reflect Proto-Luvian and Proto-Anatolian *a-, *e-, or *o- (Melchert 1994:262-4).  In Lycian the situation is even stranger.  First Proto-Anatolian *e and *o merged as e, while *a remained distinct as a. But then an umlaut rule operated, changing the frontness of vowels to agree with the frontness of the vowel in the next syllable, and the rule iterated from right to left through the word (Melchert 1994:310-1, 328).  As a result, Lycian esbe could reflect earlier *asbe (with PAnat. *a-) or earlier *esbe (with PAnat. *e- or *o-).  So it turns out that the Proto-Luvian form could actually have been either *áttswos or *éttswos; and since we have no other direct evidence for the Proto-Anatolian form, that could have been either *áḱḱwos or *éḱḱwos: the former if it was inherited from PIE according to the hypothesis sketched above, the latter if it was borrowed from some other IE language after the limited Čop’s Law change had run its course!  But what about the geminate stop, which is clearly preserved in Cuneiform Luvian?  It turns out that there was yet another Anatolian sound change which geminated voiceless stops, and we don’t know when it occurred; it need not have been early (Melchert 1994:62).  So borrowing of *éḱwos from another IE language, followed by gemination of the stop in the Anatolian languages, is not impossible.  Once again the linguistic evidence has left us in the lurch.
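The Lycian ambiguity described above can be sketched in a few lines of Python.  This is a deliberately reduced model, offered as an illustration rather than as Melchert’s formulation: it knows only the vowels a, e, and o, it treats “agreeing in frontness” as turning a into e before e and e into a before a, and the pre-Lycian shapes with varying initial vowels are hypothetical inputs.

```python
VOWELS = "ae"  # the only vowels left after *e and *o have merged


def merge_e_o(word):
    # First step in the text: *e and *o merge as e, while *a stays distinct.
    return word.replace("o", "e")


def umlaut(word):
    """Make each vowel agree in frontness with the vowel to its right.

    The scan runs from right to left, so the rule iterates through the word.
    With only a and e in play, agreeing in frontness amounts to copying.
    """
    chars = list(word)
    trigger = None  # vowel of the following syllable, once one is seen
    for i in range(len(chars) - 1, -1, -1):
        if chars[i] in VOWELS:
            if trigger is not None:
                chars[i] = trigger
            trigger = chars[i]
    return "".join(chars)


def lycian_outcome(word):
    return umlaut(merge_e_o(word))


for initial in "aeo":
    print("*" + initial + "sbe", ">", lycian_outcome(initial + "sbe"))
```

All three inputs come out as esbe in this model, so the attested Lycian form carries no information about which initial vowel it started with.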

And once again cladistics and archaeology come to our rescue.  The presence of this word for ‘horse’ in Tocharian guarantees its existence in the non-Anatolian half of the family by 3500 BCE for the reasons advanced above in the discussion of ‘wheel’.  The abrupt separation of Tocharian and the fact that that event can (probably) be traced archaeologically are crucial.  Unfortunately the archaeological situation for Anatolian is very different.  Anthony’s suggestion that an expansion of the steppe culture into the Danube delta around 4200 BCE reflects the incipient separation of Anatolian from the rest of PIE (Anthony 2007:249-57) is reasonable, but any connection with Anatolia seems to rest on speculation (ibid. p. 262).  Working backwards from the historical record, we know that speakers of Anatolian languages were in central Anatolia by the 19th century BCE; when and by what route they arrived there remain very unclear, though a cultural disruption in the 27th century BCE is apparently a likely candidate for an Anatolian incursion into the area (Mallory 1989:24-9).  That still seems to leave many centuries for potential contact between Anatolian and other IE languages.

But the distribution of linguistic innovations tells a different story.  Like Tocharian, Anatolian shares no distinctive innovations with any other subfamily of IE (cf. Melchert 1994:60-91, Ringe 2000); so far as we can tell, its separation from the rest of the family was reasonably “clean”.  Moreover, the cladistic tree tells us that that separation must have been earlier than that of Tocharian.  This reduces the viable options to two:  either the well-known word for ‘horse’ was inherited by Proto-Anatolian, according to the scenario sketched above, or else it was borrowed into or out of pre-PA during the relatively short time when pre-PA was still in contact with related languages; that time must have been some centuries before 3500 BCE, and few detectable innovations can yet have occurred in pre-PA.

This raises a methodological point that we can no longer avoid.  Is there any difference between a word which is reconstructable for a protolanguage and a word which spread from dialect to dialect of the protolanguage as it was breaking up?  As usual, it depends on the individual case.  If the real-world separation of the daughters was genuinely abrupt (that is, one group picked up and moved within a generation or so, and subsequent contacts were infrequent and brief), then there is a clear difference between the two scenarios.  But most disintegrations of speech communities don’t happen like that; dialects remain in contact as they diverge, continuing to trade linguistic material until some event finally makes them lose touch altogether.  (The best discussion of these processes is Ross 1997.)  In such cases the “protolanguage” which we reconstruct is most unlikely to correspond to a single, completely uniform dialect that existed in the real world before its speaking population became large enough to exhibit significant linguistic diversity; it almost inevitably corresponds to a dialectally diversified speech community, still unified but no longer uniform, simply because we can’t tell the difference between words and grammatical forms which had been in the language for generations and those which had arrived very recently.  It is also likely that our reconstruction will be temporally “out of focus”, including some inherited words and forms which were no longer characteristic of all the dialects and some new words and forms which were still spreading from dialect to dialect.  There are good reasons to suspect that our reconstruction of PIE is like that.

But once again this doesn’t make much difference for our reconstructions of palaeocultures.  Whether the reconstructable PIE word for ‘horse’ was already in the common ancestor of all the IE languages in, say, 4200 BCE or spread through a rapidly diversifying IE dialect continuum around 3700 BCE can’t be expected to have any impact on subsequent prehistoric and historical developments.  In this case, at least, a degree of detail too fine for linguists to recover is also too fine to have any consequences for history.

A final note about ‘horse’:  the shape of the Greek word can’t be explained by regular sound changes and plausible analogical changes.  The /h-/ of the Classical form is a problem internal to the history of Greek, since it isn’t there in the fossilized compounds used as personal names (thus Ἄλκιππος /Álkippos/ ‘His-horses-are-his-defense’, not Ἄλχιππος /Álkhippos/).  But the /i/ is there from the beginning of our attestation, and it’s a total mystery.  It’s worth thinking about the possibility that the Greek word might be a loanword, if only we knew of a language in which *é- gave í- by regular sound change, or a non-Indo-European language that could have borrowed the word and altered it in that way.

In both the above examples we didn’t arrive at any firm conclusions by trying to exploit regular sound changes.  That raises an obvious question:  are there any non-obvious cases in which that approach does give good results?  And if it doesn’t look likely that ‘wheel’ or ‘horse’ is a Wanderwort, what would a Wanderwort look like?  I hope to address those questions in a further posting.


Anthony, David.  2007.  The horse, the wheel, and language. Princeton:  Princeton U. Press.

Kim, Ronald.  1999.  “The development of labiovelars in Tocharian:  a closer look.”  Tocharian and Indo-European Studies 8.139-87.

Melchert, H. Craig.  1987.  “PIE velars in Luvian.”  Watkins (ed.) 1987:182-204.

—.  1994.  Anatolian historical phonology. Amsterdam:  Rodopi.

Meyer-Lübke, Wilhelm.  1911.  Romanisches etymologisches Wörterbuch.  Heidelberg:  Winter.

Nakhleh, Luay, Don Ringe, and Tandy Warnow.  2005.  “Perfect phylogenetic networks:  a new methodology for reconstructing the evolutionary history of natural languages.”  Language 81.382-420.

Ringe, Don.  1987.  “On the prehistory of Tocharian B accent.”  Watkins (ed.) 1987:254-69.

—.  1990.  “Evidence for the position of Tocharian in the Indo-European family?”  Die Sprache 34.59-123.

—.  1996.  On the chronology of sound changes in Tocharian.  Vol. 1. New Haven:  American Oriental Society.

—.  1998.  “Schwa-rounding and the chronology of sound changes in Tocharian A.”  Jasanoff, Jay, H. Craig Melchert, and Lisi Oliver (edd.), Mír curad:  studies in honor of Calvert Watkins (Innsbruck:  IBS) 611-8.

—.  2000.  “Tocharian class II presents and subjunctives and the reconstruction of the Indo-European verb.”  Tocharian and Indo-European Studies 9.121-42.

—.  2008.  From Proto-Indo-European to Proto-Germanic.  A linguistic history of English, Vol. 1. Revised ed.  Oxford:  OUP.

Ross, Malcolm.  1997.  “Social networks and kinds of speech-community event.”  Blench, Roger, and Matthew Spriggs (edd.), Archaeology and language I: theoretical and methodological orientations (London:  Routledge) 209-61.

Watkins, Calvert (ed.).  1987.  Studies in memory of Warren Cowgill. Berlin:  de Gruyter.



  1. David Marjanović said,

    January 10, 2009 @ 8:20 am

    Prof. Ringe, thank you very much for this long, technical, detailed explanation! Unfortunately I don't have time to finish reading it right now, but expect me back tomorrow (if, that is, I'll have anything left to say).

  2. Jonathan Badger said,

    January 10, 2009 @ 9:08 am

    As a biologist specializing in phylogeny, I find it interesting that David even adopts the language of biology at points in his discussion — "cladistic tree", etc. Not that the connection between historical linguistics and phylogeny hasn't been noted — phylogeneticists such as Tandy Warnow have dabbled in historical linguistics, but still, I found the actual language of biology being adopted interesting — maybe in the future linguistics will speak of orthologs and paralogs rather than of cognates…

    [(myl) Actually, for the first century or so, the balance of conceptual and terminological trade was in the opposite direction. Two specific examples: Darwin explicitly borrowed the idea of a tree-structured "descent with modification" from historical linguistics; and the first use of algorithmic methods to infer phylogenetic trees was in linguistics (under the name of lexicostatistics), not in biology. ]

  3. Aaron Davies said,

    January 10, 2009 @ 9:35 am

    i get the impression from what i’ve read of current linguistic methods that references to cladistics imply methods similar to those that differentiate biological cladistic phylogeny from classical taxonomy—specifically, statistics.

    as to the borrowing of phylogenetic language in general, linguistic evolution is, well, evolution. (even if it is more lamarckian than darwinian.) similarly, if what we might call “historical memetics” ever becomes a mathematically-grounded field, i would expect to see the same language used there. (this diagram of schisms in jainism already looks an awful lot like, say, this cladogram of reptiles. it also illustrates the most interesting difference—the inherent ability of memes (and languages) to relate via DAGs, not just trees, something that's essentially impossible in biological evolution, at least of the “higher animals”.)

  4. Jonathan Badger said,

    January 10, 2009 @ 9:47 am

    Actually, I see that the discussion is Don Ringe's, and one reason he may be adopting biological terminology is that he is a co-author of Warnow's on the very paper I was referring to!

  5. Robert said,

    January 10, 2009 @ 10:09 am

    That was a very clear, and much-welcomed, explanation of the issues, which raises a few questions.

    What was the Anatolian word for wheel? Given its lack of mention above, I'd assume it isn't cognate with the Indo-European term. Is it thought to be a borrowing from some other language, or is its origin unknown? If it's a borrowing, that would presumably give a handle on when those languages moved into that area.

    Are there any other unexplained e to i transitions in Greek? If a dozen other words were affected, with no apparent pattern, I'd guess that would change the relative likelihood of the possible reasons.

    Were horses domesticated just once, or many times? While a word for horse can predate domestication, it would seem plausible that it was repeatedly borrowed along with other horse-related terminology as domestication spread, even into different language families. Conversely, if horses were domesticated independently by two cultures, they're unlikely to have borrowed the word from each other, even if there's a strong resemblance.

  6. language hat said,

    January 10, 2009 @ 10:39 am

    A superb discussion, and I look forward to many more such! (Can Don become an official Logger so his posts don't have to be "Filed by Mark Liberman"?)

  7. Jesus Sanchis said,

    January 10, 2009 @ 1:39 pm

    I've read the post and I've found it interesting, especially because it offers relevant information about the issue (the words for "wheel" and "horse" in PIE) without forcing the data into impossible solutions or interpretations. Ringe's approach is quite 'traditional' at times, for example when he talks about genealogical trees or the application of 'laws', but I liked the way he described PIE:

    "In such cases the “protolanguage” which we reconstruct is most unlikely to correspond to a single, completely uniform dialect that existed in the real world before its speaking population became large enough to exhibit significant linguistic diversity; it almost inevitably corresponds to a dialectally diversified speech community, still unified but no longer uniform, (…) It is also likely that our reconstruction will be temporally “out of focus”, including some inherited words and forms which were no longer characteristic of all the dialects and some new words and forms which were still spreading from dialect to dialect."

    The final remarks in the article are open-ended: Were the terms for 'wheel' and 'horse' common in PIE or were they later diffused from one dialect? Let's remember that these words are often used as important 'evidence' to prove the traditional chronology of PIE, or the supposed expansion of IE in connection with horses, as we saw in Ringe's previous article in the Language Log ("The Linguistic Diversity of Aboriginal Europe"). Is this not a bit contradictory?

    Finally, I would like to suggest a couple of texts that deal with the same issues from an alternative perspective.

    1) About horses and IE.

    For those who can read Spanish, here's an interesting article by Xaverio Ballester: "Centauros de la estepa". It is available on the Internet:

    This text was later published, as an expanded and revised version, in one of Ballester's books, which I strongly recommend: "Zoónimos Ancestrales", Valencia, Biblioteca Valenciana, 2006. I also posted a review of this book in my blog:

    2) About wheels and IE:

    An article by Mario Alinei: "The Celtic origin of Lat. "rota" and its implications for the prehistory of Europe" (2004), Studi Celtici. Available here:

  8. Mike Keesey said,

    January 10, 2009 @ 2:02 pm

    "it also illustrates the most interesting difference—the inherent ability of memes (and languages) to relate via DAGs, not just trees, something that's essentially impossible in biological evolution, at least of the “higher animals”.)"

    Most life forms are not "higher animals" (by which I assume you mean metazoans?). Evolution in bacteria, where lateral transfer is rampant, is much better represented using networks, for example. Even in, say, vertebrates, there are still instances of hybridization where simple trees do not suffice. And of course, if you go to the level of individual organisms, simple trees don't work. (Genealogies are called "family trees", but are really "family networks", since people have two parents, not one.) Finally, symbiogenesis can also be considered a case where simple trees don't work.

  9. James D said,

    January 10, 2009 @ 3:48 pm

    Disclaimer #1: I am not a linguist.
    Disclaimer #2: I don't want a silly row about Australopithecines having spoken PIE.

    What about the other, arguably more normal word for horse (Welsh: ceffyl, Irish: capall, Latin: caballus, French: cheval, Romanian: cal, etc), so detested by the Latin purists? And I'm slightly sceptical about the Gaulish "Epo-": to the best of my (albeit very limited) knowledge, it only occurs initially in three-element personal names (viz Eposognatos and Eporedorix), which leaves it open to easy reinterpretation as patronymic (cf Welsh "ap", "epilion") + two-element names. The other instances of equus-type words in Celtic seem to be in a high or specialist register. Could a case therefore be made for the ceffyl-type word being the genuine Celtic one, with the equus-type word being a loan-word (presumably from an Italic language)?

    [(myl) Don responds in an update to this later post.]

  10. Patrick Wynne said,

    January 10, 2009 @ 3:51 pm

    Yes, I second LH's request! As great as all the content is on Language Log, these kinds of posts are what really fascinate and excite me. I would love for Prof. Ringe to become a regular contributor.

  11. dr pepper said,

    January 10, 2009 @ 6:19 pm


    Also, can we designate Sanchis as Advocatus Diaboli?

  12. Etienne said,

    January 10, 2009 @ 7:46 pm

    I'd like to third LH's request: these posts on comparative and historical linguistics are gems of precision and clarity, in terms of methodology as well as of data. Speaking of which, a nitpick: the form "equos" is Old Latin, not ordinary (i.e. Classical) Latin (whose form was "equus").

    To James D.: actually, the Welsh and Irish forms are assumed to be Latin loanwords, though the etymology of "caballus" itself being obscure in Latin, a Celtic origin has been postulated by some scholars. However, to assume the Celtic equus-type nouns to be borrowings from Italic is uneconomical: Old Irish ECH, plural EICH (reflexes of which are alive and kicking in some modern Gaelic varieties) are exactly what regular, *inherited* reflexes in Celtic of (late Western) Indo-European *EKWOS, plural *EKWOI are expected to look like. Thus even if your doubts as to the correctness of the etymology of Gaulish "epo-" turn out to be well-founded, it is plain that Proto-Celtic did indeed inherit the Indo-European word for "horse".

    Also: while EQUUS did not survive in any Romance language, its feminine form EQUA did, and is alive and kicking in several Romance languages today (Spanish YEGUA, Romanian IAPA: the forms, incidentally, are phonologically regular in terms of *inherited* [as opposed to borrowed] vocabulary in the relevant Romance languages). This situation in Romance (whereby a reflex of the Indo-European word for "horse" only survives in the word "mare") does offer an interesting typological parallel to the situation in Lithuanian presented by professor Ringe above, and shows some of the perils and limits of comparative reconstruction: in an alternate universe where Latin is undocumented and where Romance is directly compared to other Indo-European languages, the similarity with Lithuanian might drive scholars to conclude that the original Indo-European word indeed meant "mare" instead of "horse".

    [(myl) Don comments further on some of these points in an update to this later post.]

  13. Aaron Davies said,

    January 11, 2009 @ 8:40 am

    @Mike Keesey: yes, more or less. (i wasn't aware of the term "metazoan", and put "higher animals" in scare quotes since i'm well aware the phrase belongs firmly to the nineteenth century. and of course all manner of weird things go on in, e.g., felidae.)

  14. Aaron Davies said,

    January 11, 2009 @ 8:42 am

    @Etienne: doesn't french have "équine"? istr reading the sequence "cheval, équine, hippodrome" quoted somewhere as an example of registers and source languages.

  15. Philip Spaelti said,

    January 11, 2009 @ 10:10 am

    @Aaron: there are also équestre, équitation

  16. Philip Spaelti said,

    January 11, 2009 @ 11:04 am

    Jesus Sanchis wrote:

    2) About wheels and IE:

    An article by Mario Alinei: "The Celtic origin of Lat. "rota" and its implications for the prehistory of Europe" (2004), Studi Celtici. Available here:

    Thank you, Jesus Sanchis, for this link. It is quite interesting to compare the styles of these two articles. After reading Ringe, who carefully worries about each vowel and consonant and agonizes about the i/e problem in Greek, it is quite a contrast to then read Alinei, who throws together a list of (probable) cognates from all different ages and then summarily declares "it's a loan from Celtic". Hey, I'm not saying he's wrong, but his case might be strengthened if he gave the actual form of the proposed Urwort and then showed how it came about in each attested borrowing language, including accounting for all the changes in gender, etc.

    Of course, accepting this loanword theory and jumping from there to the PCT is a whole 'nother kettle of fish.

  18. David Marjanović said,

    January 11, 2009 @ 12:37 pm

    So, I sum up: it is possible that "horse" and "wheel" are Wanderwörter, but it's not the most parsimonious option, and it wouldn't matter much anyway. OK, thanks. :-)

    the Greek vowel rounding could have occurred very early (note that the unrounding of labiovelars next to u-vowels has already occurred in the Linear B documents).

    That said, the glyph for /kʷe/ in all three Linears, the "Cretan hieroglyphs", and the Phaistos Disk script (or rather font) is a circle with varying numbers of smaller circles in it that are themselves arranged in a circle. It could represent a round shield or a wheel, and unsurprisingly the latter possibility has been suggested in print. If it is correct, the change may have occurred just a few hundred years before Linear B developed.

    I find it interesting that David even adopts the language of biology at points in his discussion

    As mentioned above, the discussion isn't mine, but I'd still have used the same terminology, because I am a biologist… I'm currently working on my PhD thesis on the origin of turtles and the origin of lissamphibians ( = frogs, salamanders, caecilians, and maybe the extinct albanerpetontids); on this latter topic, three papers are out, and two are to follow soon.

    the first use of algorithmic methods to infer phylogenetic trees was in linguistics (under the name of lexicostatistics), not in biology.

    Be very careful about the difference between phenetics and phylogenetics. Not everything that uses a data matrix and a computer to plot trees is the same.

    Phenetics counts similarities. The resulting matrix can be used to calculate a tree, and if there is little enough homoplasy ( = convergence, reversals, and lateral gene transfer/borrowing) in the dataset, this tree will be congruent with the phylogenetic tree, but phenetics cannot test the very assumption that there is little enough homoplasy in the dataset. Lexicostatistics is a phenetic method, as far as I know.

    Phylogenetics is about actually reconstructing the phylogenetic tree. The formal version* is a group of methods together called cladistics; they all count shared derived similarities (biology: "synapomorphies"; linguistics: "shared innovations"). Cladistics was invented by the entomologist Willi Hennig in 1950, but that was in East Berlin, and the book is said to have a barely readable style, so things only started to take off when an English translation came out in the USA in 1966, and then it had to fight against tradition and phenetics for decades… I can explain the reasons for this strange history if asked. Cladistic methods are simple parsimony (what Hennig came up with), maximum likelihood (parsimony + the assumption that certain characters will evolve with certain rates; requires lots of knowledge about these rates, but for molecular data this is often available in some form), and the Bayesian approach to likelihood.

    * As opposed to, in biology, the traditional non-method, which consisted more or less of doing cladistics with very few characters (often one) in one's head, building an evolutionary scenario on the result, and using this scenario to construct a tree that conformed to it; and in linguistics, the comparative method, which is much more similar to cladistics but does not habitually make the parsimony criterion so explicit as to actually count the steps a phylogenetic hypothesis ( = tree) requires.
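
    To make "actually count the steps" concrete, here is a minimal Python sketch of Fitch's small-parsimony count, one standard way of scoring a fixed tree under simple parsimony. It is an illustration only, not a method used by anyone in this thread: the taxa A to D, the three binary characters, and the function names `fitch_steps` and `tree_length` are all invented for the example, and characters are treated as unordered, so the sketch does not by itself decide which state is derived.

```python
# Fitch's small-parsimony count on a fixed binary tree (illustrative sketch).
# A tree is a nested 2-tuple; a leaf is a taxon name (string).

def fitch_steps(tree, states):
    """Return (candidate states at this node, minimum number of changes below it)."""
    if isinstance(tree, str):          # leaf: its observed state, no changes
        return {states[tree]}, 0
    left, right = tree
    left_set, left_steps = fitch_steps(left, states)
    right_set, right_steps = fitch_steps(right, states)
    shared = left_set & right_set
    if shared:                         # children can agree: no extra change
        return shared, left_steps + right_steps
    # children cannot agree: one change is needed on one of the two branches
    return left_set | right_set, left_steps + right_steps + 1

def tree_length(tree, characters):
    """Total number of steps a tree requires, summed over all characters."""
    return sum(fitch_steps(tree, ch)[1] for ch in characters)

# Hypothetical toy matrix: four taxa, three binary characters.
characters = [
    {"A": 1, "B": 1, "C": 0, "D": 0},
    {"A": 1, "B": 1, "C": 0, "D": 0},
    {"A": 0, "B": 1, "C": 1, "D": 0},   # conflicts with the first two
]

tree_ab = (("A", "B"), ("C", "D"))
tree_bc = (("A", "D"), ("B", "C"))
tree_ac = (("A", "C"), ("B", "D"))

for name, tree in [("AB|CD", tree_ab), ("AD|BC", tree_bc), ("AC|BD", tree_ac)]:
    print(name, tree_length(tree, characters))
```

    With this toy matrix the grouping AB|CD needs 4 steps, against 5 and 6 for the two alternatives, so parsimony prefers it; the third character then counts as homoplasy on the preferred tree. A phenetic method would instead work from pairwise similarity counts and never produce such a step count.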

    this diagram of schisms in jainism already looks an awful lot like, say, this cladogram of [vertebrates].

    This is deceptive. Both are dichotomously branching tree-shaped diagrams that consist of lines with meaningless line width, but that's where the similarities end. The phylogenetic tree of Jainism was not arrived at by cladistics, but reconstructed from historical documents. The phylogenetic tree of the vertebrates — wait. WTF. What does "Pisces" do in there?!? This term hasn't been used in decades because it referred to a group that is way too paraphyletic to be useful! — is not a cladogram either, because it is not the outcome of a cladistic analysis, although it is most likely a summary of cladograms found in the literature. In the tree itself, "Reptilia" should be replaced by "Sauropsida", BTW.

    Are there any other unexplained e to i transitions in Greek?

    I don't know, but I can easily pull a suggestion of how it might have worked out of my, uh, fingertips: in the AFAIK rare cluster *ḱw, which is not terribly easy to pronounce, the *ḱ might have turned by 180°, so to say: */ˈekʲwos/ > */ˈejkwos/ > */ˈikʷos/, which (with or without the gemination to */ˈikʷːos/) ought to be the attested Linear B form and leads straightforwardly to the Classical personal names in -/ipːos/. Now, are there any other cases where PIE *ey became Greek i?

    Were horses domesticated just once, or many times.

    Apparently just once, with an interesting connection to the earlier domestication of reindeer (but I digress)…

    And I'm slightly sceptical about the Gaulish "Epo-": to the best of my (albeit very limited) knowledge, it only occurs initially in three-element personal names

    And in Epona, the goddess of horses, whose name is occasionally written in Latin letters on her altars.



    doesn't french have "équine"?

    Sure, but that's a learned loan from Classical (or Renaissance) Latin, if not outright from Scientific (Equinae = grass-eating horses, with the suffix -inae for "subfamilies"). Had équestre and équitation been inherited rather than borrowed the same way, they'd be *équêtre and *équitaison or something — regular sound changes to the rescue once again.

  19. David Marjanović said,

    January 11, 2009 @ 12:40 pm

    Oops, I forgot to close the <a> tag, respectively to scroll down to the preview…

  20. Etienne said,

    January 11, 2009 @ 2:48 pm

    David, Aaron-

    Actually, the most telling sign that French words such as "équine, équestre" (the latter is first attested in 1355, the former before 1502, incidentally) are borrowed rather than inherited from Latin is the intervocalic /k/: in Spanish and Portuguese, Latin intervocalic stops were voiced (cf. YEGUA in Spanish, from EQUA), whereas in French these stops were first voiced, then turned to fricatives and entirely deleted if they were non-labial: indeed Latin EQUA became IVE in Old French (with only the labial part of the labiovelar consonant surviving), a word which survived dialectally until recently.

    And looking at my original point on Romance-Lithuanian direct comparison being likely to mislead scholars into believing that the original word meant "mare": actually, let us imagine an alternate universe in which ALL the Indo-European languages of Europe are known solely through their modern forms: Lithuanian, Romance and Gaelic are as far as I know the only Indo-European languages in Europe today which have preserved a reflex of *EKWOS. Since two of them (Lithuanian and Romance) preserve solely a feminine form (*EKWA) and only Gaelic has a masculine form, an Indo-Europeanist operating solely with Modern data would find it more economical to assume a meaning "mare" which shifted once to "horse" (in Gaelic) rather than a meaning "horse" which twice shifted (in two geographically non-contiguous and otherwise not closely related branches of Indo-European, NOTA BENE!) to "mare". Only our knowledge of older Indo-European languages allows us to know that the Lithuanian and Romance change from "horse" to "mare" is indeed coincidental.
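
    The economy argument above amounts to counting meaning shifts under each hypothesis. A minimal Python sketch, assuming (as the thought experiment does) that only the modern meanings are known and that the three branches are treated as independent descendants; the function name `shifts_required` and the dictionary layout are made up for illustration:

```python
# Count how many independent meaning shifts each hypothesis about the
# ancestral meaning would require, given only the modern reflexes.

def shifts_required(ancestral_meaning, attested):
    """Number of branches whose attested meaning differs from the hypothesis."""
    return sum(1 for meaning in attested.values() if meaning != ancestral_meaning)

# Modern meanings of the surviving reflexes, as described in the comment above.
modern = {"Lithuanian": "mare", "Romance": "mare", "Gaelic": "horse"}

costs = {hypothesis: shifts_required(hypothesis, modern)
         for hypothesis in ("mare", "horse")}
best = min(costs, key=costs.get)

print(costs)  # {'mare': 1, 'horse': 2}
print(best)   # mare
```

    The count favours "mare" with one shift over "horse" with two, which is exactly the misleading result described: it is the older attested languages, not the arithmetic, that show "horse" to be original.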

  21. Merri said,

    January 13, 2009 @ 11:29 am

    This wonderful demonstration of the common origin of words for 'horse' doesn't prove that early Indo-Europeans used horses.
    Only that they had a word for them.
    Nobody can prove this word wasn't borrowed.
    English and French share the word 'iglu'.

    And that's the limit of the reasoning.

  22. Bryn LaFollette said,

    January 13, 2009 @ 6:06 pm


    You're correct that the existence of a reconstruction for 'horse' doesn't, in and of itself, "prove" the use of horses by the proto-language speech community. However, the same can be said of Modern English and computers: the presence of the word doesn't prove that anyone in the English-speaking world actually uses computers. Yet even the most cursory interdisciplinary examination of the facts would overwhelmingly support the presence of actual computers in contexts that lead to the very likely conclusion that speakers of English (in many contexts) made use of computers.

    There's plenty of archaeological evidence to support the presence of a horse-using culture in the same contexts as all the known IE cultures and pretty much all of the sites suspected of being IE sites.

    Words don't typically get adopted into a language unless they are needed for something that is getting used regularly. And then, if the referent of the term isn't finding regular use in the community, there's every reason to assume it will fall out of use or be replaced in daughter languages by borrowing from outside that daughter's dialect community. We don't see a whole lot of this in the IE daughter languages, many of which demonstrate that the modern reflex of the word for 'horse' has been around since PIE times. Before this point, who knows. Professor Ringe makes the very apt point that the (unanalyzable) reconstruction for 'horse' could very well have been borrowed into the PIE speech community from some other language, or it may have been borrowed into Proto-Anatolian and from there on to the rest of the contemporary post-PIE speech communities (presumably before the break with Tocharian). But, historically, as he clearly states, this is a moot point, as by that point you've already established the unity (or close unity) of the IE language family from a very early point, rather than a later borrowing of a single term throughout an already fragmented and differentiated IE family of languages.


    Professor Ringe never implies that the Anatolian word for 'horse' wasn't cognate. Specifically, he says that it was never written out phonetically, but rather only written using a Sumerian cuneiform logograph. This means we don't know how they pronounced the symbol used to represent the word 'horse' in text. However, given that the daughter languages do have reflexes of the proto-word, it's highly likely that the pronunciation was in fact cognate with the PIE term reconstructed.

  23. Marconatrix said,

    January 14, 2009 @ 2:35 am

    As a point of information. The reflex of *ekwos turns up in Welsh in at least two compounds. _Ebol_ 'colt' and the place name _Eppynt_ < eb+hynt ep- (in p-Celtic, attested in Gaulish) > eb- in Welsh with intervocalic lenition. There is no connection with Welsh -ap- which is a worn-down form of _mab_ 'son', Gaulish _mapos_ < Common Celtic _*makwos_ cf. Ogam Irish MAQQI (genitive case = 'son of'). The normal word for 'horse' in Scots Gaelic (descended from Old Irish) is _each_.

    I'm afraid some people still think Celtic = Mysterious and will try to use it to prove anything they like, and as these languages aren't as widely studied as the other major IE families they very often get away with it.

  24. Marconatrix said,

    January 14, 2009 @ 3:15 am

    Part of the above post seems to have vanished. Just that _eppynt_ means 'horse-track' from _eb-hynt_ from _*epo-sentos_.

