## The linguistic history of horses, gods, and wheeled vehicles

This started with Don Ringe's guest post "The Linguistic Diversity of Aboriginal Europe". He followed up with a more detailed account of "Horse and wheel in the early history of Indo-European", and an answer to some questions under the title "More on IE wheels and horses", and then this morning's post "Inheritance versus lexical borrowing: a case with decisive sound-change evidence".

Readers have added a large number of interesting and provocative comments and questions (110 on the original post alone). As usual, responses are often too long to fit comfortably in the comment format, and our traditional practice has been to respond in follow-up posts where interest and time permit.

Continuing that tradition, I've posted below Don's response to a comment by Etienne on Don's follow-up post on the history of the word for horse. Though the background is complex, this fragment of the conversation is quite coherent on its own.

Here's Etienne:

There is a core assumption in your posts: namely that the proto-Indo-European form in fact had the meaning "horse". But let us imagine that the original meaning was less definite, perhaps "large quadruped" or the like (the meaning "donkey" of the Armenian reflex is worthy of notice in this context). If we imagine a spread of Proto-Indo-European that took place before the domestication of the horse, it is more than plausible that the subsequent spread of domestic horses would lead to the inherited Indo-European word (*whatever its phonological form had in the meantime become in various Indo-European-speaking communities*) everywhere undergoing a process of semantic narrowing and becoming the word for "horse".

Here's a partial analogy: all Germanic languages today have a cognate of English "God" to refer to the Christian god. The original meaning of the proto-Germanic word was a non-Christian god, obviously: but if we had no knowledge of the chronology of the spread of Christianity compared to the chronology of the break-up of Proto-Germanic, we would have no way of knowing whether the proto-Germanic word (however accurately we are able to reconstruct it as far as phonology goes) referred to the Christian god or not. In like fashion, I accept the reconstruction of the phonological form of the Proto-Indo-European form of the word which in attested Indo-European languages meant "horse", but am less certain as to its original meaning in the proto-language.

Don's response:

Etienne’s point is well taken.  As he undoubtedly knows, the irregularity of semantic change doesn’t allow us to reconstruct meanings with the same precision with which we can reconstruct forms.  I don’t know of any positive evidence for the scenario he sketches, and a (near-)unanimous shift in so many subfamilies might be a long shot, but the point is that it can’t be excluded completely.

(The shift in the meaning of god seems less extreme to me, partly because there was an institution, the Christian church, exerting pressure in that direction for centuries, and partly because there was already a tradition of coopting the Greek and Latin polytheistic words for the monotheistic religion.  The latter would have been obvious for a long time, because in parts of the western half of the (former) Empire Greco-Roman polytheism hung on till at least the late 6th c. CE; see Richard Fletcher’s book The Barbarian Conversion for some eye-opening discussion.)

Also, regardless of what kind(s) of equids the PIE ‘horse’-word meant, it (or its ancestor) must surely have been applied to wild equids first; so the existence of such a word in the protolanguage really doesn’t provide evidence for the domestication of horses.

So far as I can see, the hard linguistic evidence for a relatively “shallow” date for PIE is the wheeled vehicle terminology.  It’s true that Anatolian provides no good evidence for ‘wheel’ itself, but Hittite does have two inherited words for parts of the traction apparatus.  The most striking is ḫissas ‘thill’ (the pole that attaches the wagon to the harness), cognate with Sanskrit īṣā́ ‘thill’.  The PIE word is reconstructable as *h2iHsó- or *h3iHsó- (where “H” is any of the three “laryngeal” consonants); I’d hazard a guess that it was feminine, like the Skt. word (thus nom. sg. *h2/3iHsé-h2, etc.), since Hittite lost the feminine gender and can be expected to have remodelled the endings.  The meaning is very specific and the word is not derivable from any verb root, so this is about as good as palaeolinguistic evidence gets.

But the other bit of evidence is reasonably good as well.  Hittite iukan ‘yoke’ is clearly cognate with Skt. yugám, Gk. ζυγόν /sdugón/, Lat. iugum, Old English ġeoc, etc.  The noun is derived from *yewg- ‘join’, but it never means just any means of joining, always specifically ‘yoke’.  Moreover, its formation is unusual (though not as odd as that of ‘wheel’):  though neuter o-stem nouns with zero-grade roots do appear in various daughter languages, very few are shared by a large number of subgroups; so this is another derived noun that was probably derived only once.  Now, if you really want to you can argue that these items could have been used with sleds (though in that case the PIE “homeland” should have had snowy winters; the steppes would do, but central Anatolia probably wouldn’t).  But I think the natural interpretation is that these are wheeled-vehicle terms.

I think most of the evidence that can be brought to bear on the problem of the IE dispersal is actually archaeological.  We need to find the best possible fit between the archaeological data, which are sometimes extensive, and relevant linguistic data, which are usually restricted.  It usually turns out that you can fit quite a few different archaeological scenarios to the linguistic data, but not all equally easily; and the more stretching and squeezing you have to do to make it all fit, the less likely the result is to be correct.  (It has to be remembered that extrapolating into prehistory is always a probabilistic endeavor.  Smoking guns are very rare.)  I’m prepared to accept David Anthony’s hypothesis for the time being because I think it’s the most likely so far, and because he’s backed it up with a great deal of detailed archaeological evidence; I’ve never seen anything like that from Colin Renfrew, for instance.  (Not that I’m rejecting all of Renfrew’s work: I think his work on pre-Celtic Britain is very impressive; I just think Anthony’s done better with the IE dispersal problem.)  But I don’t care deeply whether Anthony turns out to be right in the long run, or about any specific conclusions; it’s the data and the methodology that matter.

[Above is by Don Ringe]

1. ### jfruh said,

January 13, 2009 @ 12:00 pm

On the note of the Germanic God words above (and again, keeping in mind that I'm a total amateur): Could the smooth transition have been helped by the fact that the God words may have been cognates anyway? In other words, a German speaker would have an easier time shifting the meaning of "Gott" in his head if he was getting his religioning from English-speakers who had done the same to "God."

In a larger sense, I wonder in the days before widespread literacy and language standardization the extent to which speakers of, say, Old English and Old Norse and early German would have even recognized each other as speaking different languages, rather than just "Boy, he talks funny … ends 'God' with a t sound."

2. ### Nigel Greenwood said,

January 13, 2009 @ 12:07 pm

Just on a (minor?) point of detail: [T]he PIE “homeland” should have had snowy winters; the steppes would do, but central Anatolia probably wouldn’t. Check out the weather charts for Sivas or Malatya. I know from personal experience central/eastern Anatolia can be extraordinarily cold: there's certainly no shortage of snow!

3. ### Jim said,

January 13, 2009 @ 5:01 pm

This reminds me of the controversy in Uto-Aztecan over grain words: did they refer to wild grasses and thus mean that Arizona-New Mexico was the homeland, or do they refer to maize growing and thus mean…? And how should we interpret the words in Kiowa-Tanoan that may be cognates, or are they loanwords? Round and round they go and no one will ever know, except for the Aztlan bigots who are very clear on what answer they want.

4. ### Stephanie said,

January 13, 2009 @ 5:44 pm

I'm a zooarchaeologist, not a linguist, but these ideas are very interesting to try to fit into my research on early animal domestication in western Asia and Anatolia. I got confused by one assertion in the original post (& am commenting here for more visibility)!

"For instance, the fact that a word for ‘horse’ is solidly reconstructable for PIE (with reflexes in all the earliest-attested branches of the family, including Anatolian) rules out Mesopotamia, Anatolia, and any forested part of Europe as the area where PIE was spoken"

Why are these areas ruled out? Equids were part of life at Catalhoyuk from 10,000 BCE, where they were hunted for food. They weren't domesticated for riding, and probably looked like modern Przewalski's horses, but they were still identifiably a unique kind of animal and would have been named.

thanks, Stephanie.

5. ### Stephanie said,

January 13, 2009 @ 5:50 pm

[I also second the comment on central Anatolian winters – they are VERY cold and we have seen heavy, persistent snows recently. Additionally, early Holocene Anatolia would have been cooler and wetter than it is today, so we shouldn't rule out long cold snowy winters in prehistory].

6. ### Nigel Greenwood said,

January 13, 2009 @ 6:14 pm

Re Anatolian winters. In fact the Turks have a word for these extreme conditions: karakış (literally "black winter" — the "black" indicating that everything living freezes up).

7. ### dr pepper said,

January 13, 2009 @ 6:16 pm

Hmm. Wouldn't a tribal incursion into another tribal area produce a patchwork of pidgins? And if so wouldn't it be possible for there to be multiple borrowings for different purposes within this framework, resulting in multiple meanings for loanwords right from the beginning?

8. ### Don Ringe said,

January 13, 2009 @ 10:14 pm

Stephanie: Equids were part of life at Catalhoyuk from 10,000 BCE, where they were hunted for food. They weren't domesticated for riding, and probably looked like modern Przewalski's horses, but they were still identifiably a unique kind of animal and would have been named.

Maybe my archaeological information is simply inadequate. How long did horses (of any kind) persist in that part of Anatolia? I’m asking because 10,000 BCE is clearly too early for PIE, but 5,000 BCE might not be. If there were enough horses in an area to make them economically important (e.g. as food) significantly later than that, then that area has to at least be part of the discussion about where PIE might have been spoken, so far as I can see.

9. ### Nigel Greenwood said,

January 14, 2009 @ 7:05 am

Further confirmation of conditions in (admittedly far northeast) Anatolian winters is given by the title & plot of Orhan Pamuk's novel Snow.

10. ### blahedo said,

January 14, 2009 @ 1:22 pm

@jfruh: Of course, languages/dialects are always continua, and so there would be some range where people would just say "boy, he talks funny". At a somewhat larger distance, it would become more like, "I only understand bits and pieces of what he's saying, maybe I can convince him to talk slower and use simpler words," before eventually devolving to "I'm not getting any of this, I'll just point at what I want and hope for the best." For a good illustration of the middle-range, look around the net for video of people speaking in some of the old country dialects of England. For me at least, many of them give a first impression of being another language entirely, though with a bit of concentration and replaying I can often get most of it. Part of it is that I'm American, but my understanding is that even for people born and raised in Britain speaking English, there are some dialects around the island that generate a similar reaction. Language contact of that sort is less common in the modern world, but strikes me as representative of the sort of interaction you seem to be talking about.

11. ### Etienne said,

January 14, 2009 @ 5:37 pm

Professor Ringe: many thanks for your (as usual, clear and detailed) reply; I expressed myself poorly: I didn't mean to imply that the shift in meaning from "large quadruped" to "horse" took place independently in each branch of Indo-European (a scenario which is indeed unlikely in the extreme): instead, I imagined a case whereby the semantic shift from "large quadruped" to "horse" would have taken place in some branch of Indo-European and thence spread to the other branches, without the actual reflexes of the original Indo-European word being borrowed: I had in mind Bloomfield's example of all Algonquian languages having a compound of the words for "fire" and "water" (each inherited from Proto-Algonquian) with the meaning "whiskey", the pan-Algonquian meaning of this compound being due to diffusion instead of inheritance or coincidence, of course.

Jfruh: I ran into a diachronic sociolinguistic study of Old English which pointed out that interpreters are only mentioned in the context of Old English to French, Latin, Gaelic or Welsh communication, and never in the case of Old English to Old Norse communication, leading the author to conclude that the two languages were mutually intelligible at the time and probably not even perceived as separate languages (I *may* be able to dig up the actual reference, if anybody's interested).

12. ### David Marjanović said,

January 14, 2009 @ 8:48 pm

Hittite lost the feminine gender

That comes as a surprise. My sources may well be outdated and certainly are too sparse to give a reliable indication of the state of the art, but I thought it was now mainstream that PIE had an animate-inanimate distinction which Hittite simply kept, and that the masculine-feminine distinction (inanimate becoming neuter in the process) is an innovation of the non-Anatolian branch?

13. ### marie-lucie said,

January 14, 2009 @ 9:54 pm

I find it difficult to imagine a people living in a region frequented by herds of wild animals having only a single word for "large quadruped", which would include, say "aurochs" and "elk" as well as "horse". Surely those animals are sufficiently different from each other in appearance, habits (including potential danger to humans) and the uses to which they could be put long before domestication (eg hunted or not, useful or not for food, hides or other by-products, etc) that people living in the vicinity of those species would have a name for each, and for various stages of their lives (eg at least young and adult, and males and females). For a modern example one could look at the vocabulary of people living under such conditions, for instance hunters on the African plains or in the Canadian North.

14. ### marie-lucie said,

January 14, 2009 @ 11:14 pm

… interpreters are only mentioned in the context of Old English to French, Latin, Gaelic or Welsh communication, and never in the case of Old English to Old Norse communication, leading the author to conclude that the two languages were mutually intelligible at the time and probably not even perceived as separate languages…

This makes a lot of sense to me as a historical linguist. The two types of speech were probably thought of as two different dialects, intelligible with a little effort, so that communication within intermarried families would not have been too difficult, each spouse using their own native speech with some borrowings mixed in (added to the factor of a similar, Germanic-type culture). On top of a lot of vocabulary, some of which almost but not quite duplicated what OE already had (eg skirt/shirt, etc), OE borrowed an Old Norse personal pronoun and a third person singular verb marker, but not other forms of a similar nature: if the languages had been quite different, it would have made more sense to borrow an entire set of such forms (this type of borrowing is not unheard of, but still quite rare). In this case the borrowed forms were more salient phonologically (= easier to distinguish by ear) than the inherited forms: for the 3rd person singular, OE borrowed the strongly hissing fricative s of ON as an equivalent to the barely audible th of OE, and the initial th of ON they/them made the 3rd person plural pronouns more distinctive than the h common to both singular and plural OE pronouns: compare Standard English to him/to them to dialectal to him/to hem.

15. ### David Marjanović said,

January 15, 2009 @ 7:09 am

I find it difficult to imagine a people living in a region frequented by herds of wild animals having only a single word for "large quadruped", which would include, say "aurochs" and "elk" as well as "horse".

So do I, but a word that means "donkey" and is shifted to "horse" when horses are introduced wouldn't surprise me at all.

After all, that's what happened to the cuneiform script: the Sumerians had no horses and therefore no logogram for them, so, as Prof. Ringe has informed us, the Hittites ended up writing "horse" with the characters for "mountain donkey".

Similar things happen all the time — the Basque word for "maize" once meant "millet", and the modern word for "millet" is a diminutive of that.

16. ### Merri said,

January 15, 2009 @ 8:16 am

> David : I thought it was now mainstream that PIE had an animate-inanimate distinction which Hittite simply kept, and that the masculine-feminine distinction (inanimate becoming neuter in the process) is an innovation of the non-Anatolian branch? <

Indeed. The marker -eH2 for animate became a feminine marker in non-Anatolian languages. Although it isn't a certainty, it's easier to imagine this happened only once, not too long after Anatolian languages disconnected from the main branch. Notice that Sanskrit doesn't have any feminine marker that would be a reflex from -eH2, so the datation of this change could be refined.

The animate-inanimate distinction in Hittite is so strong that they needed an innovation to allow inanimate nouns to be subject of transitive verbs. This is called ergative by some grammarians, but is rather a derivation. It is possible that this 'no inanimate active subject' was present in proto-IE.

17. ### Trond Engen said,

January 15, 2009 @ 8:44 am

Etienne:

I ran into a diachronic sociolinguistic study of Old English which pointed out that interpreters are only mentioned in the context of Old English to French, Latin, Gaelic or Welsh communication, and never in the case of Old English to Old Norse communication, leading the author to conclude that the two languages were mutually intelligible at the time and probably not even perceived as separate languages (I *may* be able to dig up the actual reference, if anybody's interested).

Marie-Lucie:

This makes a lot of sense to me as a historical linguist. The two types of speech were probably thought of as two different dialects, intelligible with a little effort, so that communication within intermarried families would not have been too difficult, each spouse using their own native speech with some borrowings mixed in (added to the factor of a similar, Germanic-type culture).

Pétur Knútsson at Háskóli Íslands explores the Germanic dialect continuum, and especially the relation between Old English and Old Norse, in this paper. I don't know if it's the one Etienne meant.

I like the idea of languages acting like viscous liquids, with a global, maybe constant and uniform, tendency for divergence and a force (or several competing forces) of convergence holding them together. I dream of a computer simulation of such a system on a map, but the truth is that I'm not up to producing it, neither as a linguist, a geographer, a physicist or a programmer. But then again, I'm neither.

(And of course, but to meet a predictable distraction, I do not see continuum as inherently incompatible with the traditional tree structure of Historical Linguistics. They are different levels of abstraction.)

18. ### Stephanie said,

January 15, 2009 @ 6:08 pm

>> How long did horses (of any kind) persist in that part of Anatolia? I’m asking because 10,000 BCE is clearly too early for PIE, but 5,000 BCE might not be.

I'll have a look through the zooarchaeological literature and find out for us.

19. ### David Marjanović said,

January 15, 2009 @ 6:58 pm

Notice that Sanskrit doesn't have any feminine marker that would be a reflex from -eH2, so the datation of this change could be refined.

Unlikely because of the highly nested phylogenetic position of Sanskrit. I haven't looked for Nakhleh et al. (2005) yet, but I'd be surprised if its results were very different from those that Rexová et al. (2003) got in their proof-of-concept paper that used only basic vocabulary and no grammar, and only the best-documented languages:

——+——Hittite
………——+——Armenian
………………——+——Greek
…………………………——+——+——Celtic
……………………………………|……——+——Albanian
……………………………………|………………——+——Germanic
……………………………………|………………………——Romance
……………………………………——+——+——Indic
……………………………………………|………——Iranian
……………………………………………——+——Baltic
………………………………………………………——Slavic

That would mean that Sanskrit has secondarily lost it.

(I haven't checked if "Romance" includes Latin, but it probably does.)

Kateřina Rexová, Daniel Frynta & Jan Zrzavý (2003): Cladistic analysis of languages: Indo-European classification based on lexicostatistical data, Cladistics 19:120–127

(And of course, but to meet a predictable distraction, I do not see continuum as inherently incompatible with the traditional tree structure of Historical Linguistics. They are different levels of abstraction.)

Or, as biologists would say, cladogenesis and speciation-under-the-"Biological Species Concept" aren't the same thing.

20. ### marie-lucie said,

January 15, 2009 @ 10:10 pm

I have not read the references mentioned by David Marjanović, but I am wary of conclusions drawn exclusively on lexical and lexicostatistical data, without consideration of morphology. Of course lexical data are by their nature and number easier to submit to statistical analysis, but reliance on such analysis also makes it easy to ignore morphological comparison.

21. ### David Marjanović said,

January 16, 2009 @ 6:58 am

That was deliberate; it's a proof-of-concept paper that shows that, although nobody doubts that grammar helps, lexical data contain sufficient phylogenetic signal for cladistic analysis. I'll send you the pdf.

22. ### David Marjanović said,

January 16, 2009 @ 6:59 am

(The point being that cladistic analysis would work on isolating languages, and on datasets that contain languages with vastly different morphology.)

23. ### Merri said,

January 16, 2009 @ 10:17 am

But neither is the case here, to be sure ?

ISTM that this algorithmic classification, which has indeed worked well in other cases where later interactions are fewer, doesn't work well here.
For example, the remote position of Armenian is due to borrowings, not to evolving, and borrowings don't take that much time.

Also, there is that classical implication of regular change rate, which is all but proven.

Unless you consider the k/s alternance as essential (and that's wrong, as it might be a result of parallel evolution), there is in fact little to support the neighbouring of Indo-Iranian with Balto-Slavic.
Phonemic considerations, in particular, don't agree with this.

24. ### marie-lucie said,

January 16, 2009 @ 11:47 am

… cladistic analysis would work on isolating languages, and on datasets that contain languages with vastly different morphology.

A lot of common vocabulary coupled with vastly different morphology suggests swamping of a language by another, without destroying the original language, under circumstances of political and/or cultural domination: English with its heavy overlay of Old French and Latin-based vocabulary is the obvious case. I think that Armenian and Persian are other well-known ones. The proof of Indo-European connection is in the basic morphology rather than in a statistical analysis of vocabulary.

25. ### Brian M. Scott said,

January 16, 2009 @ 2:04 pm

My sources may well be outdated and certainly are too sparse to give a reliable indication of the state of the art, but I thought it was now mainstream that PIE had an animate-inanimate distinction which Hittite simply kept, and that the masculine-feminine distinction (inanimate becoming neuter in the process) is an innovation of the non-Anatolian branch?

See the discussion in Section 4.4 of James Clackson, Indo-European Linguistics: An Introduction, 2007, in the Cambridge red series. Briefly, there is now some evidence that can reasonably be interpreted as supporting an original masc./fem./neut. gender distinction in Anatolian, but it's not conclusive, and the interpretation remains contentious.

26. ### David Marjanović said,

January 16, 2009 @ 8:38 pm

Erm… there are tons of theoretical literature about what cladistics is and what it can and cannot do, spanning three to five decades… it's just all in biology journals that most of you have never seen.

ISTM

What does this mean? "I should, though, mention" perhaps?

that this algorithmic classification,

It's not a classification. It's phylogenetics = the production and testing of a phylogenetic hypothesis.

which has indeed worked well in other cases where later interactions are fewer, doesn't work well here.

Then why is there so ridiculously little homoplasy in the data? Despite the tree length of 4294 steps, the consistency index is 0.89! If I got a manuscript on biological data to review and there was a phylogenetic tree in there with a CI of 0.89, I'd be certain that the authors must have cherry-picked the data to remove practically every character that contradicts their pet hypothesis! I'd scream bloody murder and cry for rejection of the manuscript! Decent-sized cladograms in biology (say, 100 taxa and 400 morphological = non-molecular characters) usually have a CI around 0.3 or even lower.

A CI of 1 means that there's no homoplasy (convergence, reversals, borrowing) whatsoever in the character; 0.5 means that each character changes on average twice as often as the theoretical minimum determined by its number of states.
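To make the arithmetic behind these numbers concrete, here is a minimal sketch of the ensemble consistency index as it is usually defined: the minimum possible number of changes (one fewer than the number of states, summed over all characters) divided by the number of changes the tree actually requires. The function name and the toy character lists are illustrative assumptions, not data from Rexová et al.

```python
def consistency_index(characters):
    """Ensemble consistency index for a list of characters.

    Each character is a (n_states, observed_steps) pair, where
    observed_steps is the number of changes that character needs
    on the tree being evaluated. Both values are assumed inputs.
    """
    characters = list(characters)
    # A character with k states needs at least k - 1 changes on any tree.
    minimum = sum(n_states - 1 for n_states, _ in characters)
    observed = sum(steps for _, steps in characters)
    if observed == 0:
        raise ValueError("no changes observed; the index is undefined")
    return minimum / observed


# Hypothetical characters with no homoplasy: every change happens once.
no_homoplasy = [(2, 1), (2, 1), (3, 2)]
# Hypothetical characters that each change twice as often as the minimum.
doubled = [(2, 2), (2, 2), (3, 4)]

print(consistency_index(no_homoplasy))  # 1.0
print(consistency_index(doubled))       # 0.5
```

Read the other way round, a CI of 0.89 on a tree of 4294 steps would imply a theoretical minimum of roughly 0.89 × 4294 ≈ 3820 steps, so only about one change in nine is "extra".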

I'm sure that cladistics could be applied, with reliable results, to questions like the existence of Nostratic, if not even larger putative clades.

For example, the remote position of Armenian is due to borrowings, not to evolving, and borrowings don't take that much time.

That's entirely possible — after all, morphology and phonology are not in the dataset –, but if you're right, why aren't Iranian and Indic down there, too? (In fact, they are if Rexová et al. repeat the most glaring mistake of Gray & Atkinson and code the presence or absence of a reflex of each cognate class as a separate character.)

Also, there is that classical implication of regular change rate, which is all but proven.

Unlike maximum likelihood, parsimony does not make any assumption about the speed of evolution of any character. This is why parsimony is, as shown by simulation studies, sometimes superior to max. likelihood, even though it's more prone to the (in molecular phylogenetics) serious problem of long-branch attraction.
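As a rough illustration of how parsimony scores a tree without any rate parameter, the sketch below counts the minimum number of changes for a single character using Fitch's algorithm. The choice of Fitch's algorithm, the nested-tuple encoding of the tree (one reading of the diagram posted earlier in this thread), and both binary characters are assumptions made for illustration; none of them is taken from the paper's dataset or software.

```python
# Topology as read from the diagram earlier in the thread (an assumption);
# every internal node is a 2-tuple, every leaf a language name.
TREE = ("Hittite", ("Armenian", ("Greek", (
    ("Celtic", ("Albanian", ("Germanic", "Romance"))),
    (("Indic", "Iranian"), ("Baltic", "Slavic")),
))))

LEAVES = ["Hittite", "Armenian", "Greek", "Celtic", "Albanian", "Germanic",
          "Romance", "Indic", "Iranian", "Baltic", "Slavic"]


def fitch(node, states):
    """Return (possible ancestral states, changes needed) for a subtree."""
    if isinstance(node, str):
        return {states[node]}, 0
    left, right = node
    left_set, left_steps = fitch(left, states)
    right_set, right_steps = fitch(right, states)
    shared = left_set & right_set
    if shared:
        # The children agree on at least one state: no change needed here.
        return shared, left_steps + right_steps
    # The children disagree: one change must have happened on this node.
    return left_set | right_set, left_steps + right_steps + 1


def parsimony_steps(tree, states):
    """Minimum number of changes one character requires on the tree."""
    return fitch(tree, states)[1]


def character(with_a):
    """Hypothetical binary character: state 'a' for the named leaves, else 'b'."""
    return {leaf: ("a" if leaf in with_a else "b") for leaf in LEAVES}


# A character shared by one clade only: a single change explains it.
clean = character({"Indic", "Iranian", "Baltic", "Slavic"})
# A character shared by Armenian and Indo-Iranian (say, through borrowing):
# on this topology it needs two changes, so it is homoplastic.
homoplastic = character({"Armenian", "Indic", "Iranian"})

print(parsimony_steps(TREE, clean))        # 1
print(parsimony_steps(TREE, homoplastic))  # 2
```

Nothing in the count depends on branch lengths or on how fast a character evolves; the only thing scored is how many changes a given topology forces.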

Unless you consider the k/s alternance as essential (and that's wrong, as it might be a result of parallel evolution)

Absolutely anything might be a result of convergence, so nothing is essential. This fact — that there are no absolutely reliable characters and therefore a total-evidence approach is needed — is the fundamental insight at the base of cladistics.

Also, phonology wasn't even in the dataset. It's a proof-of-concept paper: can basic vocabulary alone yield a realistic topology? Answer: yes, with an insanely high CI.

Phonemic considerations, in particular, don't agree with this.

This does interest me, however. Could you be more specific?

A lot of common vocabulary coupled with vastly different morphology suggests swamping of a language by another

Yes, that's not what I mean — I meant families like Sino-Tibetan, where some languages are isolating, others agglutinating, others inflecting, and others polysynthetic.

The proof of Indo-European connection is in the basic morphology rather than in a statistical analysis of vocabulary.

1) Pet peeve alert: of all scientists, only historical linguists ever use the word "proof". That's because science cannot prove, only disprove. Basic science theory. Whenever you see someone complaining about an "unproven theory", you know you're dealing with a creationist, or a historical linguist…
2) Yes, the morphology is very useful. It helps a lot. It is not, however, strictly necessary: lexical data alone contain a great deal of phylogenetic signal, too. That means phylogenetic analysis of isolating languages is possible, even if the results won't be as reliable as with heavily inflecting or polysynthetic languages.
3) The dataset of Rexová et al. (2003) does not include any non-IE language, because hardly any hypothesis even exists on what the closest known relative of IE as a whole might be. So, the monophyly of IE was not tested. The tree is rooted on the (reasonable) assumption that the Anatolian languages (of which only Hittite was included) are the sister-group to all of the rest together. The question was: can lexical data alone yield a topology that's reasonably close to what morphology + phonology + lexical data suggest (in the well-studied case of IE), or not?

See the discussion in Section 4.4 of […]

I see, thanks!

27. ### marie-lucie said,

January 16, 2009 @ 9:08 pm

David, OK, I used "proof" in a loose sense, so let us say "very high probability". A linguist I know uses the judicial metaphor: "evidence enough to charge, but not enough evidence to convict", since the latter requires "beyond a reasonable doubt". Of course as scientists we have to consider everything as a hypothesis and always be open to the possibility that we could be wrong, but that should not prevent us from considering it a strong point in favour of a hypothesis when different kinds of evidence all point to its validity in view of the available data. I only object to considering just one type of evidence (specifically, lexical items) as valid and dismissing the others.

28. ### Marconatrix said,

January 16, 2009 @ 10:30 pm

I'd say the nearest thing to 'proof' in both linguistic and biological phylogeny is the ability to reconstruct credible intermediate forms, and explain the transitions in terms of normal linguistic / evolutionary processes. In Biology this means for instance that structures don't come into being without having a purpose just to be there for a later stage (the preadaptation problem), everything has to have survival value at the point where it's selected for. In linguistics it means for instance that phonological changes are plausible, that systems of contrasts and indeed the language as a whole falls within the limits of known possible languages etc.

30. ### David Marjanović said,

January 18, 2009 @ 7:44 am

I only object to considering just one type of evidence (specifically, lexical items) as valid and dismissing the others.

Which, again, is absolutely not what Rexová et al. did. They merely showed that lexical data aren't useless either — they contain enough phylogenetic signal for reconstructing a realistic-looking tree of IE intrarelationships.

I'd say the nearest thing to 'proof' in both linguistic and biological phylogeny is the ability to reconstruct credible intermediate forms, and explain the transitions in terms of normal linguistic / evolutionary processes.

That's by far not enough, because there are always several different possible explanations that fulfill these criteria.

In Biology this means for instance that structures don't come into being without having a purpose just to be there for a later stage (the preadaptation problem), everything has to have survival value at the point where it's selected for. In linguistics it means for instance that phonological changes are plausible, that systems of contrasts and indeed the language as a whole fall within the limits of known possible languages etc.

And in both cases, there are always several different ways to achieve this goal, and often several of them are even equally parsimonious.

Of course, such competing hypotheses make competing predictions, and those are in principle testable. For example, in biology, different hypotheses predict different fossils; when a fossil is then found that conforms to at least one hypothesis but contradicts at least one other, those that it contradicts are falsified. In linguistics, the "decipherment" of Hittite was such a case.

From a comment of my own:

I meant families like Sino-Tibetan, where some languages are isolating, others agglutinating, others inflecting, and others polysynthetic.

Or indeed IE, where modern English is almost isolating, the two Tocharian languages had mostly agglutinating noun morphology, and modern French has started to polysynthesize (t'veux conduire, ou t'veux qu'j'conduise ?)… though of course none of this is as extreme as what ST has to offer.

31. ### Ardagastus said,

January 22, 2009 @ 3:36 am

They merely showed that lexical data aren't useless either — they contain enough phylogenetic signal for reconstructing a realistic-looking tree of IE intrarelationships.

Depends on the reality, I guess. Some of the branchings are unexpected (at least, to me), and some also make me wonder what's the method behind it (on purely lexical grounds, a Romance language like Romanian has stronger ties with the Slavic group, yet that classification fails to note it).

32. ### David Marjanović said,

January 22, 2009 @ 5:45 pm

on purely lexical grounds, a Romance language like Romanian has stronger ties with the Slavic group, yet that classification fails to note it

That may be because only words on some Swadesh list or other were used. These are, AFAIK, mostly Romance in Romanian.

33. ### David Marjanović said,

January 22, 2009 @ 5:46 pm

34. ### Ardagastus said,

January 25, 2009 @ 5:54 pm

I found the abstract of that paper, and most of my concerns are not about what this 7-page study holds (though I'd like to read it; how can I give you my e-mail address without making it public in this thread?).

Their phylogenetic tree relies on the lexicostatistical dataset collected by Isidore Dyen (a version of it can be consulted here), which, as you already pointed out, is based on Swadesh lists.

However, I have reasons not to place much trust in this dataset or in most of the conclusions drawn from it.
It's beyond my ability and patience to make a full investigation, so I'll use a few familiar examples from the Romanian words listed for the first few meanings. Some cognates are not really cognates but recent loan-words from the other languages (Rom. "animal"); for some meanings not all the words are given, and so cognates are missed (for "belly" the list gives Rom. "pîntec(e)" but not "vintre"). The bibliography provided in Isidore Dyen, Joseph B. Kruskal, Paul Black, Indoeuropean Classification: A Lexicostatistical Experiment (1992) is disappointing, as for Romanian the 200-word Swadesh list has a single source – an English-Romanian pocket dictionary: Şerban Andronescu, Dicţionar de buzunar Englez-Romîn, Bucharest, 1961. It's even more saddening considering that my corrections are neither subtle nor recent or controversial finds.
I appreciate that Dyen et al. chose a "pseudo-map" for Romance languages instead of a tree representation; however, I can only wonder why Ladin looks "more related" to Romanian than Italian does (many Romance dialects are missing anyway, and any representation has a good chance of being inaccurate for this reason alone). For some interesting perspectives click here.
However, even having the correct information, I couldn't find any persuasive argument that these lists represent anything but a mere selection of words. Perhaps arbitrary, perhaps chosen to illustrate something about some languages (but failing to make a similar point for others). Actually I don't think one can prove that a given selection of words is the universal tool of knowing how old some languages are or what are the relations between them.

A good read if you know Romanian is Marius Sala et al., Vocabularul reprezentativ al limbilor romanice, Bucharest, 1988 (The representative vocabulary of the Romance languages; IIRC it was not translated), a 600+-page book analysing the "representative vocabularies" of nine Romance languages, with the analysed set containing 2300+ words for most of them. These "representative vocabularies" were built following two criteria for all nine languages: semantic richness and power of derivation. When available, the word's frequency and dispersion were also taken into account.
Some important conclusions I can draw from this book are:
– the rate of lexical change is not constant
– borrowing and internal formation are important sources of words, even in a "representative" set of words
The inherited Latin element in those nine vocabularies varies from ~30% to ~50%. It should also be noted that the Romance loans (from other Romance languages or Medieval Latin) are substantial; that's why even a language with massive borrowings like Romanian looks "more Romanic" than we'd expect.

Consequently, when I'm reading something like R. D. Gray & Q. D. Atkinson's "Language-tree divergence times support the Anatolian theory of Indo-European origin", I cannot share their confidence. Unreliable input, unjustified method – I can't trust their conclusion.

35. ### John Croft said,

January 30, 2009 @ 5:50 pm

David, your PIE phylogenetic tree is very different from the ones I have seen.

Firstly there seems to be in the linguistics a close connection between Celtic and Italic languages, with theories of possible intergradation. Similarly there is a close hypothesised connection between Proto-Greek and Proto-Armenian too. The centum-satem split that your theory proposes seems to have been superseded, with the centum group just being a group of early speakers. Also your tree does not include Tocharian A or B, which seems to have been the first to split after the separation of the Proto-Anatolian group.

36. ### David Marjanović said,

February 22, 2009 @ 10:46 am

Ouch. Ardagastus, are you still here? I didn't check back for a month…

I found the abstract of that paper, and most of my concerns are not about what this 7-page study holds (though I'd like to read it; how can I give you my e-mail address without making it public in this thread?).

Find me in Google Scholar and write to me.

Actually I don't think one can prove that a given selection of words is the universal tool of knowing how old some languages are or what are the relations between them.

The point of the paper was something different: to show that lexical data contain enough phylogenetic signal to be used in phylogenetic analyses.

the rate of lexical change is not constant

Of course not. Just to repeat myself: Phylogenetic analysis using simple parsimony does not make any assumptions about the speed of evolution of any character.
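
As a rough illustration of that point (a toy sketch, not code from Rexová et al. or from anyone in this thread), here is Fitch's small-parsimony count for a single unordered character on a fixed binary tree. The language names, states, and trees are invented. Note that the function only ever looks at which states the tips have and how the tree is bracketed; no branch length, date, or rate of change appears anywhere.

```python
def fitch(tree, states):
    """Return (candidate ancestral states, minimum number of changes).

    `tree` is a leaf name (str) or a pair (left, right) of subtrees;
    `states` maps each leaf name to its observed character state.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    shared = left_set & right_set
    if shared:
        # The two subtrees can agree on a state: no extra change needed here.
        return shared, left_cost + right_cost
    # No state in common: at least one change on the branches below this node.
    return left_set | right_set, left_cost + right_cost + 1


# Hypothetical character: which cognate class each language uses for one meaning.
states = {"L1": "A", "L2": "A", "L3": "B", "L4": "B"}

tree_1 = (("L1", "L2"), ("L3", "L4"))
tree_2 = (("L1", "L3"), ("L2", "L4"))

cost_1 = fitch(tree_1, states)[1]
cost_2 = fitch(tree_2, states)[1]
print(cost_1, cost_2)  # prints: 1 2
```

On this toy character the first tree needs one change and the second needs two, so parsimony prefers the first. Summing such counts over all characters and searching over trees is the basic idea; real analyses add a great deal on top of this.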

Consequently, when I'm reading something like R. D. Gray & Q. D. Atkinson's "Language-tree divergence times support the Anatolian theory of Indo-European origin", I cannot share their confidence.

I agree, but for completely different reasons: they misused the method. They coded the presence/absence of each cognate set as a separate character, instead of coding all cognate sets with the same meaning as a single multistate character. This grossly distorts both the shape of the tree and its length — as Rexová et al. (2003) demonstrate, BTW.
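
To make the difference between the two codings concrete, here is a minimal sketch with invented data (the meanings, language names, and cognate class labels are all hypothetical, and this is not the actual procedure of either paper). The same toy dataset is coded once as one multistate character per meaning and once as one presence/absence character per cognate set.

```python
# Hypothetical toy data: for each meaning, the cognate class of each language's word.
cognate_classes = {
    "hand": {"Lang1": "A", "Lang2": "A", "Lang3": "B", "Lang4": "C"},
    "water": {"Lang1": "A", "Lang2": "B", "Lang3": "B", "Lang4": "B"},
}
languages = ["Lang1", "Lang2", "Lang3", "Lang4"]


def multistate_matrix(data, langs):
    """One character per meaning; the state is the cognate class."""
    meanings = sorted(data)
    return {lang: [data[m][lang] for m in meanings] for lang in langs}


def binary_matrix(data, langs):
    """One character per (meaning, cognate class) pair; the state is 1 or 0."""
    columns = [(m, c) for m in sorted(data) for c in sorted(set(data[m].values()))]
    rows = {
        lang: [1 if data[m][lang] == c else 0 for m, c in columns]
        for lang in langs
    }
    return columns, rows


multi = multistate_matrix(cognate_classes, languages)
columns, binary = binary_matrix(cognate_classes, languages)

print(len(multi["Lang1"]), "multistate characters")  # prints: 2 multistate characters
print(len(columns), "binary characters")             # prints: 5 binary characters
```

In the binary version the two meanings become five characters, and the columns belonging to one meaning are not independent of each other: a language that replaces its word for "hand" flips two columns at once. How much that matters for a given tree depends on the analysis and is not shown here.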

Firstly there seems to be in the linguistics a close connection between Celtic and Italic languages, with theories of possible intergradation.

Yes, it is a surprise that Rexová et al. didn't find that. However, I can easily come up with possible reasons. First, because it's a mere proof-of-concept paper, they restricted themselves to the best-documented languages — so they included Latin and its descendants, but not any other Italic languages. Taxon sampling can have a large influence on the outcome of a phylogenetic analysis. Second, again because it's a proof-of-concept paper, only lexical data were used. Most of the evidence for Italo-Celtic is grammatical, right? It's not in the dataset.

Similarly there is a close hypothesised connection between Proto-Greek and Proto-Armenian too.

Well, Rexová et al. almost found that. :-| They are only one node away from being sister-groups. What are the shared innovations of Greek and Armenian? Are any of them lexical?

The centum-satem split that your theory proposes seems to have been superseded, with the centum group just being a group of early speakers.

Well, apparently a restricted version of the centum-satem hypothesis is the most parsimonious explanation for the data, even if (as in this purely lexical dataset) the phonological evidence, which is where the centum-satem hypothesis historically comes from, is completely ignored. I don't see why it shouldn't be tentatively accepted.

Of course, phonologically it's strange that Albanian ended up nested inside the centum branch…

Also your tree does not include Tocharian A or B

Yes, I agree this is a shortcoming. But, again, it's merely a proof-of-concept paper: it's intended to show that lexical data alone are sufficient to produce a non-wacky phylogeny.