Ultraconserved words? Really??


On the web site of the Proceedings of the National Academy of Sciences, in the "Early Edition" section, is an article by Mark Pagel, Quentin D. Atkinson, Andreea S. Calude, and Andrew Meade: "Ultraconserved words point to deep language ancestry across Eurasia". The authors claim that a set of 23 especially frequent words can be used to establish genetic relationships of languages that go way, way back — too far back for successful application of the standard historical linguistics methodology for establishing language families, the Comparative Method.  The idea is that, once you've determined that these 23 words are super-stable (because they're used so often), you don't need systematic sound/meaning correspondences at all; finding resemblances among these words across several language families is enough to prove that the languages are related, descended with modification from a single parent language (a.k.a. proto-language).

This is the latest of many attempts to get around the unfortunate fact that systematic sound/meaning correspondences in related languages decay so much over time that even if the words survive, they are unrecognizable as cognates (sets of words descended from the same word in the parent language).   This means that word sets that have similar meanings and also sound similar after 15,000 years are unlikely to share those similar sounds as the result of inheritance from a common ancestor; if they were really such ancient cognates, they would almost surely not look much alike at all. (See "Scrabble tips for time travelers", 2/26/2009, for a discussion of some earlier work.)

I'm not qualified to judge Pagel et al.'s statistics, although I remain skeptical of their basic claim that words that haven't been replaced often in a handful of language families with vastly different time depths can be predicted to be super-stable in all language families. But there are problems with their premises in this article, in which their goal is to compare words from seven different language families and to show that, according to their statistics, all seven should be grouped together into a single super-family. I think they have a serious garbage in, garbage out problem.

Pagel et al. used their statistical method to compare reconstructed words for the seven language families they identify: Altaic, Chukchi-Kamchatkan, Dravidian, Eskimo, Indo-European, Kartvelian, and Uralic. One problem is that Eskimo is not a language family; it's part of the Eskimo-Aleut language family, and any effort to find deeper genetic relationships for Eskimo that doesn't take Aleut data into account is not likely to be useful.

A more serious problem is that Altaic is at best highly controversial as a proposed language family. The hypothesized Altaic family comprises three well-established families — Turkic, Mongolian, and Tungus — plus Korean and Japanese. It's a very old idea, but efforts to provide convincing evidence that all these languages belong in a single Altaic family have failed to convince most specialists. A prominent recent exchange appeared in the journal Diachronica (2004, 2005), starting with Stefan Georg's devastating review of Sergei Starostin et al., Etymological dictionary of the Altaic languages, and continuing with Starostin's reply and Georg's reply to the reply. In his reply, Starostin commented plaintively that he had hoped `that the publication of more than 2000…Altaic etymologies would put an end' to the dispute about whether an Altaic language family exists. To this Georg responds, though not in these words, that 2000+ unconvincing etymologies do not add up to any convincing etymologies at all.

In his review, Georg criticizes Starostin et al. for erroneous reconstructions of words in the individual language families and for a very loose standard of semantic "matching". The latter may be the most common criticism of word comparisons in efforts to establish very distant linguistic relationships; the other major criticism is a very loose standard of phonetic "matching". Given enough semantic and phonetic latitude, it's possible to amass a large number of "matching" sets of words for any set of two or more randomly selected languages. (If you don't believe me, try it: take bilingual dictionaries and search for similar-looking words that have vague semantic connections. It's an easy exercise.)
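The latitude point can be made concrete with a toy simulation in Python. It is a sketch, not a model of any real comparison. It invents two random "lexicons" that are unrelated by construction, then counts how many meanings find a "match" under two criteria. The strict criterion requires the same meaning and identical consonants. The loose criterion accepts any of a few neighboring meaning slots and only requires the consonants to fall in the same broad class. The consonant inventory, the four sound classes, the CVC word shape, the lexicon size, and the window of three neighboring meanings are all hypothetical choices made for illustration. Nothing here reproduces Pagel et al.'s procedure or the LWED data.

```python
import random

# Hypothetical toy inventory: consonants grouped into broad classes
# (labials, coronals, velars, sonorants). Purely illustrative.
CLASSES = {
    "p": "LAB", "b": "LAB", "m": "LAB",
    "t": "COR", "d": "COR", "n": "COR", "s": "COR",
    "k": "VEL", "g": "VEL", "q": "VEL",
    "l": "SON", "r": "SON", "w": "SON", "j": "SON",
}
CONSONANTS = sorted(CLASSES)
VOWELS = "aeiou"


def random_lexicon(n_meanings, rng):
    """One CVC 'word' per meaning slot, drawn uniformly at random."""
    return [rng.choice(CONSONANTS) + rng.choice(VOWELS) + rng.choice(CONSONANTS)
            for _ in range(n_meanings)]


def strict_match(w1, w2):
    # Same initial and final consonant; vowels are ignored.
    return w1[0] == w2[0] and w1[2] == w2[2]


def loose_match(w1, w2):
    # Consonants only need to belong to the same broad class.
    return CLASSES[w1[0]] == CLASSES[w2[0]] and CLASSES[w1[2]] == CLASSES[w2[2]]


def count_matches(lex_a, lex_b, semantic_window, matcher):
    """Count meanings in lex_a with at least one 'match' in lex_b.

    semantic_window = 0 compares only the same meaning slot;
    semantic_window = k also accepts the k neighboring slots on either side,
    a hypothetical stand-in for 'vaguely related' meanings.
    """
    hits = 0
    for i, word in enumerate(lex_a):
        lo = max(0, i - semantic_window)
        hi = min(len(lex_b), i + semantic_window + 1)
        if any(matcher(word, lex_b[j]) for j in range(lo, hi)):
            hits += 1
    return hits


rng = random.Random(2013)
lang_a = random_lexicon(200, rng)
lang_b = random_lexicon(200, rng)  # generated independently: unrelated by construction

strict = count_matches(lang_a, lang_b, 0, strict_match)
loose = count_matches(lang_a, lang_b, 3, loose_match)
print(f"strict: {strict}/200, loose: {loose}/200")
```

With these settings the strict criterion finds only a handful of matches, or none, while the loose one finds dozens among the 200 meanings, even though the two lexicons share no history at all. Widening the semantic window or coarsening the sound classes pushes the loose count higher still.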

So I went to the website from which Pagel et al. got their data, the Languages of the World Etymological Database, and checked their 23 words in the Altaic database, which is presumably derived from Starostin et al.'s three-volume etymological dictionary. Only two of the 23 words have a single "Proto-Altaic" etymon each in the database, `what' and `spit (verb)'. All the others (except perhaps `I', `we', and `ye', which I couldn't find because of problems with the search function) have 2-7 "Proto-Altaic" forms each, and at least nine of the words have five or six each. How did Pagel et al. decide which "Proto-Altaic" word to compare to their other six reconstructed proto-languages? They apparently examined all of the possible words for each translation, e.g. five "Proto-Altaic" words for `that', four for `hear', five for `flow', four for `hand', and so forth; they then chose just one proto-word for each meaning, namely, the one `that the LWED proposed as cognate between language families', and used that one for their statistical analyses. This is a puzzling procedure, for two reasons. First, the Altaic database (and the Indo-European database too, and perhaps others as well) often lists more than one proto-word as cognate with words in some of the other six proposed language families. Pagel et al. do not say how they decided which set of putative cognates to select. Second, while acknowledging that linguists often `propose more than one proto-word for a given meaning', they observe that these proposals `can reflect synonyms in the proto-language or, more likely, uncertainty as to which of the words used among a language family's extant languages are most likely to be cognate to the ancestral word.' But if they believe (erroneously!) that synonyms are unlikely in proto-languages, and that apparent synonyms probably reflect linguists' uncertainty, how can they be confident that any selection from one of several options for a given meaning for the proto-language is the genuine one and only word for that meaning in the proto-language? What does this indeterminacy do to their claim that words for certain meanings are super-stable, unlikely to be replaced over thousands of years? And doesn't it introduce an element of circularity into their statistical calculations when they choose the set of proto-words to be compared according to its putative match with other language families and not according to an independent criterion?
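The selection problem can also be put in rough numbers. Suppose, purely for the sake of argument, that any single candidate proto-word has some small probability p of resembling a form in another family by chance, and that candidates behave independently (both are simplifying assumptions). If the analyst may keep whichever of k candidates matches, the chance of a spurious match for that meaning rises to 1 - (1 - p)^k. The Python sketch below applies this to the six meanings whose candidate counts are given above. The value p = 0.05 is a hypothetical placeholder, not an estimate from Pagel et al. or the LWED.

```python
# Candidate counts per meaning, taken from the figures quoted above for the
# LWED "Proto-Altaic" entries. Only these six meanings are included; the
# other seventeen are left out because their exact counts are not given.
CANDIDATES = {"what": 1, "spit": 1, "that": 5, "hear": 4, "flow": 5, "hand": 4}


def chance_match_prob(p, k):
    """Probability that at least one of k candidate proto-words resembles a
    comparison form purely by chance, assuming each candidate independently
    does so with probability p (a simplifying assumption)."""
    return 1 - (1 - p) ** k


def expected_chance_matches(candidates, p):
    """Expected number of meanings yielding a chance 'match' when the
    analyst may keep whichever candidate matches."""
    return sum(chance_match_prob(p, k) for k in candidates.values())


P_CHANCE = 0.05  # hypothetical per-candidate chance-resemblance rate

baseline = expected_chance_matches({m: 1 for m in CANDIDATES}, P_CHANCE)
with_choice = expected_chance_matches(CANDIDATES, P_CHANCE)
print(f"one candidate each: {baseline:.2f}; free choice: {with_choice:.2f}")
```

With p = 0.05, the expected number of chance matches across these six meanings goes from 0.30 with one fixed candidate per meaning to about 0.92 when the best of several candidates may be kept, roughly a threefold increase. The effect would compound if other families' databases offer similar choices.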

There are other serious problems too. Unlike Altaic, most of the other families in the LWED databases are genuine language families. But if the "Proto-Altaic" reconstructions are representative of the quality of the reconstructions for the established families, it would be rash to rely on them. This is in spite of the fact that some of the reconstruction databases (e.g. Indo-European and Dravidian) are based on standard etymological dictionaries. The "Altaic" database contains variables in numerous reconstructions — usually V for an unspecified vowel, but also optional and alternate consonants — that make phonetic "matching" even easier (and therefore less reliable). This is a feature of many reconstructions carried out by people engaging in long-range comparison of languages, including efforts to establish a Nostratic super-family. In at least some of the individual LWED databases, the reconstructions based on standard sources have been `revised and significantly modified' (quoting George Starostin, Dravidian database) by others, and those others are believers not only in Altaic but in the super-family Nostratic. Reconstructions carried out by true believers in Nostratic are all too likely to be influenced by knowledge of words with vaguely similar meanings and/or forms in other proposed Nostratic languages — namely, in the LWED databases, the seven families compared by Pagel et al.

I also checked Pagel et al.'s supposedly super-stable words in the LWED's Indo-European (IE) database. One notable fact is that, of these 23 words, English retains only 6 or 7, assuming that the LWED's database is accurate — a fact that might be expected to limit Pagel et al.'s confidence in the reliability of their 23 words as an indicator of genetic relatedness. The count for English depends in part on whether the IE database has accurate reconstructions — `spit' in particular is dubious, because this IE database disagrees with the Oxford English Dictionary (OED) here and the sounds don't match well enough to be convincing. I haven't checked all of the relevant LWED etymologies, but it looks as if there's a reasonable Proto-Indo-European etymology for the English words give, man, mother, fire, flow, and worm, in their current meanings.

The IE database has a sizable number of eyebrow-raising etymologies; like the database for "Altaic", it does not inspire confidence, although there is of course no question about the relatedness of the IE languages. There are many variables in the reconstructions, and many of the forms themselves bear little resemblance to mainstream Indo-Europeanists' reconstructions. The semantic looseness is often extreme. For instance, the database glosses a reconstructed form *(a)den@gh- (where @ = schwa) as `to reach, to seize, to have time'. Among the proposed descendants of this form are a Tocharian B form meaning `rise, raise oneself up', an "Old Indian" (Sanskrit?!) form meaning `reach, strike', an "Old Greek" (Ancient Greek?!) form meaning `with the teeth, biting together', and an Old Irish form meaning `repress, oppress, suppress, crush, put down'. This is typical of the semantic latitude. Formally, too, there are problems. The proposed "Old Indian" descendant of this proto-word is given as daghnoti, possibly on the assumption that the nasal of the reconstructed root metathesized with the gh; but the nasal of the Sanskrit form is a present tense suffix, not part of the root at all. So Sanskrit (by whatever name) doesn't match the database's proto-word phonetically.

If the reconstructions used by Pagel et al. for their statistical analyses are not reliable in either form or meaning, then the statistical results of comparing these reconstructions cannot provide any evidence for distant relationships among the seven groups they compare. If the selection procedure for choosing among several candidate proto-words to use for the statistical analysis is flawed, then there may be problems with the statistics as well. But even if there are no statistical flaws, the Pagel et al. paper is yet another sad example of major scientific publications accepting and publishing articles on historical linguistics without bothering to ask any competent historical linguists to review the papers in advance.

There is a larger moral here too. Early in their paper, Pagel et al. report, correctly, that after 5,000-9,000 years, `most words are thought to suffer from too much semantic and phonetic erosion to allow secure identification of true cognates', in particular (though they don't emphasize this point) because of the decay and loss of `the sound and meaning correspondences…which are thought to indicate that they derive from common ancestral words.' The authors intend their statistical method to provide evidence for relatedness of languages that are beyond the reach of the Comparative Method. Like other long-rangers with dreams of discovering bigger and bigger family groupings — maybe even the ur-human language, what the late Joseph Greenberg called Proto-Sapiens — Pagel et al. believe that abandoning the one method that is known (not just "thought") to be reliable can achieve the goal. But you still can't make a silk purse out of a sow's ear.

Update — also see Asya Pereltsvaig and Martin Lewis, "Do 'Ultraconserved Words' Reveal Linguistic Macro-Families?", GeoCurrents 5/10/2013.


  1. Howard Oakley said,

    May 8, 2013 @ 12:40 pm

    Thank you for that illuminating linguistic assessment, which confirms my feelings of very deep disquiet (as a non-linguist) about this paper.
    I am very troubled at Figure 3, in which the lighter symbols represent 'raw' (i.e. real) data points, and the darker symbols are "smoothed data based on a running mean with a window width of 10." I fail to see any relationship between the raw data and the regression lines fitted – indeed the raw data appear largely randomly scattered. This would appear to be a very strange way of dealing with an integral variable such as class size!
    I see this work as pushing a very long plank out into the past. Already that plank is bending and creaking to try to reach the first levels of reconstruction, with the ancestors of today's languages. To that is added another even longer plank to reach back to the likes of PIE. This paper adds another two lengths of planking, entirely unsupported by any corroborative evidence or calibration against data.
    But I suppose it is attractive to those awarding research funding…

  2. dw said,

    May 8, 2013 @ 12:58 pm

    My heart sank when I saw the headlines: even more so when these findings were reported (mostly) as unqualified truth, without a single independent expert being called on for comment.

  3. Allen Browne said,

    May 8, 2013 @ 1:08 pm

    The Washington Post has this rather odd interpretation of what Pagel et al. said:

    "'You, hear me! Give this fire to that old man. Pull the black worm off the bark and give it to the mother. And no spitting in the ashes!'

    It’s an odd little speech. But if you went back 15,000 years and spoke these words to hunter-gatherers in Asia in any one of hundreds of modern languages, there is a chance they would understand at least some of what you were saying."

    I doubt that monolingual speakers of even languages closely related to English could understand any portion of that sentence.

  4. Ambarish said,

    May 8, 2013 @ 1:23 pm

    I saw this first on the linguistics subreddit; happily, the reaction there too was mostly negative. A nit-pick, though: the nasal of the Sanskrit form is not a present tense suffix; 'ti' would be the present tense suffix of the 3rd person singular active. 'no' is the gunated form of 'nu', which marks verbs of the 5th class in the conjugational tenses.

  5. Sally Thomason said,

    May 8, 2013 @ 1:24 pm

    @Howard — Actually, reconstructing PIE is perfectly feasible — historical linguists have had immense success doing it, over a period of ca. 150 years now. That's one of the many things that makes historical linguistics so much fun: carefully reaching long planks back into the past to develop knowledge of prehistoric linguistic states.

    But it's vital to do it right, with proper constraints and competent application of the Comparative Method. That's not what the LWED IE database reflects. Reconstructing the other real families in Pagel et al.'s list — at least Dravidian, Eskimo[-Aleut], Kartvelian, and Uralic (I don't know the state of things with Chukchi-Kamchatkan) — should also be feasible, and considerable progress has been made with some of them, notably Dravidian and Uralic. By well-trained historical linguists, that is. It may also be possible to compare solidly reconstructed PIE, Proto-Uralic, Proto-Dravidian, and the rest and find enough systematic correspondences to support a proposal of genetic relatedness for some or all of those families, using the gold-standard Comparative Method. But a successful effort would have to rest on good, solid, reliable reconstructions. The LWED database reconstructions just aren't good enough to form the basis of a successful effort along those lines.

  6. Richard Sproat said,

    May 8, 2013 @ 1:26 pm


    No surprises there: the popular science press likes juicy stories, and this is surely a juicy one. Wouldn't want sober reason to get in the way of that.

    To be fair — and recognizing that Sally's points are entirely valid — this is still less fantastic than another paper that Atkinson was involved in (also discussed in this forum), that claimed to find a signal in decreasing phonological diversity as humans migrated from Africa starting 70,000 years ago.

    –Richard Sproat

    [(myl) Indeed — see "Phonemic diversity decays 'out of Africa'?", 4/16/2011; "Phonemic SFE disconfirmed", 2/2/2012.]

  7. unrelatedwaffle said,

    May 8, 2013 @ 1:47 pm

    I'm pretty sure Quentin Atkinson (totally not a real name, come on) is just a huge troll put on this Earth to make proper linguists have aneurysms. Thanks for adding yet another "actually, that study is extremely flawed and unreliable" to my "cocktail party obligations" list.

  8. Sally Thomason said,

    May 8, 2013 @ 1:57 pm

    True, Richard, but the idea of a phoneme bottleneck at least had the advantage of being (unintentionally) fairly hilarious: Oops, leaving Africa now, gotta lighten the load and leave some of those phonemes behind!

  9. Richard Sproat said,

    May 8, 2013 @ 2:06 pm

    Very good :)

  10. Howard Oakley said,

    May 8, 2013 @ 2:11 pm

    Sally, I do not doubt that reconstructing PIE is perfectly feasible, and as an interested amateur I have read fascinating accounts of such reconstructions. However there is a serious problem over the science and credibility of using such reconstructions on which to base further reconstructions, no matter how rigorous the method used.
    Science proceeds using a cycle of hypothesis and testing. A reconstructed language is a hypothesis, unless someone can come along and discover archaeological evidence of inscriptions, tablets, etc., which can be used to test that hypothesis.
    The moment that you build hypothesis on top of untested hypothesis, you enter the realms of interesting fiction – and science is full of untested and untestable hypotheses.

  11. Nick Z said,

    May 8, 2013 @ 2:15 pm

    Further to the points about the IE etymological dictionary: the database used by LWED is based (with only very minor updatings) on 'Walde-Pokorny's dictionary', which I take to be their Vergleichendes Woerterbuch der indogermanischen Sprachen, published between 1928 and 1932. Hence the unidiomatic translations of things like altindisch –> Old Indian and altgriechisch –> Old Greek. Remarkably enough, Indo-European studies have moved on since 1932, and this work is extremely out of date.

    Notably, for example, it does not include (or only in the most rudimentary way) the set of consonants known as laryngeals. Essentially this means that (almost?) every word which in the database begins with a vowel in fact begins with a consonant, e.g. the root reconstructed as *ag'- 'drive' would now be reconstructed as *h2eg'-. This alone ought to make quite a difference to the putative cognate groups proposed by the creators of LWED.

    I would stress that, given the lack of a complete and up-to-date etymological dictionary of PIE (although there are excellent dictionaries for the verb and some nouns from the 21st century), works like Walde-Pokorny are still used by Indo-Europeanists: but only as a data-dump, to be sifted through very carefully, and with recourse to the latest resources. If this dictionary is truly the basis for the Indo-European cognates used by Pagel et al., this is most definitely a case of garbage in…

  12. Nick Z said,

    May 8, 2013 @ 2:20 pm

    By the way, thanks very much, Sally, for such a rapid and articulate post on the many problems with this paper.

  13. Martin Kümmel said,

    May 8, 2013 @ 2:58 pm

    Thanks for this excellent comment to the paper.

    Indeed, a reconstruction like *(a)denəgwh- is clearly from Pokorny's dictionary, and this one in particular cannot be upheld.
    For Sanskrit "daghnoti" you don't need a metathesis of *ngh, since "dagh-" might theoretically contain a hidden nasal (a < syllabic n). However, this very Sanskrit root clearly does not contain a nasal and certainly has nothing to do with Tocharian "tsank-" 'to rise'.

    PS. What do I have to do to produce italics here?

    [(myl) Place the text to be italicized in between
    <i> and </i> — standard html markup.]

  14. Victor Mair said,

    May 8, 2013 @ 3:03 pm

    Thank you, Sally, for writing on this. I can't tell you how many people have sent me the link to the Washington Post article. Now I have something to send back to them.

  15. Jeff Carney said,

    May 8, 2013 @ 3:27 pm

    <i>italics made easy</i>

  16. Belial said,

    May 8, 2013 @ 3:33 pm

    Try telling that to a caveman!

  17. Matt Juge said,

    May 8, 2013 @ 3:39 pm

    Thanks, Sally. I've been looking forward to a post here on LL about this "news". Now if only we could figure out how to get people to pay attention to the real news and have enough knowledge to make them skeptical.

  18. Greg Morrow said,

    May 8, 2013 @ 5:12 pm

    Is there a robust compiled up-to-date modern PIE dictionary, either online or in book form?

  19. maidhc said,

    May 8, 2013 @ 5:24 pm

    "You, hear me! Give this fire to that old man. Pull the black worm off the bark and give it to the mother. And no spitting in the ashes!"

    Google Translate results:
    "Tú me escuchas! Tome este fuego a ese viejo. Tire del gusano negro de la corteza y darle a la madre. Y no escupir en las cenizas"
    "Sie, hören Sie mir! Geben Sie dieses Feuer in diesem alten Mannes. Ziehen Sie den schwarzen Wurm aus der Rinde und geben Sie es an die Mutter. Und kein Spucken in der Asche"
    "Ty mnie słyszysz! Daj ten ogień do tego starego człowieka. Wyciągnij czarny robak off kory i dać go do matki. I nie pluj w popiele"

    These are supposed to be mutually comprehensible? Those people 15000 years ago must have been much smarter than people are today.

  20. Stefan Georg said,

    May 8, 2013 @ 5:37 pm

    Thanks to Sally Thomason for this brilliant assessment, every single word of which I subscribe to. It's saddening in the extreme that such stuff keeps attracting research funding. I learned about the Pagel et al. paper yesterday through e-mail, and this morning I found (of course incompetent, of course enthusiastic) coverage on my preferred news portal. But the day ends well, with the reading of this post, thanks again.

  21. Belial said,

    May 8, 2013 @ 5:43 pm

    A diet of worms is good for the intellect, apparently.

  22. Tom D said,

    May 8, 2013 @ 6:23 pm

    So I went and looked at how they actually did tree-building. When I first heard about the paper, I had assumed it was like the previous papers using computational phylogenetic methods, where there was an argument to be made that they were at least able to distinguish innovations and retentions. As I understand it, this method really isn't.

    It basically works as follows: calculate the distance between language A and B based on the number of cognates, given the rate of lexical replacement. Do this for all other pairs of languages. Use these distances to score the likelihood of a randomly generated tree. If this sounds sort of familiar, it should. The likelihood calculation (though—and I need to be very clear on this point—not the entire tree-building process) is, as I understand it, a fancied-up version of lexicostatistics.

    Boy was I surprised when I noticed that…

  23. MJ said,

    May 8, 2013 @ 7:26 pm

    It gets worse: in London's "Metro" on Wednesday (http://metro.co.uk/2013/05/08/mother-that-bark-on-the-fire-is-spitting-up-ashes-what-cavemen-really-talked-about-3715676/), we read:

    All this glacial activity begs one big question: why didn’t ‘ice’ show up as one of the words being regularly spoken by humans 15,000 years ago? As in, ‘Hello mother, what’s the story will all this flipping ice?’

    Strangely, a variation of the similar-sounding ‘lice’ almost made the cut of the top 23 – the singular ‘louse’ just fell short.

    Quite apart from missing the most fundamental points about the paper in question – and about linguistics (the faulty definition of "cognate" looks to be lifted from the earlier Washington Post article), the author has with the "louse" idea shown himself clueless beyond help.

    Actually, after a nice sit down, this train wreck of an article makes for quite entertaining reading.

  24. Optimism: The Forgotten Sensation | Sunny Schomaker said,

    May 8, 2013 @ 7:27 pm

    […] in particular; it was just the first in the Google search) failed the smell test, and as I hoped, Language Log has posted a nice critique. […]

  25. John Colarusso said,

    May 8, 2013 @ 7:57 pm

    As to stability, English has lost , supposedly a steadfast word, in the last few centuries. If English were not a written language, it would have fallen afoul of the techniques in Pagel et alia.

  26. Mark S said,

    May 8, 2013 @ 8:16 pm

    I was waiting for this shoe to drop.

  27. William M. Johnston said,

    May 8, 2013 @ 9:05 pm

    Hello Sally,

    I've been watching these developments as an interested outsider. I linked to this page on my tiny FB community for the benefit of those who might want to pursue the topic further.

    My treatment of this research is to cast a rather jaundiced eye on the purported claims. I hope that I've given the topic suitable explanation. I illustrated the topic with Kang and Kodos from the cartoon programme The Simpsons, in which Kang states that he is not speaking English, but rather Rigellian. False cognates ad absurdum.


  28. Cy said,

    May 8, 2013 @ 9:47 pm

    These always sadden me greatly. I remember when I was a kid I'd read through the French and English dictionaries I had, looking at how some words were alike, but then how the etymologies were still different, and how it made me so interested in linguistic topics. I didn't even have to reach my teen years to learn the lessons these adults haven't yet absorbed – I can only imagine the many, many hours they spent tweaking both the semantics and the phonology of all these speculative, uncertain transcriptions from proto-proto languages. Unless they actually know what they're doing, and are using this solely for publicity, which is possibly sadder?

    Great write-up, so quickly. Such a nice point-by-point (and subfield-by-subfield!) dismantling, thank you.

  29. Erika Alpert said,

    May 8, 2013 @ 11:08 pm

    Thank you Sally. What I have always wondered is why there is such passionate devotion to the project of super-family reconstruction. I'm not entirely certain what we are supposed to learn from it, how it actually advances our understanding of language OR humanity in any significant way.

    Honestly the desire seems primarily religious to me. Babel Project indeed.

  30. Jonathan Gress-Wright said,

    May 8, 2013 @ 11:37 pm

    Thank you, too. In terms of the popularity of long-range reconstruction among non-linguists, I suspect the following:

    In most scientific fields, researchers continue to expand the frontiers of knowledge in significant ways: discovering new species, new subatomic particles, new insights into human behavior. The reason appears to be that the data is still there for us to find and analyze.

    In traditional historical linguistics, however, there is not that much new data to find. Occasionally, some really significant discoveries turn up, e.g. the relationship between Ket and Na-Dene, but for the most part we have exhausted everything solid that the comparative method can yield.

    The honest historical linguist then has to admit that the field, in its traditional conception, is largely over (apart from tying up loose ends in the well-established families), and we see the frontiers of historical linguistics are now mostly found in variationist approaches such as sociophonetics or historical syntax. This is where genuinely important new findings continue to be made.

    This acknowledgment of the limitations of our knowledge doesn't really fit in the popular image of science as a field of limitless discovery. It doesn't seem right that we'll just never know if and how Indo-European is related to Uralic or Eskimo-Aleut. Hence the enthusiasm for this kind of garbage.

  31. Piotr Gąsiorowski said,

    May 9, 2013 @ 2:34 am

    @Jonathan: I agree we may never know if long-range schemes such as Indo-Uralic, Indo-Uralo-Eskimo (let alone Nostratic, Eurasiatic, Amerind, Dene-Caucasian, etc.) are valid. I think there's no harm in trying to extract some genetic signal just in case we might overlook something real. But Pagel et al. simply assume that the Eurasiatic hypothesis is correct and start from there, trying to build a well-supported family tree. To make matters worse, they use highly controversial (and tendentious) etymologies from the Tower of Babel database uncritically, and even overinterpret them (I don't believe those etymologies justify the claim that you get a 7/7 set of matches for "thou" across Eurasia). Even their map is wrong: unlike their source of data, they don't seem to classify Japanese and Korean as Altaic. Anyway, whatever one can say in a comment box on a blog, it's only the tip of a bloody big iceberg of problems.

  32. Piotr Gąsiorowski said,

    May 9, 2013 @ 2:39 am

    I forgot to add that though the article is bad enough, its flaws are nothing compared to the coverage it got from some "science reporters". Though it's a modest-scale hype, it reminds me a little of the recent ENCODE affair.

  33. Jon said,

    May 9, 2013 @ 2:59 am

    @Jonathan Gress-Wright: Don't give up too soon. Most of the criticism of this work is based on the quality of the input data – out of date, incomplete, often wrong. If reliable dictionaries of PIE and proto-Uralic were available, it might be possible to demonstrate a clear relationship.
    Even physics has suffered from the 'nothing left to discover' illusion before now. The history of science is littered with statements of the type "We will never know how…", later proved wrong.

  34. Meesher said,

    May 9, 2013 @ 3:48 am

    I was suitably dubious when I saw the write-up at Wired, and was waiting for the LL response, though it seems that wasn't as bad as the Metro or WaPo versions.

  35. Victoria Simmons said,

    May 9, 2013 @ 4:08 am

    I spent all day yesterday fretting about this–as friends on Facebook who should know better kept re-posting the Washington Post article–and wondering who here would be the best to e-mail about it. I should have known you had it well in hand, and that my amateur linguist's instant suspicion of the methodology was well-founded. I took a class in grad school from a famous supporter of the Nostratic theory, and in giving the PIE root for various things we discussed he always mentioned the Nostratic root as well. That was hard enough for me to swallow, let alone this stuff. "Garbage in, garbage out" seems exactly the right sum-up.

    To be fair, the notion that the little speech in the Post would have been understood 15,000 years ago is not that of Pagel et al. Kudos to David Brown, the author of the article, whose innovation it apparently was, for it's that that is responsible for all the attention the article has received, as sure as eggs is eggs. At least the science pages on Facebook yesterday provided some entertainment, as people offered their suggestions for the words they thought were most likely to be in consistent daily use for 15,000 years.

  36. NW said,

    May 9, 2013 @ 4:25 am

    Their new method sounded interesting when I read the article, as opposed to the meaningless fluff of the press coverage. The statistical separation of less noisy and more noisy words is just the sort of new tool that could push back reconstruction a little further. I really hope we haven't exhausted all the evidence. Unfortunately, when I read it I didn't know what LWED was – I didn't associate it with Starostin. Oh dear. Now I have the greatest admiration for the time, work, and sincerity Starostin put into that impressive project: I just wish the etymologies didn't look like the sort of thing I did as a child.

  37. Rose said,

    May 9, 2013 @ 4:44 am

    @Jonathan, if historical linguistics is as good as over because there's no more data left to find and analyze, then does that mean we know all there is to know about every language in the world? If so, I'm definitely out of a job!

  38. david said,

    May 9, 2013 @ 6:34 am

    PNAS says Colin Renfrew edited* this article.
    *This Direct Submission article had a prearranged editor.

    I think this is code to remind people that members of the NAS can publish whatever they like in the proceedings.

    I was surprised at the figure with the phylogenetic tree superimposed on the map. It doesn't really fit. The stretched branch is mentioned, but the foreshortened one just slips through.

  39. Richard Sproat said,

    May 9, 2013 @ 7:34 am

    I'm a little surprised that the New York Times has not (yet) picked up this story.

    But perhaps they are already busy preparing for the full Magazine spread on Atkinson, something about the brash young psychologist who is challenging the huge monolithic field of historical linguistics. It's about time for such a spread: he's published multiple papers in high-profile venues, most or all of which have been roundly debunked by those of us in the establishment. (And when I say "we" here, I don't mean it necessarily in the same way as post-glacial-maximum speakers of proto-EuroAsiatic would have meant it: after all, I'm no historical linguist.)

  40. Alex Ratte said,

    May 9, 2013 @ 8:02 am

    Thank you for posting this!

    I think you're absolutely on the money about Altaic being one of the major fumbling points for this research. As a grad student studying the history of Japanese/Korean, it never ceases to amaze me just how much the existence of "Altaic" is taken for granted. I may not agree with the theory, but even the respectable Altaicists will at least acknowledge that even reconstructed proto-Altaic is exceedingly tentative, and certainly no basis for further megalo-comparison. It's just inexcusable.

    One of the problems that our fields are facing is that there are fewer and fewer historical linguists out there to "do it right," as it were. Sure, a terrible NYT article will get everyone's blood boiling and will get some attention, but the fact that so much less historical work is now being done opens the door to those who abuse the methodology. Just my two cents.

  41. Chandan Narayan said,

    May 9, 2013 @ 8:39 am

    Thanks for this, Sally!

  42. Richard Compton said,

    May 9, 2013 @ 8:47 am

    Even if we were to set aside that Proto-Eskimo-Aleut is obviously a better point of comparison than Proto-Eskimoan (PE), I'm confused as to how it could be included in the cognate set for "thou", given that in the Comparative Eskimo dictionary (CED) (and even in the LWED) the root for this pronoun is *əl- or *əɬ-.

    Also, the supplementary text refers to *anəR- stating that it means both "spark" and "fire", but in the CED the PE form listed as *anəR- actually means "breathe (out)". While there's a Proto-Yupik-Sirenik form, *anəq-, whose meaning includes "spark", it's listed as meaning "spark or ember", and not "fire" as stated in the SI text of the paper. The form for "fire" is actually *ək(ə)nəR, in which -nəR is likely a nominalizer according to the CED. Unless these inconsistencies reflect changes in the new edition of the CED (which I doubt, because the page number in the LWED matches the entry for *anəR-), it would appear that they were simply entered incorrectly into the LWED and Pagel et al. didn't check the original sources.

  43. Piotr Gąsiorowski said,

    May 9, 2013 @ 9:57 am

    @Richard Compton: They base their judgement on that of the editors of the LWED, who are very heavy-handed comparatists. I suspect the final segment in *əɬ-vǝnt/*-vər is taken to reflect the T-type pronoun, presumably on the grounds that any coronal obstruent anywhere in the word will do. There are problems with other families too. The forms cited for Chukotko-Kamchatkan and Kartvelian are in fact plural (the singular doesn't happen to match); the Dravidian 2sg. pronoun has an initial *n-, so (hey presto!) a rather suspect verb ending replaces it in the equation. With such sleight-of-hand 'most everything can be made to match 'most everything else. It's interesting that Pagel & al. declare in the article that they "sought to adhere to the precise meaning" (Supporting Information, p. 1). Hmm… the function of the 2sg. pronoun seems pretty precise, and still we have this mess plus the false claim that all seven families display "cognate" forms.

  44. Claire Bowern said,

    May 9, 2013 @ 10:13 am

    While I agree that there isn't much to be positive about in this paper, two things strike me. One is that the personal vitriol against Quentin Atkinson is totally uncalled for. If you don't like the ideas, fine, criticize the ideas. (Disclosure: He and I coauthored a paper on Pama-Nyungan last year.)

    Secondly, it would be nice to see some commentary/discussion on the other part of the paper, on cognate class sizes, number of reconstructions per meaning, and frequency. That to me is a much more interesting result than the Eurasian stuff that everyone is focusing on. It is testable with decent datasets for several diverse language families and would potentially tell us some interesting things about semantic change and reconstruction.

  45. J.W. Brewer said,

    May 9, 2013 @ 12:15 pm

    To Professor Bowern's point, I expect (I mean this as a descriptive point, not necessarily a normative one) that one reason for the level of hostile rhetoric toward Atkinson is the perception that he is a serial offender. This is at least the third time in the last few years he has co-authored a paper that: a) made a bold claim; b) was published in a prestigious-sounding "scientific" (but not linguistics-specific) journal that does not seem to have done particularly rigorous, if any, peer review; c) got widespread uncritical rediffusion in non-specialist mass media; and d) was more or less immediately treated as very seriously flawed by a broad range of people who had relevant knowledge in the relevant subfield of linguistics. Even if one wants to give people the benefit of the doubt and engage with ideas rather than personalities, people will have different intuitions as to how long a particular individual is entitled to keep getting the benefit of the doubt over and over again. Now, by contrast I have seen neither vituperative criticism of, nor clueless mainstream media praise of, his article co-authored with Prof. Bowern, which appears to have gone through peer review at a journal that may actually know how to do peer review of linguistics scholarship. So perhaps the lesson is that Atkinson's mathematical methodologies can actually be useful if applied to the right topics in work done with the right co-authors subject to the right sort of peer review before publication?

  46. Richard Sproat said,

    May 9, 2013 @ 12:35 pm

    @J. W.Brewer

    Actually as some of you may know (since it came up in the discussion of that paper), I was one of the reviewers for Atkinson's out-of-Africa serial-founder-hypothesis paper for Science. Suffice it to say that they went ahead and published it anyway.

  47. Piotr Gąsiorowski said,

    May 9, 2013 @ 12:56 pm

    Hint: if you want to publish a paper on anything linguistic (even if the whole project is interdisciplinary), it doesn't hurt to have a bona fide linguist aboard. I suppose Dr Andreea S. Calude played the role in this case, but judging from her home page and Academia.edu profile she has no experience whatsoever in historical linguistics. Had the paper been peer-reviewed rather than just approved and "edited" (?) by Sir Colin, I'm sure MAJOR revisions would have been recommended.

  48. Sean Fulop said,

    May 9, 2013 @ 4:13 pm

    A lot of linguists hate this stuff, but I don't see that anyone is proving it is wrong. It is statistically quite interesting. Perhaps it is based on some flawed things, but that probably doesn't have much impact on the overall. That's what statistics are for. Is it lost on people that Greenberg's Amerind hypothesis has since been proven by genetics?

    [(myl) Um, making sense of small amounts of noise is not "what statistics are for". And could you provide a citation for the claim that "Greenberg's Amerind hypothesis has since been proven by genetics"?]

  49. Sean Fulop said,

    May 9, 2013 @ 5:49 pm

    Well, you know linguists, the Newtonians of the quantum age! You guys are saying that the entire database of material for the Pagel et al. study consists of bad reconstructions? That must be the most amazingly poor database of linguistic data anybody ever bothered with.

    Here is the genetic study vindicating Greenberg:
    Reich, D. et al. (2012) "Reconstructing Native American population history," Nature 488:370-374.

    But you never know, maybe all the genes were flawed.

    [(myl) No, but your logic is. No one has ever doubted that Eskimo-Aleut and Na-Dene are coherent linguistic families. That was not Greenberg's discovery, and his agreement with that traditional view was completely uncontroversial. The controversial part was whether the large number of apparently very diverse languages outside those two groups should be considered to be one "Amerind" linguistic family.

    Now, in the first place, genetic history is not necessarily the same as linguistic history. And in the second place, the study that you cite shows only that the 52 Native American linguistic groups they studied represented at least three distinct Asian gene flows, of which Eskimo-Aleut and Na-Dene are two. And as I said, it has never been controversial that those two are separate groups. They found plenty of genetic substructure in the remainder; how that substructure relates to linguistic history, on either side of the Bering Strait, remains unclear.]

  50. Jonathan said,

    May 9, 2013 @ 5:55 pm

    @Sean Fulop

    It doesn't really matter whether genetic studies demonstrate that the speakers of the putative Amerind subgrouping are all descended from the same migration. Language families are judged on the basis of linguistic evidence, not genetic evidence. After all, one would never do the reverse and suggest that monolingual English speaking Native Americans are genetically European.

  51. Jonathan said,

    May 9, 2013 @ 5:58 pm

    Edit regarding my previous comment:

    macrofamily, not subgrouping. Apologies.

  52. Sean Fulop said,

    May 9, 2013 @ 5:59 pm

    I knew it would only take a few minutes for a linguist to come up with that bon mot.

  53. Piotr Gąsiorowski said,

    May 9, 2013 @ 6:23 pm

    Sean, a phylogenetic algorithm does not generate results out of nothing. It needs some sort of reliable empirical input — a properly prepared data matrix — to infer a phylogeny from. Feed it garbage, and it will translate it into garbage. Any reconstruction is a hypothesis, not a fact. You should be especially careful when you use hypothetical entities as if they were real-world empirical objects. Bad (flawed, fanciful or heavily biassed) hypotheses are not just low-quality data — they are pure noise. You may use a valid method to draw conclusions from noise. The reasoning will be valid in such cases, but the conclusions will not.

  54. Sean Fulop said,

    May 9, 2013 @ 6:46 pm

    OK naturally I agree with the above. I think it is pretty incredible if their published paper is based entirely on wrong data, and I would then of course retract my support for their work. I have not really seen a convincing account yet that proves all the input was bad–surely one of the experts can publish such a thing as a letter to PNAS?

  55. Piotr Gąsiorowski said,

    May 9, 2013 @ 6:52 pm

    It would certainly be a good move. I think a leading scientific journal like PNAS should consistently follow a stricter publishing policy. It seems that the paper was accepted at the discretion of a single NAS member (widely known to be sympathetic to this kind of research, but himself not a professional linguist) without a proper reviewing procedure.

  56. pacatrue said,

    May 9, 2013 @ 8:08 pm

    First, I should say that my experience with the article was thus: non-linguist friends post the Washington Post article on Facebook. I take a look at it and think, "this is going to be flawed". My response to Atkinson's Out-Of-Africa article was also to list a bunch of reasons I was doubtful to a colleague. I also have no personal connection to any of the researchers.

    All of that now said, I, like Dr. Bowern, have been taken aback by much of the vitriol in these comments (not in the actual blog post). Dr. Thomason lists a good number of seemingly valid criticisms that should be taken on board and I was happy to read them and felt better informed. In the comments, however, people seem to have gone to the extreme of declaring the whole thing junk and anything a particular researcher touches as junk. However, I am of the opinion that Atkinson and colleagues are warranted in pushing the available methods in historical reconstruction. The major criticism that I perceive in Dr. Thomason's response was that they are using bad data. If that's the main issue, then knowledge might be pushed forward with better data and improved versions of the statistical method.

    The paper also would have been greatly improved by further peer review from historical linguists so that issues could have been addressed. I completely agree. At the same time, fields are often quite profitably modified by people outside of the field. When linguists make comments that come off as, "it's no good if it wasn't a linguist using proven linguistic techniques," then it doesn't help. Probably many of us have had experience with or are knowledgeable of useful research that was ignored or pushed aside because it doesn't match current theory. (The extremely low opinion of others' research that often comes up in the Nativism Wars should make us wary.) Of course, I am not claiming that we should just publish junk because it might be helpful one day. I am simply saying that Atkinson and colleagues are using techniques that might be useful on a good dataset. We should not throw out the baby with the bathwater. Perhaps I should finish by noting that I am also a linguist.

  57. un malpaso said,

    May 9, 2013 @ 8:34 pm

    When I saw the headline of the original story (well, its report in Wired), I immediately felt suspicious… but I was blown away when I saw that ONLY 23 words were supposedly used as data. 23 words through 15,000 years…

    I don't see how any researcher anywhere could consider that as a viable sample size. This isn't just noise, it's the NOISE of noise. It seems analogous to doing an astronomical survey of the Universe by skywatching from the bottom of a 15000-foot well (while wearing dark glasses).

    Language Log was my very next stop. You are doing a fantastic service spreading the true nature of linguistics study today, and quickly debunking these kinds of sweeping claims that seem to appear every week in the popular press. As a layman with a life-long interest in linguistics that borders on the pathological ;) I am deeply grateful for your existence.

  58. Sean Fulop said,

    May 9, 2013 @ 9:35 pm

    In reply to moderator's comment above, I thought it was also well established that genetic families are highly correlated with linguistic families. So that if the Amerind speakers are all from one gene stream, this implies that they are all from one linguistic stream also.

  59. Barbara Partee said,

    May 9, 2013 @ 10:37 pm

    As an NAS member, I've served that "editor" role for PNAS a few times. As editor, you have to choose some reviewers: you list several in order, and they write to them for you, and take the two you've ranked highest who agree to do it. Then you study their reviews and the paper and make a decision (the usual range: accept, accept with minor revisions, revise and resubmit, reject), and if it goes to revise and resubmit you do it all again. So it is peer-reviewed, but there's a lot of power in the hands of the "editor", and no accountability as far as I can see. Even before seeing the comments above about the need to give feedback to PNAS, I had been thinking the same thing. I've written to Sally to ask her to help me to work out a suitable message, referring both to her post and to the one that Asya Pereltsvaig and Martin Lewis wrote over at GeoCurrents: http://geocurrents.info/cultural-geography/linguistic-geography/do-ultraconserved-words-reveal-linguistic-macro-families . It really is embarrassing that the NAS appears to have given its imprimatur to this work.

  60. marie-lucie said,

    May 9, 2013 @ 11:11 pm

    @un malpaso: I was blown away when I saw that ONLY 23 words were supposedly used as data. 23 words through 15,000 years.. I don't see how any researcher anywhere could consider that as a viable sample size.

    I agree entirely. The more distant a language relationship, the more, not the fewer, examples you need.

  61. Andrew McKenzie said,

    May 9, 2013 @ 11:23 pm

    I think this paper is an excellent case of the distinction between logically valid argument (which it is) and a logically sound one (which it isn't).

    Assuming the statistical modeling is correct (I can't judge that), it tells us that IF these are the right proto-forms, THEN the languages are related. But as many have pointed out, we already know these aren't the right proto-forms. The fact that you use a sophisticated model to analyze them won't save the analysis.

    It's a bit as if you wanted to analyze Derek Jeter's performance (at baseball) using advanced sabermetrics, but used Alex Rodriguez's stats to do so. And when someone interjects "hey wait a minute! shouldn't you—", you say "Why do you hate sabermetrics? You're just a dinosaur."

    I pointed out to someone earlier about this on facebook with another analogy— you can use statistics to wring a diamond out of a lump of coal, but not out of a lump of sulfur. This is a case of the latter.

    Even though this research isn't sound, that's not the part that infuriates linguists. The mass comparison method in general suffers from the problem of data selection— there's no criterion for determining when semantic relationships are close enough, there's no criterion for choosing among several proposed proto-forms for a given lexeme, there's no effort to avoid onomatopoeia, multimorphemic words are wrongly analyzed as monomorphemic and vice versa, parts of speech are ignored, and sometimes the wrong morpheme in a multimorphemic word is chosen.

    The thing is, practically all of these selection problems could be eliminated simply by consulting people with expertise in the language families in question. The problem is, of course, that once you eliminate the bad selections, you're not left with much. Thus, this kind of science is either grossly incompetent or deliberately misleading, and that's why its publication gets linguists up in arms. Couple that with the fact that with proper data, this method might actually lead to amazing new discoveries, instead of wasting everybody's time fighting an old fight, and it's easy to see why linguists are tired of it.

    It isn't about who's doing it.

  62. Nick Zair said,

    May 10, 2013 @ 3:29 am

    @ Claire Bowern. An attempt to address your desire for commentary regarding cognate class sizes. As far as I understand, the authors of this paper took the judgements of the LWED scholars about cognacy of 200 word-meanings and for each word-meaning they saw how many cognates in the 7 language families these scholars had identified (from a maximum of 7, i.e. 1 in each language family down to 1, i.e. only found in 1 language family). They then compared this pattern with their predictions of which word-meanings are least likely to have their phonological shape replaced by another phonological shape by non-regular sound change ('replacement').

    What they found was that words which they propose are least likely to be replaced over time are more likely to have been judged as having multiple cognates by the scholars of LWED. Now, many historical linguists would argue that the decisions made about cognacy by LWED are not based on firm methodological bases. So the paper's claim that its methods are backed up by the fact that it agrees with the cognacy judgements is neither here nor there for these historical linguists, because the cognacy judgements are not seen as useful independent evidence. In fact, what the paper does seem to prove is that the LWED scholars tend to assign high cognacy judgements to high frequency words. Now, this is an interesting discovery, for which all sorts of explanations might be proposed (including that the LWED people are right). But it does not provide support for the paper in the way that the authors seem to be claiming.

    The authors do address this point on p. 5, but not very satisfactorily. They say that the fact that there are some high-frequency words with low cognate numbers and vice versa means that any bias on the part of the LWED scholars 'does not mechanically overrule other signals in the data pointing either toward cognacy or the lack of it'. But if, as many linguists would claim, there are no (usable) signals in the data (because the comparative method, the only known method that works, does not work at such time depths), the cognacy judgements of the LWED scholars still cannot be used in the way the authors of this paper want them to be used.

    I hope I've understood correctly what Pagel et al. are saying, and not misrepresented their claims, and also that I've explained myself here clearly.

  63. Piotr Gąsiorowski said,

    May 10, 2013 @ 6:33 am

    Nick, what shall we think of cognacy judgements such as: 4/7 cognates for 'mother' because words of the general form /VmV/ (any vowels will do) can be found in four families? On 'thou', see above (the only potentially promising match is between IE and U; the remaining five should really be disqualified). 'Give' yields five matches, but it's clear from a look at the LWED entry that anything with an initial dental will qualify. My personal hypothesis is that (1) high-frequency items tend to be short (they are often CV pronouns, and even the content words usually fit a simple CVC template with unmarked consonants), so the probability of a chance match is higher; (2) the LWED scholars are more likely to stretch the evidence for the "core vocabulary" (as is obvious in the case of the 2sg. pronoun).

    I am no Luddite. I am sure algorithms borrowed from biologists can be very useful in linguistics, but not when applied to figments of questionable methodology rather than real and well-prepared data.
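    A toy calculation can make point (1) above concrete: the fewer consonants a form has, and the looser the matching criterion, the more often unrelated words will "match" by chance. The sketch below is purely illustrative and rests on hypothetical assumptions that are not from the paper or the LWED: consonants fall into six equally likely broad classes (labials, dentals, velars, and so on), vowels are ignored when judging a match, and the six other families are treated as independent draws. Real phoneme inventories are far from uniform, so only the relative sizes of the numbers matter.

```python
from math import comb


def p_single_match(consonant_slots: int, classes: int) -> float:
    """Chance that a random word matches a target form when only the broad
    class of each consonant must agree and vowels are ignored.
    Hypothetical model: every class is equally likely in every slot."""
    return (1.0 / classes) ** consonant_slots


def p_at_least(k: int, n: int, p: float) -> float:
    """Binomial probability of at least k matches among n independent families."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Assumptions for illustration only: 6 broad consonant classes,
# and one family's form compared against the other 6 families.
CLASSES = 6
OTHER_FAMILIES = 6

for label, slots in [("CV pronoun", 1), ("CVC word", 2), ("CVCVC word", 3)]:
    p = p_single_match(slots, CLASSES)
    chance = p_at_least(3, OTHER_FAMILIES, p)
    print(f"{label}: p(one match) = {p:.4f}, "
          f"p(at least 3 of {OTHER_FAMILIES} match) = {chance:.6f}")
```

    Under these assumptions a CV form finds "cognates" in at least three of the six other families about 6% of the time (2906/46656) purely by chance, while a CVCVC form almost never does; with many meanings on the list, a handful of such chance hits is to be expected.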

  64. Marek said,

    May 10, 2013 @ 6:47 am

    When I was taking my first steps with Bayesian classification in Python's Natural Language Toolkit (NLTK), it took me a couple of days to learn the valuable lesson that while you can get interesting results by analysing just about any large amount of data, it takes a solid model and relevant data to get results which are actually reliable.

    Unfortunately, I didn't have access to large-scale lexical data from contemporary and reconstructed languages back when I was still in my childish "let's analyse everything!" stage, so a list of male and female English names that came with NLTK resources had to do. Nobody was interested in publishing my paper entitled "Two V's in a name point to a female nature of its bearer".

  65. KevinM said,

    May 10, 2013 @ 9:50 am

    @Marek "Two V's in a name point to a female nature of its bearer"
    And three point to extreme attractiveness: Va-va-voom!

  66. Howard Oakley said,

    May 10, 2013 @ 12:53 pm

    Sean: "I thought it was also well established that genetic families are highly correlated with linguistic families."
    I don't think it is wise to assert anything has been established about haplogroup distributions, other than they look interesting! We are currently seeing the start of the second wave of genetic studies, in which the whole mitochondrial DNA is being sequenced, assumed mutation rates are being challenged and refined, and there are some significant yields from ancient DNA. Already in Europe these are starting to challenge what had become almost dogmatic in some circles, and the very attractive idea of IE languages sweeping across Europe with the advent of agriculture is starting to look plain wrong.
    As others have said very clearly, proving that someone's ancestors had a particular haplotype does not tell you what language they spoke, or what shape pots they made.

  67. Sean Fulop said,

    May 10, 2013 @ 4:05 pm

    I'm not an expert in the relations between human genetics and language families; I had thought (mistakenly?) that Cavalli-Sforza et al. had shown that they were tightly correlated (see his article in Scientific American from 1991, for instance). And the authors of the paper I cited above on Native American genetics do cite Greenberg and do state that their results are consistent with his claims. I'm not the one making this connection.

  68. Piotr Gąsiorowski said,

    May 10, 2013 @ 4:28 pm

    I had thought (mistakenly?) that Cavalli-Sforza et al. had shown that they were tightly correlated (see his article in Scientific American from 1991, for instance).

    He showed no such thing. The coherence between the trees he compares is partly an exercise in self-deception (see this messy affair), partly artifactual and trivial (when he refers to "superfamilies" which are really geographical groupings, not taxonomic units recognised by mainstream linguists). The language "trees" in question are not real trees, anyway.

  69. Cy said,

    May 10, 2013 @ 4:29 pm

    @Fulop You keep saying the same thing and when people tell you what's wrong with your reasoning you are sarcastic. This makes it seem like perhaps you're not a troll, you just are missing the point. It is very frustrating. Read through the post again, carefully.

  70. Bob Ladd said,

    May 10, 2013 @ 4:45 pm

    I happened to read about this story the other day not in the Washington Post or Metro, but in La Repubblica. Their take on it was in some ways even weirder: they suggested that the study had reconstructed the original language, and had dated it to 15000 years ago.

    Their report begins: "While the earth was re-emerging from the last ice age, and before the building of the Tower of Babel, there was a period in which people spoke a single language." It talks about the "mother language common to all the peoples of Europe and Asia" and says that the researchers are "convinced that traces of it remain today".

    It also says "The researcher's technique has been criticised in the past, and perhaps it will be again today". Perhaps the journalist knew that Sally Thomason was already hard at work preparing her critique.

  71. Jerry Friedman said,

    May 10, 2013 @ 4:47 pm

    Mentioning Greenberg and that Cavalli-Sforza article would be a clever way of trolling Language Log commenters, I suspect, but I didn't see anything sarcastic in Sean's latest comment, where he cited the reasons for his ideas instead of saying the same thing.

  72. Sean Fulop said,

    May 10, 2013 @ 5:33 pm

    I think I missed the memo which lists the scholarly publications and authors we are not allowed to cite around here, not realizing their work had already been completely discredited. Now Cavalli-Sforza too? I think I'm going to stick to my own areas of expertise from now on, it's hard to read up on stuff when half the stuff you might read turns out to be wrong.

  73. Piotr Gąsiorowski said,

    May 10, 2013 @ 6:00 pm

    Sean, why don't you use your own intelligence? Did you really have a look at the trees used by C-S for comparison? Do you believe they are congruent? Some hints: many "language families" are represented by a single stem with no branches. Such a non-branching graph will fit wherever you place it. Most of the families represented by more branches (flat, non-binary structures, by the way), don't match the genetic tree at all — at best they are drawn so as to seem to match it partly. And see what C-S did to Sino-Tibetan. So you READ C-S's articles, SAW his diagrams, and BOUGHT them?

  74. Sean Fulop said,

    May 10, 2013 @ 6:13 pm

    The truth is I didn't critically read them, I was willing to trust their conclusions because they were part of the published literature and this isn't really an area where I have concentrated my scholarly attention and energy. Like I said, I'm not sure who to cite outside my own scholarly domains, but even within those domains there are often disagreements over validity. I don't always take the time to personally consider everyone else's conclusions. I can't claim to know, or to have the time to decide, which publications are really just crap and which are valuable contributions. In a forum like this I can gather more information about what people think, as we are now doing.

  75. Howard Oakley said,

    May 11, 2013 @ 12:34 am

    Sean: "I was willing to trust their conclusions because they were part of the published literature"
    That is the whole point of this debate – simply because something appears in the published literature, it does not mean that you can trust its conclusions.
    A couple of recent DOIs for you to glance at, which might deepen your understanding (although neither paper is beyond criticism!):
    doi:10.1371/journal.pgen.1003460 whose very title should help blow apart some of those older papers "Continent-Wide Decoupling of Y-Chromosomal Genetic Variation from Language and Geography in Native South Americans"
    doi:10.1038/ncomms2656 (www.nature.com/naturecommunications), which rewrites the timeframe of waves of migration in Europe.

  76. A.F said,

    May 11, 2013 @ 1:33 am

    It's a pity that such interesting and important topics are laid waste by people who know little about the issues and methods, and are edited by people who do not even have a linguistic background.
    Supposedly serious journals now repeatedly and routinely publish crap in the name of historical linguistics. This is just unacceptable.

  77. Victor Mair said,

    May 11, 2013 @ 5:40 am

    I would like to take advantage of the current discussion to note that you can now find Manfred Mayrhofer's three-volume Etymologisches Wörterbuch des Altindoarischen online as a PDF.

  78. Anna Kase said,

    May 11, 2013 @ 7:01 am

    Having some queries about the Pagel et al. article myself, I was interested in your views – until the phrase…"I'm not qualified to judge Pagel et al.'s statistics, although I remain skeptical…" made me start to wonder. Your admission, while honest, undoes any balanced view you could bring to your blog. So for the rest of it, I had to take it with a grain of salt.

    Firstly, understanding the statistics behind this work is quite important – so before one can make any sort of judgement, whether positive or negative – there is a need to have some qualified assessment.

    However – and more importantly – is the fact you DO judge their statistics. For example, you state "If the reconstructions used by Pagel et al. for their statistical analyses are not reliable in either form or meaning, then the statistical results of comparing these reconstructions cannot provide any evidence for distant relationships among the seven groups they compare." That sounds very much like judging their statistics to me.

    Your inconsistency, I suspect, exposes less of a critique of the Pagel et al. article and more of an ideological position within linguistics. Ideology – which comes with that stubborn refusal to possibly believe in anything that is against one's views – is sad for any science.

  79. Nelson said,

    May 11, 2013 @ 8:37 am

    Anna, it seems quite clear that Sally, both in the bit you quoted and in general, is saying that it doesn't matter how good the model is if it's applied to unusable data. This really isn't a judgement on the statistics, nor an inconsistency (unless it's a critique of a method to say that it needs valid data).

  80. Jose Arcadio Buendía said,

    May 11, 2013 @ 9:52 am

    How is it Atkinson's fault what the non-specialist press does with his article? He's a "serial offender" because other people write about him? That just sounds like sour grapes. Anyway, if it were me, I wouldn't focus on criticism of the data. Use different data, then. If the algorithm works, they've still contributed to knowledge. If it doesn't, they should back off it.

    Non-experts contribute important things all the time. That's no basis to reject anything. It was a meteorologist who discovered chaos theory, after all.

  81. Jose Arcadio Buendía said,

    May 11, 2013 @ 10:16 am

    P.S. maybe he's a serial offender because he keeps getting published in Nature and Science…where people notice.

  82. Piotr Gąsiorowski said,

    May 11, 2013 @ 11:50 am

    Jose, have you read the PNAS article? What "different data" would you suggest? They used no data at all. They used COGNACY JUDGEMENTS based on very poor reconstructions. Reconstructions are not data. To use them in lieu of data is a category mistake — a methodological crime.

  83. Richard Sproat said,

    May 11, 2013 @ 3:44 pm


    You seem to be assuming that the kind of publicity this sort of work gets in the popular science press just happens. I don't think so. I don't know specifically in Atkinson's case, but I do know in other cases that have been discussed in this forum (e.g. Rao and colleagues' work on the Indus "script") that the publicity was carefully orchestrated. I suspect, in fact, that that is usually the way it works.

    As for publication in Nature and Science, as has already been discussed here, publications on language in those venues often seem to bypass the usual peer review process (note the emphasis on peer), or when linguists are involved, their recommendation may get ignored. So it's not clear that publication in those venues represents something good, though it surely does lead to publicity.

    Of course you're right that if the algorithm works, then it's a contribution. But how do we know it works except by trying it with real data and showing that we can achieve results that can be corroborated by independent evidence?

  84. marie-lucie said,

    May 11, 2013 @ 5:01 pm

    Anna Kase: Your inconsistency, I suspect, exposes less of a critique of the Pagel et al. article and more of an ideological position within linguistics. Ideology – which comes with that stubborn refusal to possibly believe in anything that is against one's views – is sad for any science.

    Ideological position? A group of non-linguists* tries to apply non-linguistic methods to solve linguistic problems, and if linguists find problems with the basic linguistic data and their use (which the non-linguists are unable to evaluate), the reason for their attitude is a matter of ideology?

    *(One of the names is that of a linguist, but one whose specialties are quite remote from comparative/historical linguistics – a subdiscipline that is unfortunately not part of every linguist's training nowadays).

  85. marie-lucie said,

    May 11, 2013 @ 5:18 pm

    The editor-cum-reviewer of the PNAS publication was Colin Renfrew, who is an archeologist, although he has offered opinions on linguistics. He once wrote that he did not believe in the importance of historical linguistics for the study of the past, because (I quote from memory) "a naive linguist" could infer that since a word similar to "coffee" occurred in all European languages, PIE ancestors must have been drinking coffee! In reply, a non-archeologist could justify not believing in the value of archeology by saying that if an object such as a flashlight was found in an Egyptian or Greek tomb, "a naive archeologist" could infer that the ancient Egyptians, etc., used this object.

  86. A Few Links « Literal-Minded said,

    May 11, 2013 @ 7:57 pm

    […] the standard comparative method, it's time for a harder look at this study. At Language Log, Sally Thomason does so, explaining the above points and others in much better […]

  87. Brian Joseph said,

    May 11, 2013 @ 10:02 pm

    A small point to add to Sally's terrific post — the situation with English is even more dire than her account has it with only 6 or 7 of the 23 occurring ("it looks there's a reasonable Proto-Indo-European etymology for the English words give, man, mother, fire, flow, and worm, in their current meanings"), since English "give" with its g- is either a borrowing from Norse or influenced by it; in either case, it is not an unadulterated cognate form.

  88. Rohan F said,

    May 11, 2013 @ 10:07 pm

    @Anna Kase:

    However – and more importantly – is the fact you DO judge their statistics. For example, you state "If the reconstructions used by Pagel et al. for their statistical analyses are not reliable in either form or meaning, then the statistical results of comparing these reconstructions cannot provide any evidence for distant relationships among the seven groups they compare." That sounds very much like judging their statistics to me.

    No, that is emphatically not judging the statistics. It is judging the data and the subsequent conclusions. A sound argument requires both a good methodology and a good dataset, so it doesn't matter how good your statistical model is if you're applying it to questionable data; the conclusions are still going to be unsound.

  89. Piotr Gąsiorowski said,

    May 12, 2013 @ 6:46 am

    @Brian. Even 6/23 may seem overstated, considering that the root of give is only weakly attested outside Germanic and there are at least three (near-)synonyms of 'man' far more widely attested than *manw-, which is again mainly Germanic with a likely Indic cognate (the Slavic "cognate" is only a tentative root equation).

  90. Steve Erickson said,

    May 12, 2013 @ 3:06 pm

    I'm not a linguist, so I can't comment on that aspect of this research. As an ecologist, I can comment on the inclusion of at least one of the proposed ultraconserved proto-words: worm. In an arctic or sub-arctic environment (and to a slightly lesser extent in a temperate environment), it simply makes no sense that "worm" would have been an important enough word to have a high frequency of common, everyday use, unless the reference is to parasites. Earthworms would have been absent or very uncommon in glaciated or recently de-glaciated landscapes, and 15K BP was at the height of the last ice age. The energetics of using worms (even when they were sparingly present) as a food/protein source simply don't work. Large mammals were the primary source of protein and other essential resources (e.g. hides for clothing and shelter; bone for shelter and tools). Cooperative hunting to kill large mammals works out energetically for large omnivores such as humans. Digging for scarce, small worms does not.

    Now, I suppose it may have been possible that the term "worm" was referring to the parasites that would have been at least sporadically or episodically present in these hunter-gatherers' primary prey. So maybe "worm" was a shorthand way of discussing the health of the large mammal populations that these people were utterly dependent on: "It's going to be a wormy year for bison." But it seems very unlikely to me. If I am correct, then why would this word have been so "stable" as to be "ultraconserved"?

  91. A Reader said,

    May 12, 2013 @ 3:44 pm

    While I don't think this study is especially useful, I don't think it's actually trying to get around the comparative method (though the authors themselves do imply this). Their study is basically a test of the LWED hypotheses: they assume that certain kinds of words should be replaced less often, and therefore that if a reconstructed Eurasiatic includes 'conservative' words being preserved (i.e. replaced less often) in a larger number of branches than other words, then this fact should support LWED's reconstructions as realistically distributed. It only gets around the comparative method insofar as it's supposed to offer a further piece of validation to a proposal that the comparative method alone can't prove – but a set of cognacy judgements have to be arrived at somehow before these statistics can even be applied.

    This has a certain amount of logic to it, but unfortunately doesn't really amount to a terribly useful method in this case at least. The biggest problem is the circularity, which they mention towards the end of the paper but don't really address. As Piotr, I think, mentioned, it's exactly this kind of core vocabulary that the compilers of LWED seem to have bent over backwards to accommodate, stretching things even more than normal – the 2nd person pronoun example being a case in point. Aren't people going to focus much more on trying to establish 'mother' as a common cognate?

    And of course linguistic reconstructions of Eurasiatic aren't simply unproven, they are actively problematic. It might even be taken as evidence that this method is prone to producing false positives. If it's supporting a relationship where there is none, then it's not much good.

  92. Piotr Gąsiorowski said,

    May 12, 2013 @ 5:45 pm

    A very good point, Steve! Any explanation of why "worm" was such a terribly important concept for Late Paleolithic hunter-gatherers in the Eurasiatic park tundras is bound to be a just-so story.

  93. James Wimberley said,

    May 12, 2013 @ 6:37 pm

    A long-shot defence of worm as a concept of wide interest to palaeolithic hunter-gatherers: if it covers insect larvae, at times an important food source.

  94. E. N. Anderson said,

    May 12, 2013 @ 7:27 pm

    I was stimulated to read the article–a bit unconvincing. They mention, but basically blow off, the possibility that pronouns are so short and have to sound so simple that they are phonemically pretty constrained. They have two words (bark–I think it means as in dog–and spit) that are echoic in English and most other languages and thus apt to sound alike. The rest could well be deep-level ancient borrowing. From admittedly very limited knowledge of Altaic languages I think the similarities of Turkic and Mongol to each other and to Uralic and IE languages look suspiciously like borrowings (they are rather randomly distributed, are a mix of grammar and vocabulary, and sometimes sound suspiciously similar to the alleged cognates–as if much less than 6 or 8 thousand years had elapsed). Certainly all these language families have been in a position to borrow from each other over thousands of years. So I remain skeptical.

  95. Mother’s Day Linkdump | Cora Buhlert said,

    May 12, 2013 @ 9:29 pm

    […] identified a number of words which have remained fairly unchanged since the last Ice Age. However, Language Log is skeptical. Found via Jay […]

  96. Piotr Gąsiorowski said,

    May 13, 2013 @ 12:06 am

    bark–I think it means as in dog

    No, they mean the bark of a tree (though the agreement is nowhere near as good as they make it and doesn't really extend beyond IE and Uralic). One could argue (though they don't) that the same word often means 'skin, hide', which would make it an important Paleolithic term. But they also have the 'mother' word, which (to the extent that it has Eurasiatic "cognates" at all) is a mama-type nursery term, the kind of item that a linguist would exclude from the dataset. Anyway, why 'ashes' and 'worm' rather than, say, 'spear', 'bow [weapon]', 'bone', 'sinew', 'meat', etc.?

  97. A Reader said,

    May 13, 2013 @ 3:49 am

    I'm pretty sure that 'ashes' and 'worm' are supposed to be accidentally conserved words – the point is that the statistical distribution of conserved words favours the expected ones, not that there's a perfect match between words supposedly preserved in 4+ branches and words that are ultraconserved for a reason.

    I would be curious to know what would happen if you not only took out the entire set of closed class function words as suspect of reconstruction bias (which they claim doesn't affect their statistics), but also other words that might have other explanations ('mother', 'spit') and those that might be extra suspect of reconstruction bias because of their iconic status as basic vocabulary (I'm thinking especially of 'fire' and maybe 'hand'). If the set is the 10 words:

    man/male, old, hear, pull, black, flow, bark, ashes, spit, worm

    then only 6 of 10 are words that they expected a priori to be ultraconserved. I'm not sure what that works out to statistically when considered against the full data set, but that doesn't seem particularly encouraging.

  98. Jerry Friedman said,

    May 13, 2013 @ 12:48 pm

    In English, according to the OED, "worm" used to include everything that crawls on the ground, including snakes, lizards, toads, frogs, adult insects, crabs, etc., not to mention dragons. It's related through Indo-European to "vermin". So I feel sure that the speakers of Proto-Indo-European, Proto-Altaic, etc., would have seen such creatures daily.

  99. Jerry Friedman said,

    May 13, 2013 @ 12:50 pm

    Daily in the warm part of the year, anyway. And I don't think we know where those people lived 15,000 years ago. Maybe it was places that had temperate climates at the time.

  100. Howard Oakley said,

    May 13, 2013 @ 2:34 pm

    Piotr – why do linguists exclude 'nursery' terms? Surely as mammals with maternal care of our young, some of the most fundamental aspects of language must be those between mother, or parent more generally, and child? (I apologise for that being a fairly groundless assertion, but it seems fairly deep to a non-linguist!)
    That is one reason why I have found it hard to reconcile Kartvelian roots – which have the parental genders swapped, in 'deda' = mother and 'mama' = father – with IE ones. You can postulate all sorts of phonological phenomena, but those seem so singularly opposite.

  101. Faldone said,

    May 13, 2013 @ 3:31 pm

    @Howard Oakley – IANAL but it's my understanding that linguists exclude nursery terms because they tend to be generated and regenerated from time to time by the first sounds babies make and that the parents apply to themselves. The most common of these sounds are mama and papa. These sounds are most frequently applied to the female parent and the male parent respectively except in those few societies where mama applies to the male parent and papa to the female parent.

    As for that sentence in the Washington Post article, it wouldn't work on at least three points that I can see. The root for give that the study proposed wasn't the root of the Germanic give but the root of the Romance donate; black, as has been noted above, is not from any root that they would have attached to the color black, it's from a root meaning 'white, shining' and

  102. Faldone said,

    May 13, 2013 @ 3:38 pm

    Ignore previous post. The Submit button clicked itself, obviously.

    @Howard Oakley – IANAL but it's my understanding that linguists exclude nursery terms because they tend to be generated and regenerated from time to time by the first sounds babies make and that the parents apply to themselves. The most common of these sounds are mama and papa. These sounds are most frequently applied to the female parent and the male parent respectively except in those few societies where mama applies to the male parent and papa to the female parent.

    As for that sentence in the Washington Post article, it wouldn't work on at least three points that I can see. The root for give that the study proposed wasn't the root of the Germanic give but the root of the Romance donate; black, as has been noted above, is not from any root that they would have attached to the color black, it's from a root meaning 'white, shining'; and the root for you is the second person singular root and not the plural that we use invariably for singular and plural.

  103. Jonathan Gress-Wright said,

    May 13, 2013 @ 6:45 pm

    Actually, I'm reading their paper now, and it looks like they were careful to stick to exact semantic matches. What's questionable is, of course, relying on LWED for cognacy judgments, the poor quality of many of the reconstructions, and assuming some kind of constant rate of lexical replacement. I'd have to check their other papers where they attempt to establish this constant rate.

    I suppose if I tried to do this, I'd forget about relying on others' cognacy judgments. I'd just deal with the putative cognate sets I had, using the best reconstructions for each family I could find. Hopefully, statistical analysis would show more likely cognates (i.e. sound correspondences less likely to be due to chance) among words that on independent grounds we'd suspect of being replaced more slowly. I have a feeling it would be better methodology to include all the world's language families, rather than assume any kind of macro-family to start out with. Ideally the more likely sound correspondences would cluster among the families already suspected of being related.

    Exactly how I'd go about this is a whole other thing. I would need to collaborate with someone who actually knows statistics well.

  104. A Reader said,

    May 14, 2013 @ 8:33 am

    Hopefully, statistical analysis would show more likely cognates (i.e. sound correspondences less likely to be due to chance) among words that on independent grounds we'd suspect of being replaced more slowly.

    Jonathan, that would of course rely on first developing some sort of algorithm to try to statistically determine sound correspondences. This might be possible, but I don't believe it's been done in the Mark Pagel/Quentin Atkinson school yet. Right now, most if not all statistical models in historical linguistics require some sort of pre-determined cognacy judgements to work from.

    The methodology behind this paper was basically a test of the LWED reconstructions, attempting to offer further validation for their reconstructions. Sound correspondences don't come directly into it.

  105. Stefan Georg said,

    May 14, 2013 @ 3:33 pm

    Quote Sean Fulop: "That must be the most amazingly poor database of linguistic data anybody ever bothered with."

    Well, sigh, you phrased it brilliantly, it is, it is… it is exactly that.
    Look at the one example the Pagel et al. article cites, now I quote from memory, sorry if I get anything wrong, on the alleged word for "two":
    IE dwow-, well, that is OK and well established, no quarrel with that.
    The Altaic "etymology" is based on, I cut it short, a Mongolian word, which doesn't have the right vowel, lacks the labial element needed, and, Starostin says it *himself*, not the right initial consonant, so actually nothing which is right, a Turkic word the meaning of which *nobody knows* (it's, again to cut it short, from a very obscure medieval source, nobody has managed to understand properly until today and *not found elsewhere*, if you want to try yourself: it's the Danube Bolghar List of Princes) and a Tungus root the connection of which to the Korean one could at least be discussed (yes, let's remain fair).
    The Kartvelian word mentioned was never a word for "two" or anything close; it is rather a derivative of "to branch off" (the lexical meaning of the word cited is "twins", but it is etymologically nested with words for tree-fungi and other branching-off things). There is the – vague – alternative possibility of it being a loan from Armenian – somewhat difficult to justify, I admit, so I won't defend it; it needs some thinking. And the "Proto-Uralic" word is simply a joke, since it is only Finnish, and its meaning is, no, not "second", but "other", from a normal pronominal root (by semantic extension also used for "second", as is common in Northern Europe and probably elsewhere in the world).

    Every single "proto" in this comparison is a clear case of leading (or, then, having been led, or maybe, having led oneself, to mitigate the polemics here a bit) up the garden path.

    Please, let's not discuss this here in all details, I'm not regularly here anyway, so, if you are skeptical of my skepticism, just check all this yourself – or any other Eurasiatic, Nostratic or Altaic "etymology" in that database. You will be in for a lot of disappointment (maybe, at first), then possibly some sobering, and finally, if you keep doing it for a while, for a lot of fun (and shaking your head in disbelief all the time). Promise. No need to take my word, just go there and enjoy, sapere aude :-).

    Yes, you described that database quite aptly!

  106. Piotr Gąsiorowski said,

    May 14, 2013 @ 5:10 pm

    Look at the one example the Pagel et al. article cites, now I quote from memory, sorry if I get anything wrong, on the alleged word for "two"

    They excluded 'two' from their list because there was no strict semantic match in Uralic and Kartvelian, but yes, they took the "Proto-Altaic cognate" of the IE numeral seriously (or rather took the LWED team's word for it).

  107. Nathan Myers said,

    May 15, 2013 @ 1:16 am

    I recall a recent explanation of how "mother" terms are extraordinarily poorly conserved: the onomatopoeic "ma" gets encrusted with formal prefixes and suffixes ("-ther"), and as they freeze it is replaced with a new onomatopoeic form ("mom"), and around again. Multiple cycles of this process were said to be documented in many language families. Seeing "mother" on their list of hyper-conserved words made me immediately suspicious.

  108. Jaska said,

    May 16, 2013 @ 2:23 pm

    My critique raises different points from Sally's, for example that they confuse stability with constancy.

  109. Mike Maxwell said,

    May 20, 2013 @ 11:47 am

    Jose Arcadio Buendía wrote, "It was a meteorologist who discovered chaos theory after all." It's probably also relevant that he (Edward Norton Lorenz) was a mathematician.

  110. Richard Compton said,

    May 20, 2013 @ 1:30 pm

    As someone who works on Inuktitut, I find that the Eskimoan data in the LWED appear to be highly problematic. For example, the Nostratic/Eurasiatic database of the LWED includes a cognate group for "that" which includes the Eskimo-Aleut form *aɣə-. However, according to the Comparative Eskimo Dictionary (CED) by Fortescue, Jacobson, & Kaplan (which the LWED cites repeatedly), the Proto-Eskimo demonstrative system had a 27-way distinction in demonstrative roots (which survives to varying degrees in modern Inuit dialects), of which 24 forms could be translated as "that". So the fact that LWED chooses *aɣə-, as opposed to *ik-, or *qav-, or *un-, etc., seems to be a case of cherry-picking the data. Also, setting up *aɣə- as a cognate of the conjectured Eurasiatic *ʔa simply on the basis of the vowel /a/ is pretty poor evidence, given that Proto-Eskimo only had four vowels. Setting up cognates based on a single matching vowel in this way means that ANY word in Eskimoan has at least a 25% chance of meeting the criterion of being a cognate.
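    The "at least 25%" figure can be made concrete with a little arithmetic. The sketch below is a simplification, not anything from the LWED or the PNAS paper: it assumes each vowel slot in a root is filled independently and with equal probability from a four-vowel inventory, which real vowel frequencies certainly do not obey. Under that assumption a root with one vowel matches a target vowel with probability 1/4, and a root with more vowels matches at least once more often (7/16 for two slots, 37/64 for three). The function name p_contains_target_vowel is purely illustrative.

    ```python
    from fractions import Fraction


    def p_contains_target_vowel(vowel_slots: int, inventory_size: int = 4) -> Fraction:
        """Chance that at least one of `vowel_slots` vowels equals a chosen target vowel.

        Simplifying assumption (hypothetical, for illustration only): every slot is
        filled independently and uniformly from an inventory of `inventory_size` vowels.
        """
        if vowel_slots < 1:
            raise ValueError("a root needs at least one vowel slot")
        if inventory_size < 1:
            raise ValueError("the vowel inventory cannot be empty")
        # Probability that a single slot does NOT hold the target vowel.
        miss = Fraction(inventory_size - 1, inventory_size)
        # Complement of "every slot misses the target".
        return 1 - miss ** vowel_slots


    if __name__ == "__main__":
        for slots in (1, 2, 3):
            p = p_contains_target_vowel(slots)
            print(f"{slots} vowel slot(s): {p} (about {float(p):.2f})")
    ```

    The point of the exercise is only that a one-vowel criterion lets a large share of arbitrary words through; skewed real-world vowel frequencies would move these numbers, but not below the level where chance matches become routine.
    
    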

    Similarly, the LWED entry for the cognate class for "I" (1st person pronoun) includes the forms *vi- (sg.) and *va(ŋ)- (pl.) and cites page 383 of the CED, even though these aren't the forms actually proposed in the CED. In fact, the forms listed in the LWED contradict the CED reconstructions: they take a modified version of the daughter-language Yupik forms to be the Proto-Eskimoan forms. According to the CED, the PE forms for "I" and "we" are *uvaŋa and *uvakut, not *vi- and *va(ŋ)- as reported in the LWED. Furthermore, that very page of the CED states that the /uv-/ in these forms is "apparently from the dem[onstrative] root uv- ['here'] and 1s[g] subject ending -ŋa." So the authorities on the language, whom the LWED cite repeatedly, give both a different form AND a different origin for these pronouns (i.e., they're derived from a demonstrative), and yet the LWED ignores both their reconstructed phonological forms and their apparent historical origin as demonstratives, instead setting them up as cognates to pronouns. Further evidence against positing the pseudo-Yupik forms as the proto-forms for Eskimoan includes the fact that Jacobson's (1995:388) grammar of Yupik states that "first person personal pronouns are based on wa- or wang- which is probably related to the base of the demonstrative adverb wangi 'right here'." In sum, I can't find any support for the LWED forms in the literature.

    The cognate set for "man/male" is also problematic. The LWED database sets up a cognate set for "male" including the Eskimoan form *inuɣ- but this form actually means "human being" in both Yupik and Inuit. The actual root meaning "man/male" is *aŋun.

    Most annoying for me is that the only Eskimoan form actually provided in the paper (actually in the supplementary information text) is the form about which I commented earlier, which is both transcribed incorrectly and glossed incorrectly in the LWED.

  111. Tracing Language Back 15,000 Years | The Descrier said,

    May 23, 2013 @ 9:18 am

    […] this month in the Proceedings of the National Academy of Sciences (PNAS), the latest paper to upset linguists around the world uses methods from computational evolutionary science to look at questions about […]

  112. Cave-speak: rough ideas for a card game | hypothete.com – web miscellanea by Duncan Alexander said,

    May 25, 2013 @ 2:03 pm

    […] in sound and meaning over the past 15,000 years of language.  Now, the research behind this is somewhat shaky, and the previous research that the latest study draws on is not that great either. However, paging […]

  113. Where does language come from? Linguistic prehistory… | Language Eleven at Braemar said,

    May 30, 2013 @ 5:01 am

    […] is a much more in-depth discussion from Sally Thomason over on Language Log about the limitations of the data, and the problems of this kind of analysis, and readers who […]

  114. Erik Mueller-Harder said,

    June 7, 2013 @ 8:20 pm

    Remarkably enough, Indo-European studies have moved on since 1932

    Indeed. Linguistics is only a hobby for me, but I was privileged to take several seminars in historical linguistics with Calvert Watkins in the late ’80s and early ’90s. At some point in there, he remarked that “no language has changed as much in the last decade as Proto-Indo-European”.

  115. In Pragati How old is Proto-Dravidian? | varnam said,

    July 29, 2013 @ 1:06 am

    […] Ultraconserved words? Really?? by Sally Thomason at Language Log […]
