Language Log

Scrabble tips for time travelers?

February 26, 2009 @ 11:12 am · Filed by Mark Liberman under Language and the media

This morning's BBC News Hour program featured one of the most densely nonsensical three-minute sequences that I can ever recall having heard from a respectable media outlet:

The most preposterous stuff, of course, comes from Claire Bolderson, the BBC interviewer; but the responsible scientist, Mark Pagel, doesn't come off very well in this exchange either, in my opinion. (Though I recognize that such interviews are highly edited, and may seriously misrepresent the way that the interviewee would prefer to express the ideas involved.)

Part of the scientific background is Mark Pagel, Quentin D. Atkinson, and Andrew Meade, "Frequency of word-use predicts rates of lexical evolution throughout Indo-European history", Nature 2007, which was a sensible-enough paper.

There must be a new research report behind the recent blizzard of related stories, but none of those that I've seen so far tells us where to find it. I think that I can guess what Pagel et al. probably did, and how it relates to what's being said about it in the mass media, but I'll reserve further comment on the scientific background until I've had a chance to read the research in a form unfiltered by journalism. [Update 2/27/2009: actually, I'm now leaning to the conclusion that the 2007 Nature paper is all that there is behind this. ]

The print version at BBC News is here: "'Oldest English words' identified", 2/28/2009.

Mark Henderson, "A handy little guide to small talk in the Stone Age", The Times, leads with the value to cross-era communication:

A “time traveller’s phrasebook” that could allow basic communication between modern English speakers and Stone Age cavemen is being compiled by scientists studying the evolution of language.

Ian Sample at The Guardian ("Word facing extinction: 'Dirty' will be scrubbed from the English dictionary") is worried about dirty:

The unrelenting force of evolution is about to take an unexpected toll on the English language by forcing some of our favourite words into extinction. The word "dirty" is most in danger of going the way of the dodo, and could vanish from use completely within 750 years, researchers said.

Robert Roy Britt at LiveScience ("Oldest English Words Revealed?") features the impact on Scrabble:

A game of Scrabble might not have been all that different in Stone Age times.

Using a computer simulation, a British researcher says he's examined the rate of change of words in languages to reveal the oldest English-sounding words, which would have been used by Stone Age humans 20,000 years ago.

Niall Firth in the Daily Mail explains it all as follows:

Some of the oldest words in the English language date back more than 20,000 years, it has been revealed.

Words like 'I', 'we', 'two', 'three' and 'five' were probably used by our ancestors in the Stone Age – and have changed very little since then.

All of these cute conceits, not to say idiocies, were prominent in the brief News Hour piece, which thus managed an extraordinary compression of nonsense. I wouldn't object if ideas like small-talk with cavemen and stone-age scrabble words were presented as little jokes, in the context of an explanation of why (for example) the effects of sound change mean that conserved cognates (like the various IE versions of five) are unlikely to be much more helpful to communication than unrelated words are, over a span of even a few thousand years, much less ten thousand or twenty thousand or (lord help us) forty thousand. But that's not what we have here, unfortunately.

[Update: As several commenters have observed, the source of the flurry in the news media was a University of Reading press release, "Scientists discover oldest words in the English language and predict which ones are likely to disappear in the future", 2/26/2009. There is apparently no associated research publication, at least so far.

The press release is rather misleading in several ways, most strikingly by talking about retention of cognates over time without mentioning the sound changes that may make them unrecognizable to ordinary speakers after a few hundred years, and will almost certainly do so after a few thousand years, much less 10,000 or 30,000 years. And the press release also makes it seem as if the scientists have "been able to go back almost 30,000 years" in reconstructing Indo-European, which is almost certainly not what they are claiming. In the absence of a coherent explanation it's hard to tell what they did do — but my current hypothesis is that they used the method documented here to estimate rates of replacement for various words in something like the Swadesh list for Indo-European languages, and then applied standard methods to those estimates in order to calculate how probable it is that each word might survive (in however mutated a cognate form) for N years. [On further reflection, I think that this is probably just a re-presentation of the work in the 2007 Nature paper, which uses Swadesh-list-type data to estimate the half-life of various words, allowing guesses about how likely it is that a particular word might have been retained — in the sense of having a modern cognate — from time depths many millennia before there is any documentation or even any reconstruction.]

But there's nothing in the press release, as misleading as it may be, about playing scrabble with cavemen.

In the context of making fun of the sillier journalistic excesses, and Mark Pagel's apparent lack of prudence or even collusion in encouraging them, I'd like to underline the fact that the methods (of computational phylogeny in general, and Bayesian approaches in particular) are worthwhile and interesting, and have in other cases (e.g. here) been applied in a responsible way to advance our understanding of linguistic history. ]
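To make the half-life arithmetic concrete, here is a minimal sketch of the kind of calculation I'm guessing at. It assumes a constant replacement rate for each word (simple exponential decay), which is my simplification and not necessarily the model in the Nature paper. The function name and the half-life figures are made up for illustration; they are not values from the paper or the press release.

```python
def survival_probability(half_life_years: float, elapsed_years: float) -> float:
    """Probability that a word still has a cognate descendant after
    elapsed_years, assuming a constant replacement rate (exponential decay)."""
    if half_life_years <= 0:
        raise ValueError("half_life_years must be positive")
    # After one half-life the probability is 1/2, after two it is 1/4, etc.
    return 0.5 ** (elapsed_years / half_life_years)


# Hypothetical half-lives, chosen only to show the shape of the calculation.
examples = [("slowly replaced word", 20_000), ("quickly replaced word", 1_000)]

for label, half_life in examples:
    for years in (2_500, 10_000, 40_000):
        p = survival_probability(half_life, years)
        print(f"{label} (half-life {half_life} y) after {years} y: {p:.4f}")
```

Note what such a number does and does not say: it estimates the chance that some cognate form has been retained, and it is silent about whether that form would be recognizable to anyone after the intervening sound changes.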

[More here and here.]

84 Comments

David Eddyshaw said,

February 26, 2009 @ 11:24 am

Also reported (just as nonsensically, in fact in practically the same words) in the London "Times", under the byline of their "Science Editor". Good grief.

Once again one is confronted by the fact that large swathes of the apparently educated are apparently completely unaware that scientific study of language exists at all.
Keith Clarke said,

February 26, 2009 @ 11:33 am

The BBC web site has an article which includes the following:

"We think some of these words are as ancient as 40,000 years old. The sound used to make those words would have been used by all speakers of the Indo-European languages throughout history," Professor Pagel said.

Hmm.
David Eddyshaw said,

February 26, 2009 @ 11:38 am

Best piece of gibberish from the Times article:

"The word “water”, for example, is wasser in German, eau in French and aqua in Italian and Latin. Although each is slightly different, they share a similar sound that shows them to share a common linguistic ancestor. "

Did anybody actually read this before submitting it? It's not actually even necessary to know anything about comparative linguistics – or anything – to see this is a complete crockus.
Lazar said,

February 26, 2009 @ 11:38 am

So "dirty" will be "the next one to go under", within the frighteningly short and foreseeable span of 750 years, based on no less than a study. I'm so glad that we know that now, and can prepare for it.
Cecily said,

February 26, 2009 @ 11:41 am

I'm glad someone's posted about this. I heard the story on BBC radio 4 on the way to work and thought it sounded like nonsense.

There is also an item on the BBC news website (http://news.bbc.co.uk/1/hi/uk/7911645.stm) and earlier in the day it had a link to a page where you could type a year (past or future) and get a list of maybe 50 words with a number indicating how different they were or will be. All rather baffling.

Sadly, that link is no longer working…(http://www.evolution.reading.ac.uk/WordChanges/)
Mark Liberman said,

February 26, 2009 @ 11:51 am

David Eddyshaw: Did anybody actually read this before submitting it? It's not actually even necessary to know anything about comparative linguistics – or anything – to see this is a complete crockus.

Yes, exactly. It would have been nice for some of these journalists to have consulted an expert in historical linguistics; but the main problem is that they all seem to have turned their brains off before they engaged the subject. For example, just above a quote from Pagel about how "We think some of these words are as ancient as 40,000 years old", the BBC News story presents a picture of William the Conqueror, with the caption "Time-travellers would find a few sounds familiar in William's words", oblivious to the fact that William didn't speak the English of 1066, but rather Norman French.

Unless the writers and editors involved are all uneducated morons, which I very much doubt, the only way that I can make sense of the media treatment of this story is to assume that they all took it as an opportunity to riff in a giddily nonsensical sort of way on what they took to be a foolish and lightweight topic.
Faldone said,

February 26, 2009 @ 12:20 pm

"Point to yourself and say, 'I'"

Duh! I could point to myself and say, "Prshknlgltch" and they would assume I was talking about myself or naming myself.
Rachael said,

February 26, 2009 @ 12:42 pm

I saw this on the BBC site this morning and awaited its deserved mocking on Language Log.

They even have a Daily Mini-Quiz question referring to it on the BBC news Magazine http://news.bbc.co.uk/1/hi/magazine/default.stm : "If you spoke the numbers one, two and three to a caveman, which number would they NOT have understood?"
dw said,

February 26, 2009 @ 1:14 pm

I would recommend the site http://www.badscience.net/ to all readers of this thread. It skewers the awful coverage of scientific issues in the mainstream media, with a British focus. Were you aware, for example, that Facebook causes cancer?
Jan said,

February 26, 2009 @ 1:21 pm

Mark Liberman: There must be a new research report behind the recent blizzard of related stories, but none of those that I've seen so far tells us where to find it.

I don't think so… I guess it all probably goes back to this press release from the University of Reading, which seems to have been launched on the occasion of Reading's supercomputer's first anniversary (Release date: 26 February 2009).
parvomagnus said,

February 26, 2009 @ 1:27 pm

I don't think you'd need much expertise to question what "similar sound" might conceivably link "water" and "aqua". My guess is it's a product of the oft-bemoaned general ignorance on linguistics, coupled with deference to experts – the writer pondered briefly ("the 'a', maybe?"), then wrote it off as something experts could see that he/she couldn't.

Skepticism would have paid off, too, as (to my knowledge) "water" and "aqua" are completely unrelated.
parvomagnus said,

February 26, 2009 @ 1:29 pm

Also, the repeated use of 'cavemen' is probably all the proof you need that the journalists aren't taking this seriously. I wonder if the unedited audio has the researcher trying to avoid 'caveman', then eventually sighing and giving in.
montyfood said,

February 26, 2009 @ 1:41 pm

Is it just me or do parts of the BBC interview sound like those preposterous exchanges by Chris Morris in On The Hour/The Day Today/Brass Eye etc.?

"What about 'one'?" … "And what about 'you'?"

[(myl) I'm not a good person to judge this, since BBC news presenters all sound to me like participants in a Monty Python skit. This is a linguistic prejudice that I've confessed before, and am not at all proud of. I've been making some progress in unlearning it, but today has certainly set me back. ]
möngke said,

February 26, 2009 @ 1:45 pm

This is probably one of the most abysmal failures of science journalism ever. When I first read the article, I couldn't make sense of any of it, and I've had quite a hard time deciphering what one would have to believe to be able to write something like that. To me, the basic assertions seem to be:

1) Emergence from a common root equals identical (or similar) pronunciation diachronically.

2) The 'cavemen' of 40,000 years ago spoke different, discretely identifiable varieties of languages that were later to evolve into modern-day 'language' units.

3) Words have fixed, unchanging meanings – thereby precluding the possibility that a word like dirty would acquire a new meaning, rather than simply disappear into oblivion.

In spite of all of this, the following sentence:

New words for a concept can arise in a given language, utilising different sounds, in turn giving a clue to a word's relative age in the language.

is still puzzling to me. Only complete and utter ignorance about linguistics would be able to produce something like that. Perhaps this is an inauspicious muddling of what appear to be half-understandings of sound change, lexical borrowing, and glottochronology…?
Peter Christian said,

February 26, 2009 @ 1:59 pm

I'm not sure whether to be pleased or disappointed that the nonsensical data which supposedly supports these claims has been removed from http://www.evolution.reading.ac.uk/WordChanges/.

For the curious, alas, it's not in Google's cache. But a page rescued from my browser's cache is now viewable at http://www.spub.co.uk/words.html. I can't remember what date I entered to get this data. I promise that I have not edited out any explanatory material!
David Eddyshaw said,

February 26, 2009 @ 2:17 pm

Being as charitable as I can (though why should I be? presumably the "Science Editor" of the Times gets paid for copying and pasting from Reading's handouts):

"The word “water”, for example, is wasser in German, eau in French and aqua in Italian and Latin. Although each is slightly different, they share a similar sound that shows them to share a common linguistic ancestor. "

I guess what has happened is that somebody has telescoped a discussion about, on the one hand

English "water" and German "Wasser" (transparently related and "slightly different" I suppose)

and on the other

French "eau" Italian "acqua" (or "aqua" as the Times would have it) and Latin "aqua".
There seems to be an irony in the fact that "eau" and "acqua", which of course are indeed related, in fact have not even one single sound in common.
Arnold Zwicky said,

February 26, 2009 @ 2:54 pm

To parvomagnus, and others who have wondered about the sound in common in "water", "Wasser", "aqua", and "eau": surely this is just the standard confusion of sounds and letters. The four words all have the letter A in their (current) spellings.
David Eddyshaw said,

February 26, 2009 @ 3:12 pm

The University of Reading press release (which Jan has linked to above) is itself a wonder of misunderstanding and inaccuracy.

I should be ashamed of it if I had any connection with the University.
Herman said,

February 26, 2009 @ 3:37 pm

Of course these articles are hilarious and utterly depressing…but equally depressing is the fact that, yet again, Language Log takes another cheap and lazy pot shot at the BBC. The BBC website article references the Times article, not any original research paper, so why not start by pointing the finger at that arm of Murdoch's media empire? But no, instead we read

The most preposterous stuff, of course, comes from Claire Bolderson

No, Liberman, no it doesn't. The most ridiculous stuff came from the academic who was being interviewed. Ms Bolderson is an excellent foreign correspondent and a good interviewer. She is not an academic researcher, and if her interviewee is spouting nonsense, it is just silly for you to blame her, now isn't it?

[(myl) Before Mark Pagel is ever heard from, Claire Bolderson leads the segment by saying: "You might not think you could have much of a conversation with a caveman, if you were traveling back in time. Well, not unless you had quite a repertoire of grunts, that is. But new research suggests that cavemen may have used several of the words we still use today." She's the one who makes the remarks about caveman scrabble. And I could go on. After reading the Reading U. news release, and the transcripts of some other interviews with Pagel, I'm less inclined to excuse him for all this. Still, though I don't have the heart to do a count, my strong impression is that in the News Hour interview, Bolderson gives voice to more nonsense than Pagel does. And she (or at least the News Hour program) was responsible for the selection of his statements from what I'm sure was a much longer interview. ]

It would have been nice for some of these journalists to have consulted an expert in historical linguistics

Well of course, but it would have been even better if "these academics" from the University of Reading had done so too, don't you think?

[(myl) It certainly appears that Pagel stuck his foot deeply into his own mouth on this story. But I've seen enough cases where researchers are made to look foolish by associated PR departments or by misleading journalism (e.g. here and here and here, among many others), that I'd like to see an unedited version of his own account of what this is all about.]
Kilfenora said,

February 26, 2009 @ 3:39 pm

"water" "wasser" "eau" and "aqua" being the same reminded me of an amateur Hungarian linguist who believes English is an old Hungarian dialect. His book, The English Vocabulary from the Hungarian view, is available in all major Hungarian bookstores. Unfortunately.
Anyway, here's a summary in English. Enjoy!

http://www.varga.hu/OSKOR_ELO_NYELVE/MAGYAR%20TAJNYELV%20A%20BRIT%20SZIGETEKEN.pdf
Herman said,

February 26, 2009 @ 3:41 pm

Minor correction to the above – the BBC has changed their earlier version of the website, which did simply reference the Times article, to reflect the fact that they now have their own direct quotations from Professor Pagel. The rest of the post still stands.
parvomagnus said,

February 26, 2009 @ 3:46 pm

Arnold Zwicky – Indeed. The muddled writing in "Although each is slightly different, they share a similar sound that shows them to share a common linguistic ancestor", though, makes me think he was trying to jam something into his article that he knew he didn't really understand. I guess it's not really any more clueless than the rest of the article, it just stuck out at me.

From the press release – stuff like "50% of the words we use today would be unrecognisable to our ancestors living 2,500 years ago" calls into question the researchers' basic understanding of language change. Or, both more and less charitably, they realized the press wouldn't really understand it, and emphasized "interesting" at the expense of "factual".
David Eddyshaw said,

February 26, 2009 @ 4:03 pm

Leaving aside the whole question of abominable reporting …

Presumably what Prof Pagel is up to is improving on the voodoo linguistics concept of glottochronology by trying to get some actual evidence about what kinds of words really are resistant to replacement, rather than Swadesh lists etc.

As far as I can see from the very limited information available he's doing this entirely with Indoeuropean. Given that other language families notoriously show quite different patterns of retention and replacement (numerals not being particularly resistant to borrowing, whole pronoun systems being borrowed etc) he seems to be replicating in a more elaborate way the whole methodological problem that led to glottochronology getting such a bad rep in the first place.

Is there any information out there about his actual research?

[(myl) There are some papers on his web site; but the most important thing, I think, is that his lab is co-funded in this area with Russell Gray's lab in New Zealand. The work of Gray's lab on Austronesian is very credible, in my opinion. (Though there is room for discussion and disagreement about details, as always.)]
David Eddyshaw said,

February 26, 2009 @ 4:29 pm

What has Pagel got against "four"?
The f- is not the regular development of the IE qw-, but then neither is the second consonant of "five".
What's all this about a "significant evolutionary leap"?
Nobody (nobody who knows anything about it, anyway) denies that "four" is cognate with other Indoeuropean languages' words for "four" (Is he thinking Hittite?!)
This surely does inescapably suggest, not dumbing down for the journalists, but a fundamental failure of understanding on his own part.
Rubrick said,

February 26, 2009 @ 4:37 pm

I don't think so… I guess it all probably goes back to this press release from the University of Reading (Jan)

It seems that "Reading" is a bit of an auto-oxymoron.
Harry Campbell said,

February 26, 2009 @ 4:43 pm

I note that Mark Pagel is described in the online news story as an evolutionary biologist rather than a linguist, which may be relevant. The final sentence speaks truer than it knows, when it comes to "science" reporting: "If you've ever played 'Chinese whispers', what comes out the end is usually gibberish…" http://news.bbc.co.uk/1/hi/sci/tech/7911645.stm
acilius said,

February 26, 2009 @ 5:01 pm

I'm afraid I have to agree with Herman's comment of 3:37 pm. Claire Bolderson may not ask any particularly insightful questions, but everything that sounds idiotic comes from the mouth of the interviewee. I certainly hope he was the victim of severe editing.
Alex Whiteside said,

February 26, 2009 @ 5:19 pm

I visited here with Javascript off, and was met with this instead of the clip:

"Audio clip: Adobe Flash Player (version 9 or above) is required to play this audio clip. Download the latest version here. You also need to have JavaScript enabled in your browser."

It took me a moment to realise those were not, in fact, the three sentences which inspired your contempt.
Harry Campbell said,

February 26, 2009 @ 5:20 pm

As far as I can see from the very limited information available he's doing this entirely with Indoeuropean.

Pagel claims to be "applying these methods to Indo-European, Bantu, Austronesian, Mayan and Uto-Aztecan languages". His website, http://www.evolution.reading.ac.uk/LingCultEvo.html, has a (very brief) summary of his linguistic research, funded by the Leverhulme Trust, which is to "investigate the idea that human cultures behave as if they were distinct biological species". How this ties in with the fanciful idea of building some sort of phrasebook for travellers back to Nostratic times (W. Conqueror Esq. is just the beginning), based on the antiquity of a few scraps of lexis such as the English names of certain cardinal numbers, I don't clearly see.
Perhaps the funding body is not without blame in these cases? Would they have funded a historical linguist to research prokaryote evolution from a social science perspective, I wonder?
David Eddyshaw said,

February 26, 2009 @ 5:33 pm

@Harry Campbell:

Thanks for the link. I shouldn't have simply relied on the ghastly press release.

Conceivably that sort of research could be a useful way of improving on glottochronology if they are really working in depth with non-IE groups, I suppose.

I imagine real linguists have been doing this sort of thing already.

(I wonder if Prof P realises William the Conqueror spoke French?)

"Human cultures behave as if they were distinct biological species", eh?
I doubt if I am the only reader to feel a shiver down my spine at that.
Nathan Myers said,

February 26, 2009 @ 5:33 pm

Linguists have it so easy. Paleontologists have to content themselves with exploding over articles that mention "brontosaurus", or say pterodactyls are dinosaurs or bird ancestors. They see the same few (really, rather trivial) goofs over and over. After a few years it must be hard to get worked up again. Practically everything written about language, though, is nothing but howlers from one end to the other.
Mark Liberman said,

February 26, 2009 @ 5:36 pm

The work in Pagel's lab on linguistic phylogeny is joint with Russell Gray's lab, and I have quite a high opinion of Gray's work — see for instance here.
David Eddyshaw said,

February 26, 2009 @ 5:52 pm

Some of the "real linguists" indeed. I'd forgotten about that. Thanks.

It seems odd that none of these undoubtedly proper linguists could have saved Prof Pagel from himself over this presentation of his results.
Harry Campbell said,

February 26, 2009 @ 6:19 pm

Yes, we must remember that interviewees can be edited and misquoted, but… A live interview with Pagel from BBC Radio 4's Today programme can be heard here:
http://news.bbc.co.uk/today/hi/today/newsid_7911000/7911837.stm
Evan Davis as interviewer is certainly a cut above the normally moronic level of the Today prog, and you have to sympathise with his bewilderment when he says "take me through this more slowly: what do you mean, the word dirty has the highest rate of change in the English language?". I think it's clear that the interviewee, not the presenter, is at fault here, being totally unable to give any coherent account of his own research.

I can't resist quoting this gem from Reading University's PR people:
"The Indo-European languages are most of those originally [sic] found across Europe, the Middle-East and the Indian subcontinent. Examples include: Celtic, Roman, Greek, Germanic, Nordic (with the exception of Finnish), Slavic, Armenian, Iranian, Afghan, Gujarati, Hindi, Bengali, Napali [sic] and Kashmiri, and of course modern-day derivations such as English and Spanish. Researchers call words that persist relatively untouched across the ages 'cognates,'…" http://www.reading.ac.uk/about/newsandevents/releases/PR19825.asp
Victor Mair said,

February 26, 2009 @ 6:39 pm

The BBC interviewer, Claire Bolderson, prompts Pagel by speaking of "tens of thousands of years ago," and he runs with it by talking about a handful of words that would have been understood by "sort of cavemen around the time of the origin of the IE languages." But people were no longer living in caves when the IE languages arose. Pagel, however, must really believe that IE arose in the Upper Paleolithic, since he variously mentions time depths of 10,000, 12,000, 14,000, and 20,000 (twice) years ago. Aside from this grossly inflated time depth for IE, which leads him to conceive of IE words before IE itself was conceived (so to speak), he gives no indication whatsoever of just how much the sounds of words change over such vast stretches of time. It seems strange indeed to hear him assert that IE cavemen would utter words remotely resembling "I," "you [in the form 'thou']," "two," "three," and "five." Whatever words cavemen "tens of thousands of years ago" used to express the ideas of "I," "you," "two," "three," and "five," they wouldn't have sounded at all like these modern English words, and they wouldn't have sounded like their Proto-IE equivalents either, because Proto-IE wouldn't have been around back in those troglodytic days.

[(myl) I'm not sure how well informed Pagel is about issues in historical linguistics, or whether indeed he was thinking about them at all in framing his answers in this case. As I understand it, he's talking about algorithms that estimate probability of lexical replacement as a function of elapsed time. Those estimates are not constrained to lie within the likely time depth of the last common ancestor of the languages whose cognate sets are being compared. Thus the algorithm might well assign a reasonably high probability to the hypothesis that the speech community ancestral to English, as of forty thousand years ago, shared with modern English a cognate form for the word "two"; and this would have nothing whatever to do with any estimate of when proto-Indo-European was spoken. After all, PIE itself presumably had ancestors; we just don't know much (at best) about what they were.

There's nothing wrong in principle with this form of argument, though I'm willing to bet that the confidence intervals on those survival estimates are pretty loose.

Of course, talking about IE as of forty thousand years ago is rather like talking about mammals as of 700 million years ago. And talking about how members of the speech community ancestral to English, as of that time depth, would understand certain English words, since they were probably cognate, is… well, I'm having trouble thinking of a biological statement that is equally dim-witted.]
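A rough illustration of why survival estimates at such time depths are likely to be loose: the sketch below again assumes simple exponential decay with a constant replacement rate, and the uncertainty range for the half-life is purely hypothetical, picked to show how a modest range in the rate turns into a wide range in the probability once the extrapolation is long enough.

```python
def survival_probability(half_life_years: float, elapsed_years: float) -> float:
    """Chance that a cognate is retained after elapsed_years, assuming a
    constant replacement rate (exponential decay)."""
    return 0.5 ** (elapsed_years / half_life_years)


def survival_interval(half_life_low: float, half_life_high: float,
                      elapsed_years: float) -> tuple[float, float]:
    """Map an uncertainty range on the half-life to a range on survival.
    A shorter half-life gives the lower survival bound, a longer one the upper."""
    return (survival_probability(half_life_low, elapsed_years),
            survival_probability(half_life_high, elapsed_years))


# Hypothetical range: suppose a word's half-life is only known to lie
# somewhere between 10,000 and 30,000 years.
for years in (2_500, 10_000, 40_000):
    low, high = survival_interval(10_000, 30_000, years)
    print(f"after {years} y: survival between {low:.3f} and {high:.3f}")
```

Under these made-up numbers the two bounds stay fairly close at a few thousand years and drift far apart at forty thousand, which is the sense in which the extrapolation is legitimate in principle but not very informative.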
David Eddyshaw said,

February 26, 2009 @ 7:38 pm

It's not just the flaky general principles of this: even the specifics don't stand a moment's scrutiny.

For example:

I would have thought it entirely possible that our forebears of 20,000 years ago lacked words for "four" and "five" altogether, given that even today some of our contemporaries speak languages without specific words for these numbers.

The supposed staying power of pronouns has not prevented English from losing "thou, thee, thy, ye" and creating a wholly new possessive "its" since the time of the King James Bible translators just 400 years ago.

Going back further, "she" is a new creation since Old English, and "they, them, their" are, astonishingly, foreign loanwords.

The supposedly labile prepositions include eg "in", "at", "over", all of great IE antiquity …
David Eddyshaw said,

February 26, 2009 @ 8:36 pm

Reading MYL's comment on Victor Mair's post I realise that objecting over specifics is not really to the point; the rejoinder would presumably be that English happens to have been exceptional in these respects, but that statistically one could be reasonably sure that such phenomena would not occur commonly in the history of languages.

How sound this claim would be would depend on how well-founded Prof Pagel's algorithms really are in his data, and whether his data were sufficiently representative of the whole potential range of human language (however one might go about delimiting this).

Given that any language ancestral to PIE was certainly typologically different from its daughters, and spoken by a culturally probably very different population, and that the time-depth involved is several times over greater than the time separating PIE from Afrikaans, one can see a lot of hard work being needed … but it doesn't seem intrinsically impossible.

Pity about the presentation …
Brett said,

February 26, 2009 @ 8:48 pm

They had a rehash of this on the BBC/PRI/WGBH program "The World" this afternoon. There were a couple short, fairly reasonable sounding clips from Pagel (assumedly from the earlier interview), but most of the explanation came from a BBC reporter, who gave a predictably inept description of the supposed situation.
David Eddyshaw said,

February 26, 2009 @ 9:27 pm

Thinking about how Prof Pagel's algorithms might fail through being based on too narrow a sample, I wondered about RMW Dixon's take on Australian languages (controversial, I know). I believe there is quite a lot of stuff out there suggesting that word replacement rates can be very much affected by cultural factors, more particularly among exactly the sort of little hunter-gatherer band cultures which are unlikely to figure much in his database but were presumably even more characteristic of human organisation the farther back in time you go. To be compelling, Pagel would have to incorporate this (or refute it, or show that in the long run it could be ignored, or it all balanced out, or something …)

[(myl) As I've said, I haven't yet seen any credible account of what lies behind this particular set of press releases and media reports; but some of the earlier work in this general area used the method (and program) documented in John Huelsenbeck and Fredrik Ronquist, "MRBAYES: Bayesian inference of phylogenetic trees", Bioinformatics 17(8): 754-755, 2001 (free pdf here). This in turn uses a Markov chain Monte Carlo method, originally described in Metropolis, Rosenbluth, Rosenbluth, Teller, and Teller, "Equations of state calculations by fast computing machines", J Chem Phys 21:1087-1091, 1953. For thoughts on ways for this method to succeed or fail, by its authors, see John Huelsenbeck, Bret Larget, Richard Miller, and Fredrik Ronquist, "Potential Applications and Pitfalls of Bayesian Inference of Phylogeny", Syst. Biol. 51(5):673-688, 2002. For one particular problem with the earlier application of this program to dating linguistic phylogenies, see "Gray and Atkinson — Use of Binary Characters", 4/28/2004.

I emphasize again that I don't know what really lies behind the Reading press release and subsequent media exposure, but one place to look is Mark Pagel and Andrew Meade, "Modeling heterotachy in phylogenetic inference by reversible jump Markov chain Monte Carlo", Phil Trans Roy Soc B, 363:3955-3964, 2008. The Reading Evolutionary Biology Group has also made its own program (BayesPhylogenies) available here; this "allows a range of models of gene sequence evolution, models for morphological traits, models for rooted trees, gamma and beta distributed rate-heterogeneity, and implements a 'mixture model' (Pagel and Meade, 2004) that allows the user to fit more than one model of sequence evolution, without partitioning the data", though whether this includes the particular approaches used in the work under discussion, I don't know. ]
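
For readers who have not met Markov chain Monte Carlo before, here is a minimal sketch of the random-walk Metropolis idea, in Python. It is not MrBayes or BayesPhylogenies, and it samples a one-dimensional toy distribution rather than phylogenetic trees; the function names, the step size, and the standard-normal target are all assumptions chosen for illustration.

```python
import math
import random


def metropolis(log_density, start, step, n_samples, seed=0):
    """Random-walk Metropolis sampler for a one-dimensional target.

    log_density returns the log of the (unnormalised) target density.
    All parameter names here are illustrative, not taken from any package.
    """
    rng = random.Random(seed)
    x = start
    log_p = log_density(x)
    samples = []
    for _ in range(n_samples):
        # Propose a nearby state by adding Gaussian noise.
        proposal = x + rng.gauss(0.0, step)
        log_p_new = log_density(proposal)
        # Accept with probability min(1, p_new / p_old); otherwise stay put.
        accept_prob = math.exp(min(0.0, log_p_new - log_p))
        if rng.random() < accept_prob:
            x, log_p = proposal, log_p_new
        samples.append(x)
    return samples


def log_standard_normal(x):
    """Log density of a standard normal, up to an additive constant."""
    return -0.5 * x * x


# Toy target: the chain's sample mean should settle near 0.
samples = metropolis(log_standard_normal, start=0.0, step=1.0, n_samples=20000)
mean = sum(samples) / len(samples)
```

Phylogenetic programs apply the same accept-or-reject step to proposed changes in tree shape, branch lengths, and model parameters, which is where most of the practical difficulty lies.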
latsot said,

February 27, 2009 @ 2:09 am

@Herman: "The BBC website article references the Times article, not any original research paper, so why not start by pointing the finger at that arm of Murdoch's media empire?"

This is an astonishing statement. Uncritically parroting someone else's bad reporting is equally inexcusable. You seem to be suggesting that we should give the BBC a break because it couldn't be bothered to check its facts.

While the academic in this case does seem to be out of his depth, I'd like to support those above who've pointed out that it's quite common for university PR departments and the media to entirely misrepresent a respectable piece of work, making the researchers look foolish. Something like this has happened to me on two occasions. In both cases I spent quite a lot of time with the writers explaining the work and then answering several of their questions by email. I didn't get to see a draft of the articles and when they came out, the content bore no resemblance at all to what we'd talked about.

Fortunately, my subject is far too dull to make the mainstream media, so nobody read the articles ;)
outeast said,

February 27, 2009 @ 4:58 am

[T]he only way that I can make sense of the media treatment of this story is to assume that they all took it as an opportunity to riff in a giddily nonsensical sort of way on what they took to be a foolish and lightweight topic. – Mark Liberman

This seems like a perfect summation of at least four in five science stories generally.
Stephen Jones said,

February 27, 2009 @ 5:08 am

Pagel does also provide a nugget from the school of the University of the Blindingly Obvious, when he says that the words least likely to change are those we use most …. "Duh".

This is not I believe the first time that Evolutionary Biologists have tried to apply bizarre statistical techniques to Linguistics. The idea of a linguistic 'half-life' seems to come from the Faculty of Advanced Misuse of Metaphor.

I think they're suffering from new toy syndrome. They devise a statistical technique that is useful in a field they do know something about, and then try to apply it to a totally inappropriate field. GIGO.
Stephen Jones said,

February 27, 2009 @ 5:13 am

One of the reasons for the staying power of numerals may be their specificity. There's little room for semantic drift resulting in a word being squeezed out of its space.
A Caveman said,

February 27, 2009 @ 6:06 am

I throw stick dirty guts Pagel.
David Eddyshaw said,

February 27, 2009 @ 7:31 am

Ah, found it! I knew I'd read about something similar on LL previously (actually, this statement may be always true …)

http://itre.cis.upenn.edu/~myl/languagelog/archives/000592.html
http://itre.cis.upenn.edu/~myl/languagelog/archives/000210.html
Antony Green said,

February 27, 2009 @ 7:43 am

"50% of the words we use today would be unrecognisable to our ancestors living 2,500 years ago". This is actually true, of course; but then so would the other 50%.
Rick S said,

February 27, 2009 @ 7:51 am

So if I have this right, Pagel's team equated lexical items with genes, applied statistical techniques used to project mutation rates into the past and future, and used a scale factor to adjust for the different evolutionary rates of biology and language. Presumably, then, they'll soon be able to announce a set of lexemes common to humans and genus Pan. Perhaps someone could fund a side study of bovine dialect mootation rates? (Sorry, had to do it!)
Stephen Jones said,

February 27, 2009 @ 7:58 am

And here's the true explanation from Nicolas Lezard in the Guardian. It's 2009, The Language Odyssey
My own theory is that ThamesBlue has actually become self-aware, and, possibly as a result of indignation at being given a stupid name with a capital letter in the middle of it, has allowed its thoughts to turn in a sinister and vengeful direction. This list is simply its stream of consciousness, or perhaps a subtle warning to its operators not to push their luck.
Stephen Jones said,

February 27, 2009 @ 8:20 am

The Reading University Press Release is so inept it even attributes authorship of the train wreck to the Environmental Biology Department instead of the Evolutionary Biology Group.
Jonathan said,

February 27, 2009 @ 9:45 am

I don't get how "five" is supposed to have remained unchanged since the stone age. It shares no phonemes with "cinco" or "cinque" or "cinq" or "penta," and only one with the German "fünf." For that matter, "I" shares no sounds with "ich" or "io" or "je." Both parties in the interview seem to confuse the concept of HAVING a word for something with having the SAME word for something, when they talk about pointing to oneself and saying "I."

[(myl) There are several different concepts that have gotten badly confused in the media coverage of this story. One is the question of whether word X in language A and word Y in language B are cognate. This concept plays a crucial role in historical linguistics of all sorts. A different question is whether two cognate words are also even roughly equivalent in meaning — they often are not, due to semantic drift or morphological derivation. Yet another question is whether their pronunciations remain close enough to be recognizable by ordinary speakers — and again, regular sound changes may make the relationship of cognates entirely opaque to ordinary speakers and listeners.

Historical linguistics, whether traditional or computational, generally cares only about the question of whether two words are cognate (and in the most reliable forms of analysis, what systematic sound changes connect them). The Reading press release, and the media coverage, focus on mutual comprehension, which depends crucially on the questions that are largely irrelevant to the historical research, whether the traditional form or the sort of thing that Pagel and his collaborators do.]

Is it really true that frequently used words are more stable? That's not obvious to me. More commonly used verbs are more likely to have more irregularities, for example. It always seemed to me that the pressure of usage made words change faster, but I would be glad to be corrected if I am wrong about this.

[(myl) To a first approximation, yes, frequently used words are apparently less likely to change — in the specific sense of undergoing lexical replacement. From the abstract of the 2007 Nature paper:

Among more than 100 Indo-European languages and dialects, the words for some meanings (such as 'tail') evolve rapidly, being expressed across languages by dozens of unrelated words, while others evolve much more slowly—such as the number 'two', for which all Indo-European language speakers use the same related word-form. No general linguistic mechanism has been advanced to explain this striking variation in rates of lexical replacement among meanings. Here we use four large and divergent language corpora (English, Spanish, Russian and Greek) and a comparative database of 200 fundamental vocabulary meanings in 87 Indo-European languages to show that the frequency with which these words are used in modern language predicts their rate of replacement over thousands of years of Indo-European language evolution. Across all 200 meanings, frequently used words evolve at slower rates and infrequently used words evolve more rapidly. This relationship holds separately and identically across parts of speech for each of the four language corpora, and accounts for approximately 50% of the variation in historical rates of lexical replacement.

But as the Reading press release indicates, the 2007 Nature paper also found that part of speech (with perhaps some semantic flavoring) also made a difference:

… numerals evolve the slowest, then nouns, then verbs, then adjectives. Conjunctions and prepositions such as 'and', 'or', 'but' , 'on', 'over' and 'against' evolve the fastest, some as much as 100 times faster than numerals.

(Note that the meaning of evolve here is *only* "undergo lexical replacement", and *not* "undergo significant sound change or semantic change".)

Since conjunctions and prepositions tend to be frequent, and also tend to be replaced quite often, you might be misled. But apparently, even within the class of conjunctions and prepositions, commoner examples are more stable. Here's Fig. 3 from the 2007 Nature paper:

The different colored lines are for a multiple regression including both frequency and part of speech: "Conjunctions (grey) evolve fastest, followed by prepositions (turquoise), adjectives (red), verbs (blue), nouns (green), special adverbs (yellow), pronouns (orange) and numbers (purple)."]
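
One way to make "rate of lexical replacement" and the media's "half-life" talk concrete is a constant-rate decay calculation. The Python sketch below is only that: it assumes a single constant replacement rate per meaning, which is a simplification and not the model fitted in the Nature paper. The 2,500-year figure is the one quoted in the coverage, used here purely as an illustration, and the function names are made up for the example.

```python
import math


def retained_fraction(years, half_life):
    """Fraction of words not yet replaced after `years`, under constant-rate decay."""
    return 0.5 ** (years / half_life)


def half_life_from_rate(rate_per_year):
    """Half-life implied by a constant replacement rate (replacements per year)."""
    return math.log(2) / rate_per_year


# Illustrative only: a 2,500-year half-life, as in the quoted "50%" claim.
after_one = retained_fraction(2500, 2500)   # half the words remain
after_two = retained_fraction(5000, 2500)   # a quarter remain

# A word class replaced 100 times faster has a half-life 100 times shorter.
slow_rate = math.log(2) / 2500
fast_half_life = half_life_from_rate(100 * slow_rate)  # about 25 years
```

Nothing in this arithmetic says whether the surviving words would be recognizable to a listener; it counts replacement by a non-cognate word and ignores sound change and semantic drift entirely.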
Mark F. said,

February 27, 2009 @ 10:53 am

(myl) "To a first approximation, yes, frequently used words are apparently less likely to change."

My impression is that "change", in the context of this discussion, means "be replaced by a non-cognate word". Is that right?

If so, is it also the case that common words undergo sound change more slowly?
Stephen Jones said,

February 27, 2009 @ 10:54 am

As I said I think the reason numerals rarely change is that they have clear boundaries.

Are the conjunctions and prepositions that 'evolve' a hundred times faster cognates?

And couldn't it be that verbs 'evolve' faster than nouns partly because they are more subject to change through grammaticalization?

Talking about evolution when you mean extinction doesn't seem that clever.
Chud said,

February 27, 2009 @ 11:20 am

Why didn't the BBC lady ask the question: "Well, if a caveman could understand 'I', 'we', 'two', 'three', and 'five', why can't a Spaniard?"
marie-lucie said,

February 27, 2009 @ 11:20 am

(myl) "To a first approximation, yes, frequently used words are apparently less likely to change."

– My impression is that "change", in the context of this discussion, means "be replaced by a non-cognate word". Is that right?

Yes (at least I think that is what myl said, in accordance with what is known of language change).

If so, is it also the case that common words undergo sound change more slowly?

No. That's the fallacy of the Proto-World (and similar) people, who think that some words somehow are preserved from change, when they just happen to contain sounds which are less likely than others to change (at least in the particular language family). If anything, some extremely frequent words or word sequences can change faster because they are often spoken very fast: witness Spanish "Usted" from the honorific "vuestra merced" (eg Sancho Panza always addresses Don Quixote as "vuesa merced"), while the individual words "vuestra" and "merced" are still in use in their normal forms in other contexts.

There is an exception: baby talk (words used only in speaking to babies), which has remarkable resilience over the centuries but hardly counts as normal language.
Nigel Greenwood said,

February 27, 2009 @ 11:40 am

Shifting the focus away from IE for a moment, it's striking that in the Finno-Ugric group the Finnish & Hungarian words for the numerals 7 to 10 are no longer cognate (ie have "evolved" in the sense of the Nature article). Finnish & Hungarian are described by Karlsson in his Finnish Grammar as being about as far apart "as English or German is from Persian" — though in this respect at least they are less similar.
Mark P said,

February 27, 2009 @ 1:21 pm

"Unless the writers and editors involved are all uneducated morons, which I very much doubt …"

You may presume too much. My experience is that journalists, at least in the US, are among the least well educated of all professionals. And there is the laziness factor as well.

Of course not all are like that, but I think if you sample randomly you will usually find it to be true.
marie-lucie said,

February 27, 2009 @ 2:03 pm

NG: in the Finno-Ugric group the Finnish & Hungarian words for the numerals 7 to 10 are no longer cognate

What do you mean? words cannot stop being cognate any more than your cousin can stop being your cousin. Or do you mean that they have been replaced by other, unrelated words? And is there evidence that the languages did use cognate numerals at one time?
Yoram said,

February 27, 2009 @ 2:44 pm

As far as I can tell, caveman language is surprisingly close to modern English, though with aberrant phonology.

(That's George Booth, in the New Yorker)
Yoram said,

February 27, 2009 @ 2:44 pm

I meant to embed
http://www.mmh.seesart.com/wp-content/uploads/2007/02/ipgissagul.jpg
Stephen Jones said,

February 27, 2009 @ 3:11 pm

Another topical take on caveman language.
News Item: The Oldest English Words Identified

The Fred referred to is Sir Fred Goodwin, who drove the Royal Bank of Scotland into the largest bankruptcy in UK financial history. It was bailed out by the taxpayer of course. And people are up in arms he's retiring at 50 with a £25 million pension fund.
Irene said,

February 27, 2009 @ 4:16 pm

Stephen Jones said:

"Another topical take on caveman language.
News Item: The Oldest English Words Identified
The Fred referred to is Sir Fred Goodwin"

I beg to differ. I'm sure they were referring to Fred Flintstone.
Nigel Greenwood said,

February 27, 2009 @ 4:49 pm

@ marie-lucie:
NG: in the Finno-Ugric group the Finnish & Hungarian words for the numerals 7 to 10 are no longer cognate

What do you mean? words cannot stop being cognate any more than your cousin can stop being your cousin. Or do you mean that they have been replaced by other, unrelated words? And is there evidence that the languages did use cognate numerals at one time?

I may not have expressed myself clearly. What I meant was that modern Fi 7 & Hu 7 are not cognate words; neither are Fi 8 & Hu 8. The Fi for 8 & 9 appear to mean something like "10 minus 2" & "10 minus 1" respectively — unlike the Hu equivalents.

The numbers 1 to 6 in these 2 languages, on the other hand, appear to be cognate (eg Fi kolme & Hu harom, both meaning "three"). It's all there in the WikiP link I gave above.

I have no idea whatsoever what the situation was in PFU. The table in the linked article simply has "N/A" for PFU 7 to 10. Perhaps a competent Finno-Ugricist will read this & enlighten us all.
Harry Campbell said,

February 27, 2009 @ 5:16 pm

Sadly, that link is no longer working…(http://www.evolution.reading.ac.uk/WordChanges/)

Yes, the famous Dictionary of Middle Caveman has "been removed temporally". :-)
marie-lucie said,

February 27, 2009 @ 5:18 pm

NG,

OK, and thank you for the Wiki reference. But it is not unusual for languages to be missing some numerals: 1 to 5, and 10, are usually native words, obviously because of using the hands for counting, and 6 and 9 are just one digit more or less than a basic "one hand" or "two hands" number, but 7 and 8 seem to be particularly prone to being left out. In this case the "empty space" for those missing numbers seems to have been filled by borrowings, again not an unusual occurrence.
Nick Z said,

February 27, 2009 @ 8:27 pm

@Jonathan: the most commonly used verbs may have more irregularities precisely because they preserve archaisms/sound changes which have been regularised/undone in less common verbs by analogy – apparently the most common words we just learn as they are and don't worry that they are irregular. The whole question of the influence of frequency on sound change/analogy/lexical replacement is still, I think, very open.
Jonathan said,

February 27, 2009 @ 11:06 pm

Thanks Nick Z (and Mark for the extensive red commentary in my comment). That's the kind of insight I read Language Log for.

I was thinking, too, that certain compounds less frequently used preserve a sound, where the non-compounded word undergoes a sound shift.

In Spanish

"hacer" (from Latin facere) but "satisfacer"

or "maduro" –but "prematuro" keeping the older form of the Latin maturus.

It could be that there is another explanation, like those compound forms being taken from Latin at a later date as "cultismos." I had just assumed that the lesser used forms would be more conservative in some respects but it looks like I had that backwards.
marie-lucie said,

February 28, 2009 @ 12:23 am

Jonathan,

yes, those are "cultismos". There are similar phenomena in French with inherited words and those borrowed later (which may be from exactly the same Latin word). Sound change does not care whether forms are frequent or not (with rare exceptions).
Mary Kuhner said,

February 28, 2009 @ 1:34 am

I work on Bayesian phylogeny algorithms in biology, but their extension to linguistics raises troublesome issues. In particular, it is fairly easy to establish that a gene of 1000 bases (characters) or so is homologous between two species–if you can't establish that at the DNA level, it is often more clearcut at the protein level. But inference that words are homologous must be based on only a very few bits of information, and those bits are less independent than bases in a gene. So the homology problem, which is severe in biology only at the deepest taxonomic levels, seems extremely severe in linguistic phylogenetics.

All of the ling/phy talks I have personally seen used standards of homology that would not be acceptable in biology. Often they risked circularity: "suitable" words for study are ones that do not seem to be borrowed, which means precisely ones that fit the accepted relationship pattern of their languages. But then those words, being selected to fit the accepted relationship, cannot be used to validate that relationship. On the other hand, you can't make a big sample of both "suitable" and "unsuitable" words because it is impossible to establish any homology at all for the "unsuitable" ones.

The other major problem is coordinated change. This can happen in biology and is known to be a problem for biological phylogenies. For example, all boiling-water organisms substitute C and G for some of the A and T in their genome because C-G pairs are more stable at high temperature. These environmentally driven changes repeat independently in multiple boiling-water creatures and tend to make them cluster together on trees even when they are almost surely not closely related.

Coordinated change in language seems harder to deal with in the linguistic realm than in the biological, and more common as well. How many changes can you pack into a little word like "eau" before it becomes impossible to separate the coordinated ones from the (much needed for these methods) independent ones?

The mathematics of these techniques are sound, but like all statistical methods, they need sensible input or nothing sensible can be expected to come out. I'm not personally convinced that we can provide the needed input for linguistic data.

(Disclaimer: I'm a mathematical biologist, and follow linguistics only as an interested amateur.)
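
The "very few bits of information" point can be given rough numbers. The Python sketch below computes the chance that two unrelated sequences agree at a given number of positions purely by accident, using a binomial tail. Everything about the setup is a hypothetical simplification: positions are treated as independent and uniformly distributed, and the alphabet sizes, lengths, and match thresholds are invented for illustration, not taken from any study.

```python
from math import comb


def chance_match_probability(length, alphabet_size, min_matches):
    """P(at least min_matches positions agree) for two independent,
    uniformly random sequences of the given length (a toy model)."""
    p = 1.0 / alphabet_size
    return sum(
        comb(length, k) * p**k * (1 - p) ** (length - k)
        for k in range(min_matches, length + 1)
    )


# Hypothetical gene: 1000 bases, 4-letter alphabet, at least half matching.
gene = chance_match_probability(1000, 4, 500)

# Hypothetical word: 4 segments, 25-sound inventory, at least 2 matching.
word = chance_match_probability(4, 25, 2)
```

Under these assumptions the gene-sized accidental match is vanishingly improbable, while the word-sized one comes out a little under 1%, which adds up quickly when many word pairs across many languages are compared. Since real sounds within a word are less independent than the model assumes, the toy figure if anything understates the difficulty.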
Simon Cauchi said,

February 28, 2009 @ 2:08 am

@marie-lucie: There are similar phenomena in French with inherited words and those borrowed later (which may be from exactly the same Latin word).

And indeed in English, as Milton noted: "New presbyter is but old priest writ large."
Aaron Davies said,

February 28, 2009 @ 3:16 am

up for two days, and no jesus sanchis? the mind boggles…
Aaron Davies said,

February 28, 2009 @ 3:20 am

the focus on "water" reminds me of a very silly bit of commentary i heard once, attempting to extract deep philosophical significance from the similarity of "water" and "what" in various languages ("aqua"/"qua", etc.). i think it had something to do with the Flood. (this was coming from someone who thought "galoshes" came from hebrew (possibly "galesh", "slippery", though i'm not sure anymore what the alleged source word was))
Nigel Greenwood said,

February 28, 2009 @ 6:13 am

@ marie-lucie: yes, those are "cultismos". There are similar phenomena in French with inherited words and those borrowed later (which may be from exactly the same Latin word). Sound change does not care whether forms are frequent or not (with rare exceptions).
Being a practical amateur linguist, I'd like to see an example or two! I presume you're referring to forms such as surdité "deafness", as opposed to the common adjective sourd "deaf". Still in the realm of disability, we have cécité "blindness", this time going back to the mainstream Latin word, as opposed to aveugle "blind" (probably from something like Lat ab oculis — though, wouldn't you know, that is controversial).

One area rich in "cultismos" is that of toponymical adjj. Eg ruthénois < Rodez. There are hundreds of these in French — & Spanish too, for that matter: who would have guessed that onubense < Huelva?
marie-lucie said,

February 28, 2009 @ 7:56 am

NG,

There are dozens of such words in French, and the pairs are called doublets. Any history of the French language (whether written in French, English, German or other languages) will give you a representative list. Your example surdité/sourd being a noun/adjective pair is not quite a doublet, and even less cécité/aveugle since these words have different origins (I guess aveugle from ab oculis does not quite follow the Latin to French rules and so must come from a different dialect – if you have another derivation, let me know). The equivalent of Spanish cultismo is mot savant as opposed to mot populaire.

Doublets that are commonly quoted are fragile/frêle, the latter "frail" (an OF borrowing), rigide/raide, the latter "stiff", ministère/métier, the latter "trade, craft", etc. The inherited words are shorter because they have lost some of the Latin medial consonants through well-understood rules of change. When you add pairs from different parts of speech such as noun/adjective or vice-versa, and derivatives, then the number increases even more, as in auditif/ouïe, the latter "(sense of) hearing", or ovin/ouailles, the latter formerly "sheep in a flock" but later "parishioners (in relation to a priest)".

Toponyms (words for places) are usually inherited but the corresponding adjectives (I forget the technical term) are the ones that often still use the Latin equivalents. Your examples should be written the other way: ruthénois is not from Rodez but from the old Latinized form of the name of the town. Similarly, Sp onubense is not from the current name Huelva [welba] but from its ancient name Onoba (through an intermediate stage Oloba), as you can see in any history of Spanish.

I guess the reason for the learned borrowings for "inhabitants" is that local inhabitants are more likely to just say the equivalent of "the people of Rodez", rather than use a specific word, but the Latinized word for the old tribe was preserved in old documents and then re-borrowed into the language when it was written. I use "Latinized" not just Latin because in most cases the Latin words are Gaulish ones with Latin endings, which makes them a good source for reconstituting the Gaulish names. An example is Eburoviciens for the town of Evreux, preserving Gaulish eburovic, which is practically identical to the old name of "York", both of them including Celtic ebur- "boar".
marie-lucie said,

February 28, 2009 @ 8:11 am

Mary Kuhner,

Thank you for your contribution. I am not a biologist but a linguist, and although I am not familiar with the details of the biological examples you quote I share your skepticism about the applicability of biological and other natural science models to historical linguistics. As a general principle, there should be some applicability (shades of Darwin, who was inspired by historical linguistics) but in practice it seems to me that much of the work is being done by natural scientists, not linguists, and at the moment there are too many problems about how to handle the linguistic data, which are not of the same type as biological ones. Again, I can't go into details, but I am uneasy.
James Kabala said,

February 28, 2009 @ 1:39 pm

It's funny how even though computers have become a household product almost as common as the telephone and television, people will still draw back in awe at anything that can be linked with a "supercomputer" (as Bolderson does here).

I don't think Bolderson comes across as particularly stupid on the whole, however. She shows flashes of understanding that these claims are not quite right (e.g., by asking, "Isn't who a more complicated concept than I?"), but Pagel keeps leading her down a garden path by not explaining himself clearly.
Nigel Greenwood said,

February 28, 2009 @ 1:54 pm

@ marie-lucie:
(I guess aveugle from ab oculis does not quite follow the Latin to French rules and so must come from a different dialect – if you have another derivation, let me know).
According to the TLFi the other possible, though less likely, derivation is from albios oculus [sic].

ovin/ouailles, the latter formerly "sheep in a flock" but later "parishioners (in relation to a priest)".
The term "flock" is the usual religious term in English too. Ouailles note?

Toponyms (words for places) are usually inherited but the corresponding adjectives (I forget the technical term)

All I can tell you is that the Spanish term is gentilicio.

Your examples should be written the other way: ruthénois is not from Rodez but from the old Latinized form of the name of the town.
Thanks for pointing this out. Of course I don't think that ruthénois derives diachronically from the word Rodez! I was just being sloppy & using shorthand.
Chad Nilep said,

February 28, 2009 @ 2:08 pm

There is discussion of Pagel et al. 2007 and Atkinson et al. 2008 at Linguistic Anthropology. See also links therein.
marie-lucie said,

February 28, 2009 @ 7:30 pm

NG: about aveugle:

The TLF seems to say that ab oculis is more likely than the other proposed etymology albios oculus, which seems strange but could have led to auveugle, auvieugle and others, which are not attested. On the other hand, simpler forms of aveugle like aveule, avule are attested. So ab oculis still seems to be the most likely origin.
Martin Watts said,

March 1, 2009 @ 1:56 pm

Perhaps the original reference to Scrabble was inspired by Douglas Adams' "Hitch-hikers Guide to the Galaxy" where Arthur Dent tries to teach a caveman to play Scrabble.

That caveman played the word "fortytwo".
RBH said,

March 2, 2009 @ 12:23 am

From the comment at 06:39 on Feb 26:

[(myl) I'm not sure how well informed Pagel is about issues in historical linguistics, or whether indeed he was thinking about them at all in framing his answers in this case. As I understand it, he's talking about algorithms that estimate probability of lexical replacement as a function of elapsed time. Those estimates are not constrained to lie within the likely time depth of the last common ancestor of the languages whose cognate sets are being compared.]

What one has here is a distinction similar to that between gene trees and species trees in biological phylogenetics. For a beginner's introduction see here. Pagel was apparently constructing what amount to gene trees (lexical elements), not species trees (languages).
Quid plura? | "You were there at the turnstiles, with the wind at your heels..." said,

March 2, 2009 @ 1:58 am

[…] When the BBC ran a silly story about computer models of English past and future, Got Medieval was there to give it a deserving kick, and Language Log was there to demolish it. […]
Mark Pallen said,

March 3, 2009 @ 1:38 pm

Pagel seems to make a habit of straying into fields outside his own training and expertise and thinking that he can get by on a bit of computing and evolutionary biology and then makes a hash of things by not even grasping the basic assumptions of the field. I am a bacteriologist and the worst experience of my professional life was having to examine one of his PhD students who had been cast by Pagel into the field of bacterial genomics without support from anyone expert in the field. Looks like we are seeing a similar pattern of behaviour here.

BTW, my understanding is that the evolution of quinque from penkwe did not follow a simple descent with modification model, but that there was alliteration between the initial consonant and later "kw". In effect, this is a duplication of one phoneme and deletion of another. Or am I misguided in this?
marie-lucie said,

March 3, 2009 @ 3:23 pm

Mark P,

I am glad to see a comment by a scientist. If someone can make a mess in even a field related to his own, how much more of a mess can he make in a field about which he knows nothing, such as linguistics.

About quinque < *penkwe, the influence of quattuor must have played a role too. "Deletion" is not the right term here, as it would first have resulted in an intermediate form *enkwe which would have evolved differently. The initial p just got replaced by kw.
Andrew Sihler said,

March 10, 2009 @ 12:04 pm

While most of these comments are blessedly sensible, a few are ever so slightly hair-raising.

Not every proposed etymology is tenable, even ones widely-repeated. "Best of a bad lot" is in my judgement no argument in favor of an etymology. French aveugle is a problem which desperate speculations like "ab oculis" don't come close to solving (it's hard to say whether the syntax or the semantics are more atrocious). And while the derivation of Sp. usted from vuestra merced is standard, I believe it has recently been pretty convincingly overturned, but I don't have a reference. (Anyone?)

Forms of address are often subject to radical phonetic erosion, not necessarily because of any haste in their articulation but in large part because they tend to be on the long side and have no salient inner structure that would provide semantically-based correctives to slurred pronunciation: I would take it as self evident that the loss of nasality in Fr monsieur /məsjø/ indicates that when that happened the form had ceased being understood as "my" anything. Ditto the loss of the final /r/. (On the other hand — there's always an "on the other hand" in historical linguistics — they can be archaic: Italian prence "prince" and re "king" are diachronically peculiar in continuing the otherwise always lost nominative form rather than the predicate form (which does show up in It. principe).)

As for the "five" words, Germanic *f < *kʷ is found in a fair number of words, actually, and there's also one good example of *p < *gʷ (e.g. English sheep < *skēgʷ-, cf. Skt chāga- "goat") and one of *b < *gʷh. No conditioning has been identified despite much searching; more likely is that there was an otherwise unattested "p-dialect" of Germanic. As for Lat. quīnque < *penkʷe, the change of *p…kʷ to *kʷ…kʷ is actually a totally regular sound law, shared by all Italic and Celtic languages. The wonky thing about quīnque is the length of the vowel, though that's easily accounted for by reference to the ordinal quīn(c)tus, where the lengthening is regular. (Ordinals often influence the forms of cardinals, which may seem counter-intuitive but it's a widely-encountered phenomenon. Such levelings are pervasive in Slavic, and probably account for the loss of the *ð in OE féower "four".)

Apologies for not italicizing cited forms; I don't seem to be getting italics today. Oh, and a final thought: a non-linguist friend of mine first brought the Pagel stuff to my attention. I didn't know what to think. Some nonsense is too preposterous to even grasp, which I guess is what people mean when they call something "not even wrong".
