This post is the promised short discussion of Michael Dunn, Simon J. Greenhill, Stephen C. Levinson & Russell D. Gray, "Evolved structure of language shows lineage-specific trends in word-order universals", Nature, published online 4/13/2011. [Update: free downloadable copies are available here.] As I noted earlier, I recommend the clear and accessible explanation that Simon Greenhill and Russell Gray have put on the Austronesian Database website in Auckland — in fact, if you haven't read that explanation, you should go do so now, because I'm not going to recapitulate what they did and their reasons for doing it, beyond quoting the conclusion:
These family-specific linkages suggest that language structure is not set by innate features of the cognitive language parser (as suggested by the generativists), or by some over-riding concern to "harmonize" word-order (as suggested by the statistical universalists). Instead language structure evolves by exploring alternative ways to construct coherent language systems. Languages are instead the product of cultural evolution, canalized by the systems that have evolved during diversification, so that future states lie in an evolutionary landscape with channels and basins of attraction that are specific to linguistic lineages.
And I should start by saying that I'm neither a syntactician nor a typologist. The charitable way to interpret this is that I don't start with any strong prejudices on the subject of syntactic typology. From this unbiased perspective, it seems to me that this paper adds a good idea that has been missing from most traditional work in syntactic typology, but at the same time, it misses two good ideas that have been extensively developed in the related area of historical syntax.
This paper's good new(-ish) idea is that new languages develop by an evolutionary process from older ones, and therefore the explanation for the current distribution of "states" (here viewed as distributions of eight superficial word-order generalizations) may lie (mainly or entirely) in the "transition probabilities" that characterize the process of change, rather than in a set of constraints on a static view of the state space.
The paper's main result starts with a demonstration — which I find convincing — that these transition probabilities cannot be factored into entirely independent contributions of the eight individual features, but rather have at least two-way interactions, in which the probability of (say) evolving from postpositions to prepositions is different in a language with verb-object order than in a language with object-verb order.
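To make this idea of a state-dependent transition concrete, here is a minimal toy simulation of two interacting binary word-order features. The probabilities are invented for illustration and have nothing to do with the paper's actual estimates; the point is only that the adposition feature's transition rates are conditioned on the verb-object state:

```python
import random

# Toy sketch (not the authors' model): two binary word-order features,
# verb-object order (VO=1, OV=0) and adposition order (prep=1, postp=0).
# The transition probabilities for the adposition feature depend on the
# current verb-object state -- the kind of two-way interaction at issue.
# All probability values below are made up for illustration.
P_TO_PREP = {1: 0.30, 0: 0.05}   # P(postp -> prep) given VO state
P_TO_POSTP = {1: 0.05, 0: 0.30}  # P(prep -> postp) given VO state

def step(state, rng):
    """Advance one generation: the adposition feature may flip,
    at a rate conditioned on the (fixed) verb-object feature."""
    vo, prep = state
    if prep == 0 and rng.random() < P_TO_PREP[vo]:
        prep = 1
    elif prep == 1 and rng.random() < P_TO_POSTP[vo]:
        prep = 0
    return (vo, prep)

def simulate(vo, steps=200, runs=2000, seed=0):
    """Fraction of runs ending prepositional, starting postpositional."""
    rng = random.Random(seed)
    prep_count = 0
    for _ in range(runs):
        state = (vo, 0)  # start with postpositions
        for _ in range(steps):
            state = step(state, rng)
        prep_count += state[1]
    return prep_count / runs
```

Running `simulate(1)` and `simulate(0)` shows VO languages ending up prepositional far more often than OV languages, even from the same starting point, purely because the transition probabilities differ by state.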
This much is predicted by all extant theories, including those that this paper's authors call "generative" and those that they call "statistical". Even if you believe that the only explanatorily-relevant forces arise from static constraints on synchronic grammars, you'll expect these constraints to drive state-dependent probabilities of change, especially when the states are described in superficial ways. Thus if you believed that languages prefer consistency in the position of grammatical "heads", you would conclude that changes increasing such consistency (e.g. arranging for both verbs and adpositions to precede their objects) should be favored, other things equal, over changes that decrease it.
But this paper's focus on the distribution of state-transitions rather than on the distribution of states, along with its method for quantitative estimation of the transition probabilities, leads the authors to a second result, namely that the correlations among estimated transition probabilities for the eight individual features are different in the four different language families that they studied. On the face of it, this seems to be inconsistent with any theory where the relevant forces — whatever their explanation — are universal.
I'm convinced that this result is interesting, but I'm less certain that it's true. My first concern is that the paper is modeling an implicit transition matrix with 65,536 cells: there are 2^8 = 256 possible configurations of the 8 superficial syntactic features, and therefore 256*256 = 65,536 transition probabilities between language-states. The empirical basis for their effort is large in terms of the effort involved in creating it — evolutionary trees estimated from 589 languages in 4 families (400 Austronesian, 73 Bantu, 82 Indo-European, and 34 Uto-Aztecan) — but small relative to the number of parameters implicit in this problem. They solve this difficulty by transforming the 65,536-dimensional problem into a much lower-dimensional one — the relevant numbers are the 8*8 = 64 pairwise correlations among the 8 superficial syntactic features.
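The parameter counting in the paragraph above can be spelled out in a few lines (the numbers are exactly those in the text):

```python
# Parameter-count arithmetic behind the dimensionality worry.
n_features = 8
n_states = 2 ** n_features          # 256 possible feature configurations
n_transitions = n_states ** 2       # 65,536 state-to-state transition probabilities
n_correlations = n_features ** 2    # 64 entries of an 8x8 correlation matrix
n_ignored = n_transitions - n_correlations

print(n_states, n_transitions, n_correlations, n_ignored)
# 256 65536 64 65472
```

Set against 589 languages across four trees, 65,536 implicit parameters is hopeless, which is why the reduction to 64 correlations is doing so much work.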
But their conclusion still depends on being able to estimate these 64 correlations separately for each of the 4 lineages. And prior to exploring the methodological background carefully, it's not obvious to me that this can be done accurately enough for the phylogenetic analysis of 34 nodes (Uto-Aztecan) or 73 nodes (Bantu) or 82 nodes (Indo-European). [The BayesTraits software that they used uses Markov Chain Monte Carlo simulations in a credible way to estimate the relevant correlations and confidence intervals -- but the relationship between the number of parameters and the amount of data still worries me.]
And even if the lineage-wise correlation estimates are entirely correct, there might still be some structure hidden in the 65,536 – 64 = 65,472 ignored dimensions that would motivate an explanation in terms of constraints on synchronic grammars after all. All the same, it's a challenge to synchronic-constraint theories that the correlations in different lineages look so different; and the authors of this paper deserve kudos, not just for the amount of work involved in the analysis, but also for posing the question in this form in the first place.
As for my claim that there are two good ideas from work in historical syntax that are missing from this paper, I don't have time this morning to give more than a brief sketch of what I mean. I'll come back with more detail some other time.
The first point is easy to understand: historical change in syntactic structures, as in all other aspects of language, is often driven by language contact. In the abstract, this is an old and obvious idea, but there are many specific relevant facts and generalizations, some of them recently discovered or elaborated. For a survey, see Sally Thomason's Language Contact: An Introduction.
One reason for bringing this up is that contact between languages is entirely outside the range of models considered by Dunn et al. But if language contact has played an important role — and some people think that it plays a dominant role in syntactic change — then it's likely to create real lineage-specific correlations of the type that Dunn et al. observe, since their linguistic lineages are almost entirely geographically disjoint, and thus subject to disjoint areal phenomena caused by disjoint patterns of contact. I don't see any reason that these areal phenomena should not include correlations among the estimated transition probabilities for their eight superficial syntactic features.
And note that in this sense (and some other senses as well), historical linguists have always understood that language change is driven by forces outside of preference-relations (whether statistical or categorical) among synchronic grammars. The Dunn et al. paper gives the impression that all prior theories of syntactic typology have missed this point. But in fact, a failure to attend to "path effects" and historico-cultural influences has been the exception rather than the rule among linguists. Indeed, most linguists have been attentive to some grammar-external effects, such as language contact, that Dunn et al. ignore.
The second missing idea from work in historical syntax is that features like "OBV" (the code for whether objects follow verbs) should be seen as superficial grammatical symptoms rather than atomic grammatical traits. To give a biological analogy, having a determinate genomic variant (like the HbS mutation behind sickle-cell anemia) is a trait whose distribution it makes sense to model using techniques like those in the BayesTraits software. But having a phenotypic property like being short is much more problematic. Height (though quite heritable) is the outcome of a complex interaction among many genes and environmental influences. And if your sample includes short people from groups with several quite different reasons for being short, but your model treats shortness as an atomic trait, then your estimates of trait-transition probabilities are likely to be garbage, at least with respect to the underlying science.
If you don't know what this might mean in a syntactic context, and you can't wait for my promised forthcoming discussion, take a look at Anthony Kroch, Ann Taylor, and Donald Ringe, "The Middle English verb-second constraint: a case study in language contact and language change", in Susan Herring, Ed., Textual parameters in older languages, 2000.
[Amazingly enough, there's another interesting new paper out of Auckland on linguistic evolution this week: Quentin Atkinson, "Phonemic Diversity Supports a Serial Founder Effect Model of Language Expansion from Africa", Science 4/15/2011. More on this later...]