An important new paper (Sean Roberts & James Winters, "Linguistic Diversity and Traffic Accidents: Lessons from Statistical Studies of Cultural Traits", PLOS ONE 2013) is explained clearly in a blog post by one of the authors, "Uncovering spurious correlations between language and culture", A Replicated Typo, 8/15/2013:
James and I have a new paper out in PLOS ONE where we demonstrate a whole host of unexpected correlations between cultural features. These include acacia trees and linguistic tone, morphology and siestas, and traffic accidents and linguistic diversity.
We hope it will be a touchstone for discussing the problems with analysing cross-cultural statistics, and a warning not to take all correlations at face value. It’s becoming increasingly important to understand these issues, both for researchers as more data becomes available, and for the general public as they read more about these kinds of study in the media (e.g. recent coverage in National Geographic, the BBC and TED).
One of my favorite bits is the following (and not only, or even mainly, because they link to one of my posts!):
Everyone knows that correlation does not imply causation, but there are other problems inherent in studies of cultural features. One problem that is often discounted in these kinds of study is the historical relationship between cultures. Cultural features tend to diffuse in bundles, inflating the apparent links between causally unrelated features. This means that it’s not a good idea to count cultures or languages as independent from each other. […]
Our paper tries to demonstrate the importance of controlling for this problem by pointing out a chain of statistically significant links, some of which are unlikely to be causal. The diagram below shows the links; those marked with ‘Results’ are links that we’ve discovered and demonstrate in the paper.
For instance, linguistic diversity is correlated with the number of traffic accidents in a country, even controlling for population size, population density, GDP and latitude. While there may be hidden causes, such as state cohesion, it would be a mistake to take this as evidence that linguistic diversity caused traffic accidents.
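The non-independence point quoted above can be illustrated with a toy simulation. What follows is only a sketch under invented assumptions, not the method or data of the paper: each hypothetical language family has an ancestor with two causally unrelated traits, and every daughter language inherits both with a little noise. The family counts, noise level, function names, and the Fisher z approximation used as the significance test are all my own choices for illustration. Averaging within families is likewise just one crude way to respect shared history.

```python
import math
import random


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def looks_significant(xs, ys, z_crit=1.96):
    """Approximate two-sided test at the 5% level via the Fisher z transform."""
    r = pearson_r(xs, ys)
    z = math.atanh(r) * math.sqrt(len(xs) - 3)
    return abs(z) > z_crit


def simulate_once(rng, n_families=20, langs_per_family=10, noise_sd=0.3):
    """One simulated world. The two traits are independent by construction."""
    lang_x, lang_y, fam_x, fam_y = [], [], [], []
    for _ in range(n_families):
        # Ancestor traits: drawn independently, so there is no causal link.
        ancestor_x = rng.gauss(0, 1)
        ancestor_y = rng.gauss(0, 1)
        # Daughter languages inherit both traits as a bundle, plus noise.
        xs = [ancestor_x + rng.gauss(0, noise_sd) for _ in range(langs_per_family)]
        ys = [ancestor_y + rng.gauss(0, noise_sd) for _ in range(langs_per_family)]
        lang_x.extend(xs)
        lang_y.extend(ys)
        fam_x.append(sum(xs) / len(xs))
        fam_y.append(sum(ys) / len(ys))
    naive = looks_significant(lang_x, lang_y)    # every language is a data point
    by_family = looks_significant(fam_x, fam_y)  # one data point per family
    return naive, by_family


def false_positive_rates(n_runs=500, seed=0):
    """Share of runs in which each analysis reports a 'significant' link."""
    rng = random.Random(seed)
    naive_hits = 0
    family_hits = 0
    for _ in range(n_runs):
        naive, by_family = simulate_once(rng)
        naive_hits += naive
        family_hits += by_family
    return naive_hits / n_runs, family_hits / n_runs


if __name__ == "__main__":
    naive_rate, family_rate = false_positive_rates()
    print(f"languages treated as independent: {naive_rate:.2f}")
    print(f"one data point per family:        {family_rate:.2f}")
```

Under these assumptions the analysis that counts every language separately should flag a link far more often than the nominal 5%, while the per-family analysis should stay close to it. Nothing about the traits differs between the two analyses; only the treatment of shared history does.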