There are a variety of factors that are believed to be involved in the establishment and maintenance of the language varieties that are commonly called "dialects". Among these are substrate or contact influences, patterns of initial settlement, group identity, and patterns of communication. Some of these factors, such as settlement patterns, mainly re-distribute existing variation in geographical and social space. But others, such as patterns of communication, affect the way that innovations arise and spread.
The rise of internet-based social media offers new pictures of such patterns of communication, and a few months ago, I came across an interesting analysis of the geography of Facebook friend links: Pete Warden, "How to split up the U.S.", 2/6/2010.
As I've been digging deeper into the data I've gathered on 210 million public Facebook profiles, I've been fascinated by some of the patterns that have emerged. My latest visualization shows the information by location, with connections drawn between places that share friends. For example, a lot of people in LA have friends in San Francisco, so there's a line between them.
Looking at the network of US cities, it's been remarkable to see how groups of them form clusters, with strong connections locally but few contacts outside the cluster. For example Columbus, OH and Charleston WV are nearby as the crow flies, but share few connections, with Columbus clearly part of the North, and Charleston tied to the South:
Using an unspecified clustering technique, Pete derived this map of Facebook social networks in the continental U.S.:
He observes that
Some of these clusters are intuitive, like the old south, but there's some surprises too, like Missouri, Louisiana and Arkansas having closer ties to Texas than Georgia.
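Since the clustering method isn't specified, here is a minimal sketch of one very simple possibility, offered only as an illustration and not as a reconstruction of what Pete actually did: keep only the city-to-city ties above some strength threshold, then treat each connected group of cities as a cluster. The tie strengths below are invented for the example, and Cleveland and Atlanta are hypothetical additions that don't appear in his description; the function name `cluster_cities` is likewise just a placeholder.

```python
from collections import defaultdict

# Hypothetical tie strengths between cities (invented numbers, NOT Warden's data).
TIES = {
    ("Los Angeles", "San Francisco"): 0.9,
    ("Columbus", "Cleveland"): 0.8,
    ("Charleston WV", "Atlanta"): 0.7,
    ("Columbus", "Charleston WV"): 0.1,   # nearby, but weakly connected
    ("San Francisco", "Columbus"): 0.05,
}


def cluster_cities(ties, threshold):
    """Group cities into connected components, using only ties >= threshold."""
    parent = {}

    def find(city):
        # Union-find lookup with path halving; registers unseen cities.
        parent.setdefault(city, city)
        while parent[city] != city:
            parent[city] = parent[parent[city]]
            city = parent[city]
        return city

    for (a, b), weight in ties.items():
        root_a, root_b = find(a), find(b)  # make sure both cities are registered
        if weight >= threshold and root_a != root_b:
            parent[root_a] = root_b        # merge the two groups

    groups = defaultdict(set)
    for city in list(parent):
        groups[find(city)].add(city)
    # Sort for a stable, readable result.
    return sorted(sorted(group) for group in groups.values())


if __name__ == "__main__":
    for cluster in cluster_cities(TIES, threshold=0.5):
        print(cluster)
```

With a high threshold, the toy Columbus and Charleston end up in different clusters despite their direct (weak) tie; lowering the threshold merges clusters into larger ones. A real analysis would presumably need something more robust than a single cutoff, but the sketch shows how the granularity of a clustering depends on a choice the analyst makes.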
Compare this to a map from Labov et al., The Atlas of North American English:
Or the much more elaborate map created by Rick Aschmann:
Among many other interesting points of comparison, two systematic differences stand out:
First, most of the geographical patterns of linguistic variation in the eastern half of the country are apparently not very prominent in Facebook friendship networks. This is not surprising, I guess, but it would be nice to know whether those (very real) linguistic boundaries can be found at a finer level of clustering (or perhaps using a different clustering method), or whether they're simply absent in the Facebook demographic.
And second, there are several prominent (and plausible) Facebook clusters in the west that don't correspond to any generally-recognized linguistic regions — especially the coastal-California network that Pete calls "Socalistan", and the Seattle-centered network that he calls "Pacifica". Is this because these areas are too recent to have developed enough shared linguistic features? Or have the linguistic descriptions just not caught up with the social realities? Or again, are Facebook-based social networks just not relevant (enough) to the processes of linguistic innovation and diffusion?
It's also worth mentioning that Warden's "Greater Texas" includes bits of at least three different traditional dialect regions, while excluding equally large or larger pieces of each of them. Again, there's no reason that current communication networks should reflect the residue of the historical processes that created these dialect areas. But if the Facebook data is really a good proxy for relevant communication patterns as they now exist (and this is a big if), should we predict a revision of boundaries over the next couple of generations?
[I've put "dialects" in scare quotes because the common understanding of this term implies that the distribution of variable features should partition the population into a modest number of well-defined sets with perfectly correlated values. In traditional terms, it implies that isoglosses (the geographical boundaries of linguistic features) should reliably line up in "bundles". Neither expectation holds up well. To start with, isoglosses typically reflect clines of variation (geographically correlated patterns of variation in probability or magnitude of effect) rather than sharp-edged regional borders. And bundles of such boundaries do occur, but so do many other patterns, including the residue left by the erratic diffusion of different features from different points of origin.
It's also worth noting that the dialect maps I've reproduced here are mainly based on features of pronunciation, and we could look instead at maps based on lexical features. Those maps would be different, but I don't think they'd change the basic observation, which is that the Facebook map merges linguistically-defined regions in the east, and splits them in the west.]