Dialect geography and social networks

There are a variety of factors that are believed to be involved in the establishment and maintenance of the language varieties that are commonly called "dialects". Among these are substrate or contact influences, patterns of initial settlement, group identity, and patterns of communication. Some of these factors, such as settlement patterns, mainly re-distribute existing variation in geographical and social space. But others, such as patterns of communication, affect the way that innovations arise and spread.

The rise of internet-based social media offers new pictures of such patterns of communication, and a few months ago, I came across an interesting analysis of the geography of Facebook friend links: Pete Warden, "How to split up the U.S.", 2/6/2010.

Pete wrote:

As I've been digging deeper into the data I've gathered on 210 million public Facebook profiles, I've been fascinated by some of the patterns that have emerged. My latest visualization shows the information by location, with connections drawn between places that share friends. For example, a lot of people in LA have friends in San Francisco, so there's a line between them.

Looking at the network of US cities, it's been remarkable to see how groups of them form clusters, with strong connections locally but few contacts outside the cluster. For example Columbus, OH and Charleston WV are nearby as the crow flies, but share few connections, with Columbus clearly part of the North, and Charleston tied to the South:

Using an unspecified clustering technique, Pete derived this map of Facebook social networks in the continental U.S.:

He observes that

Some of these clusters are intuitive, like the old south, but there's some surprises too, like Missouri, Louisiana and Arkansas having closer ties to Texas than Georgia.

Compare this to a map from Labov et al., The Atlas of North American English:

Or the much more elaborate map created by Rick Aschmann:

Among many other interesting points of comparison, there are two systematic differences:

First, most of the geographical patterns of linguistic variation in the eastern half of the country are apparently not very prominent in Facebook friendship networks. This is not surprising, I guess, but it would be nice to know whether those (very real) linguistic boundaries can be found at a finer level of clustering (or perhaps using a different clustering method), or whether they're simply absent in the Facebook demographic.

And second, there are several prominent (and plausible) Facebook clusters in the west that don't correspond to any generally-recognized linguistic regions — especially the coastal-California network that Pete calls "Socalistan", and the Seattle-centered network that he calls "Pacifica". Is this because these areas are too recent to have developed enough shared linguistic features? Or have the linguistic descriptions just not caught up with the social realities? Or again, are Facebook-based social networks just not relevant (enough) to the processes of linguistic innovation and diffusion?

It's also worth mentioning that Warden's "Greater Texas" includes bits of at least three different traditional dialect regions, while excluding large or larger pieces of each of them. Again, there's no reason that current communication networks should reflect the residue of historical processes that created these dialect areas — but if the Facebook data is really a good proxy for relevant communication patterns as they now exist (and this is a big if), should we predict a revision of boundaries over the next couple of generations?

[I've put "dialects" in scare quotes because the common understanding of this term implies that the distribution of variable features should partition the population into a modest number of well-defined sets with perfectly correlated values. In traditional terms, it implies that isoglosses — the geographical boundaries of linguistic features — should reliably line up in "bundles".  To start with, isoglosses typically reflect clines of variation — geographically-correlated patterns of variation in probability or magnitude of effect — rather than sharp-edged regional borders. And bundles of such boundaries do occur, but so do many other patterns, including the residue left by the erratic diffusion of different features from different points of origin.

It's also worth noting that the dialect maps I've reproduced here are mainly based on features of pronunciation, and we could look instead at maps based on lexical features. Those maps would be different, but I don't think they'd change the basic observation, which is that the Facebook map merges linguistically-defined regions in the east, and splits them in the west.]
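For readers wondering what an automatic alternative to eyeballing the map might look like, here is a minimal sketch of one simple option: average-linkage agglomerative clustering over a table of place-to-place friend-link counts. This is not Warden's method, which is unspecified. The place names and counts are invented for illustration, and the function names (`link_count`, `average_link`, `cluster`) are hypothetical.

```python
from itertools import combinations

# Toy friend-link counts between pairs of places.
# Names and numbers are INVENTED for illustration; they are not Facebook data.
links = {
    ("North1", "North2"): 90,
    ("North1", "North3"): 70,
    ("North2", "North3"): 80,
    ("South1", "South2"): 85,
    ("South1", "South3"): 60,
    ("South2", "South3"): 75,
    ("North3", "South1"): 10,
    ("North1", "South2"): 5,
}


def link_count(a, b):
    """Number of friend links between two places (0 if none recorded)."""
    return links.get((a, b), links.get((b, a), 0))


def average_link(c1, c2):
    """Mean link count over all cross-cluster pairs of places."""
    total = sum(link_count(a, b) for a in c1 for b in c2)
    return total / (len(c1) * len(c2))


def cluster(places, k):
    """Repeatedly merge the two most strongly linked clusters until k remain."""
    clusters = [frozenset([p]) for p in places]
    while len(clusters) > k:
        c1, c2 = max(combinations(clusters, 2),
                     key=lambda pair: average_link(*pair))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return clusters


places = sorted({p for pair in links for p in pair})
regions = cluster(places, 2)
for region in regions:
    print(sorted(region))
```

Two choices here are made by hand and would change the resulting map: the number of regions `k`, and the use of raw counts, which favor large places. Dividing each count by the populations (or user totals) of the two places is one common adjustment, and other linkage rules or clustering methods can give different regions from the same table.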


  1. Brett said,

    June 6, 2010 @ 9:20 am

    The networks' identities and boundaries were something he apparently set by hand. (He mentions separating Mormonia as a good idea, so it was evidently a judgement call. And separating data into clusters, especially when some clusters may surround others, is a problem that has never found a good computational solution.) So I hesitate to conclude there is much deeper meaning in the final depiction, which is likely to be strongly influenced by some preconceived notions. Just looking at the graph, I don't think I would place the cluster boundaries in the same places he did, although I obviously don't have access to the raw data. I suspect, based on some past experiences with connection graphs, that while Stayathomia may have no obvious internal dividing lines, the far ends may be entirely dissimilar.

    [(myl) I wondered whether his "clustering" might have been done by eye. Too bad if so. I wouldn't say that there's no good (automatic) clustering algorithm, but rather that there are many good, which produce different results and even different types of results. It's true that you can't always find one that produces the results you want, but often that's less because the techniques are inadequate, and more because the results you want aren't as strongly implicit in the data as you'd like them to be.]

  2. Thor Lawrence said,

    June 6, 2010 @ 9:37 am

    So much for personal privacy if this researcher has access to 210 million public Facebook profiles. I beg to doubt that each datum was added by hand.

    [(myl) Just to clarify, Brett and I were not talking about whether data points were added by hand. The starting point is a network whose nodes are named geographical places (cities, towns, or whatever) and whose arcs specify the number of Facebook-friend links connecting a given pair of nodes. The question is, how did Pete divide this matrix into regions? There are a number of automatic methods that could be applied — e.g. some variant of this one — but Brett speculated that Pete just eyeballed a map of the pairwise links and drew some boundaries by hand.]

  3. Thor Lawrence said,

    June 6, 2010 @ 9:43 am

    USA-English dialects: does anyone have any data on the distribution of "dove" as the past participle of "to dive" as an accepted usage? Also "fit" rather than "fitted" is another that has turned up in texts I have edited, originating from respected academics. To my UK-cum-intergovernmental English ears, this is bizarre usage.

  4. Michael Friesner said,

    June 6, 2010 @ 11:53 am

    I'm a bit puzzled by the latest post, and in particular the description of the dialects of American English and summary of Labov et al.'s work on the Atlas of North American English.

    [(myl) There was no summary of the ANAE work, just its top-level map of dialect regions.]

    First of all, Labov has attempted to draw links between communication patterns and dialect distribution in his own work. This has not been as fruitful as one might hope, most likely because communication patterns change over time. When the dialects of American English were established, patterns were likely quite different from those today. However, it is a reasonable hypothesis that only in cases of wholesale migration would we begin to see the effects of changes in communication on dialect distribution.

    [(myl) There's been a lot of fruitful work on micro-scale relationships between social networks and speech patterns, e.g. in Carmen Fought's Chicano English in Context (see for instance this figure from p. 57). But good proxies for macro-scale patterns of communication are not easy to come by. Things like Facebook friend connections offer a new source of information, and are therefore worth looking at as a correlate of patterns of linguistic variation. Geographical clustering is one obvious thing to try.]

    Second of all, I'm not sure if it makes sense to treat the geographic dialect patterning in the Eastern United States on the same level as the other dialect divisions. The ANAE distinguishes five major dialect areas in North America: North, Midland, South, West, and Canada. Some researchers associate the Midland dialect with the West, while others associate the Midland dialectally with the South (the "Southeastern Super-region" in ANAE). We know that Philadelphia, and quite possibly the Midland in general, has increasingly aligned itself with the North rather than the South – so it is unsurprising that in analyzing contemporary communication patterns, the Midland seems to pattern with the North. As for the West, it is true that the more recent settlement history might make dialect patterns less apparent than communication patterns, but it is also important to note that the ANAE focused on urban communities in establishing dialect patterns. It is quite possible that the additional regions noted by Warden would be more identifiable if rural or suburban communities were studied further, and in some cases, these geographic patterns have been identified in linguistic work by researchers focusing on these regions (e.g., Wassink, Scanlon, Conn and colleagues' work on the Pacific Northwest; Bowie and Di Paolo on Utah English).

    [(myl) This constitutes an argument for one of the options I suggested, namely that the linguistic descriptions have not yet caught up with the social realities (aside from exceptions like those you cite).]

    Third of all, The Atlas of North American English does, in fact, discuss the distribution of lexical items. In terms of traditional lexical isoglosses (farming terms, etc., studied by researchers in the mid 20th century), these match up quite nicely with the ones found based on phonological features. Newer lexical items, however, show different patterning. But while these words' geographic distribution may in some cases be due to communication patterns, it is just as likely to be due to factors that require only cursory levels of contact, whether due to language contact (take the distribution in North America of "bodega" vs. "dep(anneur)" vs. "convenience store/corner store") or marketing (e.g., the distribution of "bubbler" vs. "water fountain/drinking fountain").

    [(myl) As I said, the ANAE dialect areas are mainly based on pronunciation factors, but maps based on lexical features don't in general change the picture very much, which is that they show a lot more geographical differentiation in the east than in the west. This seems to be at odds with the claim implicit in Warden's map, that the west contains at least three well-defined subareas embedded in a larger matrix. So the question remains, is this structure an illusion created by the particular demographics of Facebook or Warden's reactions to the Facebook connectivity network? Or is it a fact about communication patterns that hasn't (yet) produced linguistic differentiation? Or is it a symptom of lack of descriptive attention?]

    Some food for thought, anyway.

  5. mgh said,

    June 6, 2010 @ 11:54 am

    just for reference, the "pop vs soda vs coke" map

    for what it's worth, the northeast and southern california are siblings on the soda map — I'm surprised that NY and LA would not appear as linked on a facebook connectivity plot

  6. Suzanne Kemmer said,

    June 6, 2010 @ 11:56 am

    The density of links might have a lot to do with the 'catchment area' for universities, and where the students go afterward for jobs. Facebook is no longer limited to college students, but I think students and former students who made their networks at college are still its largest demographic. It would be interesting to see a comparison graphic for Myspace links (different demographic).

    [Indeed. Another possibility, of course, is that Warden's map is incomplete or even misleading as a picture of the relevant Facebook links.]

  7. Faith said,

    June 6, 2010 @ 12:12 pm

    For some reason, this puts me in mind of the gefilte fish isopleth.

  8. John Cowan said,

    June 6, 2010 @ 2:02 pm

    People in the northeast and north central regions may speak differently, but for the most part they don't seem to me to be really conscious of it in a way that would create social barriers. (New York may be an exception here, but even a modern New York accent has only a few of the traditionally stigmatized features.)

    [(myl) The idea is not that linguistic divergence creates social barriers, but rather that social networks create accent accommodation and thus eventually convergence.]

  9. Jarek Weckwerth said,

    June 6, 2010 @ 4:18 pm

    I'm not on Facebook, so my experience is limited, but I would imagine it's mainly non-audio… And the current consensus seems to be that regular (face-to-face) spoken interaction is needed for accent traits to spread. (Suggestions to the effect of Aussie soap operas being responsible for the spread of HRT in England etc. notwithstanding.) So finding differences between the maps should not be surprising, as myl rightly notices.

    Also, some aspects of Warden's map look suspicious. There are quite a few connections between e.g. what seems to be Chicago and Atlanta; and between LA and the East Coast. It would indeed be good to know what, if any, "real" clustering was done. Do the connections between cities within a region trump those between top-level regional centres?

    (One surprise is that there's only a weak link between Seattle and NYC…)

  10. Jarek Weckwerth said,

    June 6, 2010 @ 4:21 pm

    BTW, thanks for the pointer to Rick Aschmann's map. A real gem. That's what I call a committed hobbyist. Or should I say "hobbyist".

  11. Adrian Bailey (UK) said,

    June 6, 2010 @ 4:44 pm

    I take it Pete Warden isn't a geographer.

  12. marie-lucie said,

    June 6, 2010 @ 4:47 pm

    Are linguists the only people who communicate through facebook with far-flung colleagues across countries and continents? Or are we just too few to be statistically significant?

  13. Jake K said,

    June 6, 2010 @ 5:27 pm

    I think this actually shows how limited Labov's Atlas of North American English is. Being from the west coast, I obviously have some kind of bias, but I would argue against the notion that the east coast has more dialect variation than the west coast. When I first saw the map, I was awestruck that the entire western half of the United States was lumped together in just one dialect category while New York City was in its own league; I couldn't help but think that the fact that the researchers were from the east coast had something to do with it. I've also read a study by some linguists at UC Santa Barbara (I lost the article and can't remember the names of the researchers, but someone might know what I'm talking about) that identified at least 20 dialects within California alone. While I am a huge admirer of Labov's work, I also have to say that there is more linguistic variation in the west coast than he identifies (listening to speakers from southern California read "Comma Gets a Cure" supports this argument), and if we're going with the assumption that there is some correlation between social networking and dialect variation (which is a pretty big assumption), this could show that the western half of the United States has more variation than the Atlas of North American English leads us to believe.

  14. James C. said,

    June 6, 2010 @ 6:16 pm

    @Marie-Lucie: Pretty sure there are a lot of people with far-flung colleagues throughout academia, but consider that the majority of academic Facebook users are undergraduates who have not developed such long distance professional relationships. Also I think that people who leave academia after a baccalaureate are unlikely to develop such relationships for similar reasons.

    Rick Aschmann’s map for Alaska, and for the surrounding parts of Canada as well, is somewhat imaginative in my humble opinion. Although he indicates the quality of his data by the various population centers marked on the map, the presentation is still misleading. His data apparently derive from two or three politicians, only one of whom I think was actually born and raised there.

    We don’t know anything scientifically about the dialectal features in Alaskan English; there is, as far as I know, no published linguistic literature on the subject at all. (There is unfortunately a good amount of educational literature about “correcting” local dialectal features, though the presentations are unreliable.) Note that the ANAE doesn’t consider Alaska or northern Canada, despite its title which realistically should be “The Atlas of Southern North American English”. Researchers are apparently more interested in the fine details of New England dialectology than they are in even doing telephone surveys of the far north, which is an absolute shame in my opinion.

  15. D.O. said,

    June 6, 2010 @ 8:49 pm

    Jake K., it is possible that as American population center(s) shift South and West, the dialect varieties there are on the rise also. I have before me A History of the English Language by A.C. Baugh printed in 1935. The American dialectal map has the following parts: 1) Eastern New England, 2) Northern, 3) W. Pennsylvania, 4) Mid Atlantic, 5) NYC, 6) Southern Mountain, 7) Virginia Piedmont, 8) Eastern Caroline, 9) Southern, and 10) General American, which comprises roughly everything West of W87 in the North and W97 in the South.

  16. Rodger C said,

    June 6, 2010 @ 9:06 pm

    For a dialect geography of America based on lexical items, see Craig M. Carver, _American Regional Dialects: A Word Geography_ (Ann Arbor, 1987).

  17. Joe said,

    June 7, 2010 @ 7:31 am

    @Jake K

    You might be talking about Mary Bucholtz's work. Be that as it may, Labov's atlas was never meant to be exhaustive. Nothing as widescale as an atlas of North American English could be done with a satisfactory level of sociolinguistic detail without considerable expense and time. I think the intention was to capture sound changes in progress and the structural bases of those changes, rather than a complete sociolinguistic description.

  18. Jim said,

    June 7, 2010 @ 12:47 pm

    Jake – for instance there is no sign on the first map of the island of Okie speech in the Central Valley, the one that Merle Haggard came from. Most of LA should align with the northern Midwest if settlement patterns mean anything, with small islands of NY and other East Coast speech. Likewise "Pacifica" should show dialect ties with the upper Midwest.

  19. Qov said,

    June 7, 2010 @ 5:47 pm

    English is a second or third language for most people born in the far North, so it sort of makes sense not to include it. Plus a telephone survey of the far north would disproportionally get you priests, social workers and nurses, who are rarely locally raised.

    I'm from urban western Canada and for me dove is normal while dived is slightly marked but acceptable; fit is the past tense of putting things into things: the dress fit perfectly, I fit everything into my suitcase; only a suit made to measure is fitted. Sorry Thor.

  20. Kirk Hazen said,

    June 8, 2010 @ 6:52 am

    The connection between Charleston, WV, and Columbus, OH, will be in decades of Appalachian migrants moving to Columbus. I doubt many of them are Facebook users.

  22. Rob Sykes said,

    June 9, 2010 @ 2:13 pm

    And second, there are several prominent (and plausible) Facebook clusters in the west that don't correspond to any generally-recognized linguistic regions

    I find Mormonia especially interesting. While there has never been very detailed work on western varieties of English, the work that's been done in Utah doesn't seem to fit with descriptions of the West. Marianna Di Paolo's work in Utah showed non-differential fronting of /uw/ and /ow/, a reversal of caught and cot and preliminary evidence for a reversal in the high front vowels. My own MA thesis showed that glide weakening/deletion in /ay/ is happening in the Salt Lake Valley among younger speakers, something else which never fit with descriptions of the West.

  23. john riemann soong said,

    June 11, 2010 @ 4:57 pm

    There needs to be some compensation for how easily 15-year-olds will make facebook friendships versus 35-year-olds. I have around 450+ fb friends, but I notice that the average professional will only tend to have 30-150. Not every fb friend has equal influence on your pattern of communication…

    (Personally I don't think this says anything about the differential depth of friendships between generations — just that we're more socially comfortable adding that dormmate we see every day at breakfast.)

    I suppose if privacy weren't an issue, it would be interesting to separate connections by "when they were made" — connections made earlier in life are likely to have more impact on language use, after all.

  25. Jim T said,

    July 27, 2010 @ 6:45 am

    I currently work in Chicago but I'm from South Texas. My boss seems to get a real kick out of my pronunciation of the word "pen".

    We have to go to him for supplies and he always makes me repeat myself whenever I ask for one and laughs incessantly. He says that the way I pronounce the word "pen" is funny. My ignorance must shine through because although I've tried to understand the "sound" difference between "pin" and "pen", I just can't. You write with a "pen", you stick something to the wall with a "pin".

    He states that I say "pin" when I should say "pen". When back home in Texas, when asked for a "pen", I've never given someone a "pin" or the other way around. So I don't understand how he hears a difference.

  26. BillMax said,

    July 27, 2010 @ 7:50 am

    Dialect trivia: I learned some basic sign language in Indianapolis and later moved to northern Kentucky and saw differences in signs, for instance "God" was with a downward motion of the hand in IN and with an upward motion in KY.

