
From Jack Grieve, a map of the distribution of the word "the" on Twitter:

There's lots more geolexicography of common function words on Jack's Twitter feed.

Jamie Pennebaker has been telling us for decades that the distribution of such words varies with style, register, personality, and mood. And now Jack Grieve is providing evidence that geography has a surprisingly strong influence.

One of my favorites is "and" vs. "but":

And "should" is also suggestive:

Is this just another form of geo-indexical variation, like "soda" vs. "pop" vs. "coke", without any meaning beyond affinity to the group that's indexed by your choice? Or do words like "the" vary geographically because style, register, personality, and mood also vary in correlated ways?

Jack thinks that it's "style":

"Style" potentially covers a lot of ground, but Lyle Ungar and Marty Seligman might suggest that geographical variation in such words represents the Pennebaker-ish effects of personality or mood or whatever, rather than the kind of "style" that just indexes group membership, or local conventions about the degree of formality appropriate in a given context. Still, presumably even (or especially?) the distribution of words like "the" is subject to purely indexical variation, or to the influence of latent "stylistic" variables of the type that Doug Biber has studied. So it seems to me that the demonstration of such patterns of variation across time and space raises questions more than it answers them. (Which is a Good Thing!)

Jack also sent along a comment on the secular decline in the usage of "the" that was documented in "The case of the disappearing determiners" (1/3/2016): "Assuming the data is all consistent, I'm almost certain this basically represents a decrease in NP complexity/rise in pronoun+VP usage."

[It's important to note that the maps do not represent raw location-by-location frequency data, but rather the result of processing described in Jack Grieve et al., "A statistical method for the identification and aggregation of regional linguistic variation", Language Variation and Change 2011.]


  1. Doctor Science said,

    January 27, 2016 @ 9:25 am

    Do the "And" results include "+" and "&", I wonder?

  2. FM said,

    January 27, 2016 @ 9:54 am

    Mostly complementary, but it looks like New York City just hates conjunctions!

    [(myl) That's probably to some extent an artefact of being on the boundary…]

  3. Doctor Science said,

    January 27, 2016 @ 10:01 am

    I asked Grieve, and he said "+" and "&" aren't included in "and".

  4. Jack Grieve said,

    January 28, 2016 @ 3:31 am

    Thanks for posting this Mark!

  5. D.O. said,

    January 28, 2016 @ 7:37 pm

    I didn't think of it right away, and it's even stranger that Prof. Liberman didn't mention it, but the first thing to check is the age and gender distribution of twitterers in different regions. It might explain a good chunk of the variation.

  6. Keith Ivey said,

    January 29, 2016 @ 8:56 am

    Prevalence of "the" on Twitter likely correlates negatively with amount of experience with Twitter. Most experienced tweeters may develop a more compact style, with fewer articles, to get more into 140 characters.
