Yesterday, I tried replicating one of the experiments in Jean M. Twenge et al., "Increases in Individualistic Words and Phrases in American Books, 1960–2008", PLoS One 7/10/2012, and got results that seem to be significantly at variance with their conclusions ("Textual narcissism", 7/13/2012).
This morning, I thought I'd try a replication with word counts from a different source of historical data. I used the Corpus of Historical American English (Mark Davies, "The Corpus of Historical American English: 400 million words, 1810-2009", 2010). Some of the problems with the Google Books source are removed here: the COHA collection is balanced by genre, and a detailed list of its 107,000 sources is available.
And the results remain hard to square with Twenge et al.'s main conclusion, which they expressed like this:
"This study demonstrates that language use in books reflects increasing individualism in the U.S. since 1960. Language use in books reflects the larger cultural ethos, and that ethos has been increasingly characterized by a focus on the self and uniqueness."
On the contrary, the time-series of changes in the frequency of the words in their "individualistic" and "communal" lists suggests a very different narrative, one that sees an increasing focus on the group through the first half of the 20th century, and little change since then.
Here's the decade-by-decade sum of word counts from COHA, for Twenge et al.'s "communal" and "individualistic" lists. The "communal" word counts are the blue line with the 'C' plotting character, and the "individualistic" word counts are the red line with the 'I' plotting character:
It's perhaps clearer to look at decade-by-decade changes in the ratio between the summed counts of the two sets. The last five points, plotted in red, are the decades from 1960 to 2009 — it's really hard for me to see how to make this support an argument that "the larger cultural ethos […] has been increasingly characterized by a focus on the self and uniqueness".
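For anyone who wants to redo the arithmetic, here is a minimal Python sketch of the two steps: summing the counts for each word list within each decade, then taking the ratio of the two sums. Everything in it is an assumption made for illustration. The word sets are short stand-ins rather than Twenge et al.'s lists, the rows are made-up numbers rather than COHA counts, the function names are hypothetical, and the direction of the ratio (communal over individualistic) is my choice for the example.

```python
from collections import defaultdict

# Hypothetical stand-in word lists, NOT the lists from Twenge et al.
COMMUNAL = {"community", "share", "together"}
INDIVIDUALISTIC = {"unique", "personal", "self"}

# Made-up (word, decade, count) rows, purely to exercise the code.
ROWS = [
    ("community", 1950, 120), ("share", 1950, 80),
    ("unique", 1950, 50), ("self", 1950, 50),
    ("community", 1960, 150), ("together", 1960, 90),
    ("personal", 1960, 60), ("self", 1960, 60),
]


def decade_sums(rows, words):
    """Sum the counts of every listed word within each decade."""
    totals = defaultdict(int)
    for word, decade, count in rows:
        if word in words:
            totals[decade] += count
    return dict(totals)


def decade_ratios(rows, numerator_words, denominator_words):
    """Ratio of the two summed counts, for decades where the denominator is nonzero."""
    num = decade_sums(rows, numerator_words)
    den = decade_sums(rows, denominator_words)
    return {d: num[d] / den[d] for d in sorted(num) if den.get(d)}


communal = decade_sums(ROWS, COMMUNAL)
individualistic = decade_sums(ROWS, INDIVIDUALISTIC)
ratios = decade_ratios(ROWS, COMMUNAL, INDIVIDUALISTIC)
```

One convenient property of the ratio: since both sums come from the same decade's texts, dividing one by the other cancels out the size of that decade's sample, so it makes no difference whether you start from raw counts or from per-million frequencies.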
For comparison, the same type of plots using decade-by-decade data from the Google Books American English collection are below, as presented in yesterday's post (where you can find the data and code). First the historical word counts:
And the ratios, where again we see an apparent rise in "communal" words through the middle of the 20th century, and not a lot of change since then. Certainly there's not much of a trend in the last five points, representing the decades since 1960:
None of this fixes problems that some may see in the basic method. But there's not a lot of point in debating the methodology, if the resulting numbers don't support the conclusion in any event.
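If you'd like a number to go with the eyeball judgment about the last five points, one simple option is an ordinary least-squares slope fitted to those points alone. The sketch below is not something I ran for the plots in this post. It is a hypothetical helper, written under the assumption that the ratios are held in a dictionary keyed by the starting year of each decade.

```python
def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def recent_trend(ratio_by_decade, n_points=5):
    """Slope (ratio units per year) over the last n_points decades."""
    decades = sorted(ratio_by_decade)[-n_points:]
    return ols_slope(decades, [ratio_by_decade[d] for d in decades])


# Made-up ratios, only to show the call; these are NOT the COHA or Google values.
example = {1950: 1.0, 1960: 2.0, 1970: 2.1, 1980: 2.2, 1990: 2.3, 2000: 2.4}
slope = recent_trend(example)
```

With only five points, a slope like this is a rough summary at best, so it should be read alongside the plot rather than in place of it.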
Meanwhile, there's been surprisingly little uptake in the mass media for the amiable hypothesis that kids today are self-obsessed etc. There's Sharon Jayson, "What's on Americans' mind? Increasingly, 'me'", USA Today 7/10/2012 — illustrated with the image and caption below:
And also Andrea Johnson, "Kids are hearing more 'me first' than 'united we stand' in the books they read", Minot Daily News 7/10/2012; Tom Jacobs, "Books Increasingly Show It's All About Me", Pacific Standard 7/11/2012.
But so far, neither David Brooks nor the Daily Mail has taken the bait.