More on trends in the Google ngrams corpus

In "Lexico-cultural decay?", 10/9/2018, I called into question Jonathan Merritt's evidence for the view that "most of the central terms in the Christian vocabulary are rapidly declining". Merritt cites Kesebir & Kesebir 2012, who argue on the basis of Google ngram-viewer data that

Study 1 showed a decline in the use of general moral terms such as virtue, decency and conscience, throughout the twentieth century. In Study 2, we examined the appearance frequency of 50 virtue words (e.g. honesty, patience, compassion) and found a significant decline for 74% of them.

I explained several reasons why unigram frequencies for many ordinary words in the Google ngram dataset tend to show a decline over the 20th century, citing Pechenick et al. 2015 and giving some illustrative examples. It occurred to me this morning that there's a different way to illustrate one of the issues, namely the changing mix of types of books in Google's collection. At some point after 2000, that collection shifts fairly abruptly: the earlier material is based on scans of books from cooperating research libraries, while the later material is based on digital texts provided by publishers. This shift produces such a pronounced change in the frequency of nearly all words that the default ngram viewer stops in the year 2000.

But you can ask the viewer to give you data up to 2008 (as far as it's willing to go), and the results almost always show a pronounced change. So I tried it for the items underlying Merritt's argument.
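For readers who want to repeat the experiment, here is a minimal sketch of building a viewer link whose end year is pushed out to 2008. The query parameter names (`content`, `year_start`, `year_end`, `smoothing`) are assumptions based on how the viewer's URLs have looked and may change; the helper `ngram_viewer_url` is hypothetical, and the three words are just the ones quoted above from Study 1.

```python
from urllib.parse import urlencode

# Hypothetical helper: the parameter names below are assumptions about the
# ngram viewer's URL format, not a documented, stable API.
def ngram_viewer_url(words, year_start=1900, year_end=2008, smoothing=3):
    params = {
        "content": ",".join(words),   # comma-separated list of ngrams
        "year_start": year_start,
        "year_end": year_end,         # 2008 rather than the default cutoff of 2000
        "smoothing": smoothing,
    }
    return "https://books.google.com/ngrams/graph?" + urlencode(params)

url = ngram_viewer_url(["virtue", "decency", "conscience"])
print(url)
```

Opening the printed link in a browser should show the same words with the time axis extended past 2000, provided the viewer still accepts these parameters.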

Here are the 6 words in Kesebir & Kesebir's abstract:

(As usual, I've introduced multipliers to get the various word-frequency estimates into the same range.)
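One plausible way to pick such multipliers, sketched here under stated assumptions, is to scale every series so that its peak matches the peak of the most frequent word. This is not necessarily how the multipliers in the graphs were chosen; the function names are hypothetical and the numbers are made-up placeholders, not ngram data.

```python
def multipliers_to_common_range(series_by_word):
    """Return one multiplier per word so that every series peaks at the
    same height as the highest-peaking series."""
    target = max(max(values) for values in series_by_word.values())
    return {word: target / max(values) for word, values in series_by_word.items()}

def rescale(series_by_word, multipliers):
    """Apply each word's multiplier to every point in its series."""
    return {
        word: [value * multipliers[word] for value in values]
        for word, values in series_by_word.items()
    }

# Made-up placeholder frequencies, NOT values from the Google ngram dataset.
toy = {
    "word_a": [4.0, 2.0, 3.0],
    "word_b": [0.5, 0.25, 0.125],
}
mult = multipliers_to_common_range(toy)
scaled = rescale(toy, mult)
print(mult)
print(scaled)
```

Rescaling like this changes only the vertical scale of each curve, so the direction and timing of any rise or fall are preserved. As I understand it, the viewer also accepts a multiplier written directly in the query, in a form like `(decency * 20)`.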

And here are the 8 words from Kesebir & Kesebir that Merritt cites as evidence of the decline of "God talk":

Every single one of these 14 poster-child examples for the decline of "moral character and virtue" (Kesebir & Kesebir) and "sacred speech" (Merritt) actually rises in frequency over the last few years of the Google ngrams dataset. Is this because the first decade of the 21st century saw a new Great Awakening? I doubt it. I'm pretty sure that those graphs just reflect a changing mix of publications in the underlying collection.

And this pattern is a serious problem for Merritt's argument. Either his personal impression that "sacred speech and spiritual conversation are in decline" is wrong, or his reliance on Google ngram data to show a similar trend across the 20th century is wrong.

Different American subcultures have always had very different norms and trends in talk (and writing) about spiritual issues. These are worth exploring — and have been explored from many different perspectives over the years. But Jonathan Merritt seems to be trying to project his own cultural and geographical journey over the past decade onto the past century of American life.

"Lexico-cultural decay", 10/9/2018
"Lexical orientation", 10/12/2018
"Why it's harder for him to 'speak God'", 10/14/2018

[Note: I'm not arguing that Google ngram frequencies are worthless in studies of "culturomics", just that (like all other evidence) they need to be interpreted carefully and with appropriate control comparisons.]


1 Comment

  1. Lillie Dremeaux said,

    October 18, 2018 @ 6:48 am

    People also have other terms for these concepts that happen to be in more frequent use right now. "Ethics" and "ethical" instead of "morality" and "moral," for example. Being a "good person" instead of having "decency." "Values" rather than "virtue." Maybe this points to the secularization of what were once mostly spiritual concepts — a shift in what determines right and wrong.

    [(myl) A good point:

    But all such shifts have to be examined within geographical, social, cultural, and ethnic contexts as well.]
