{"id":40349,"date":"2018-10-18T06:42:54","date_gmt":"2018-10-18T11:42:54","guid":{"rendered":"http:\/\/languagelog.ldc.upenn.edu\/nll\/?p=40349"},"modified":"2018-10-18T06:52:57","modified_gmt":"2018-10-18T11:52:57","slug":"more-on-trends-in-the-google-ngrams-corpus","status":"publish","type":"post","link":"https:\/\/languagelog.ldc.upenn.edu\/nll\/?p=40349","title":{"rendered":"More on trends in the Google ngrams corpus"},"content":{"rendered":"<p>In \"<a href=\"http:\/\/languagelog.ldc.upenn.edu\/nll\/?p=40222\" target=\"_blank\" rel=\"noopener\">Lexico-cultural decay?<\/a>\", 10\/9\/2018, I called into question Jonathan Merritt's evidence for the <a href=\"http:\/\/theweek.com\/articles\/791795\/death-sacred-speech\" target=\"_blank\" rel=\"noopener\">view<\/a> that \"most of the central terms in the Christian vocabulary are rapidly declining\". Merritt cites <a href=\"https:\/\/www.tandfonline.com\/doi\/abs\/10.1080\/17439760.2012.715182\" target=\"_blank\" rel=\"noopener\">Kesebir &amp; Kesebir 2012<\/a>, who argue on the basis of Google ngram-viewer data that<\/p>\n<p style=\"padding-left: 30px;\"><span style=\"color: #000080;\">Study 1 showed a decline in the use of general moral terms such as <em>virtue, decency<\/em> and <em>conscience<\/em>, throughout the twentieth century. In Study 2, we examined the appearance frequency of 50 virtue words (e.g. <em>honesty, patience, compassion<\/em>) and found a significant decline for 74% of them.<\/span><\/p>\n<p>I explained several reasons why unigram frequencies for many ordinary words in the Google ngram dataset tend to show a decline over the 20th century, citing <a href=\"https:\/\/journals.plos.org\/plosone\/article?id=10.1371\/journal.pone.0137041\" target=\"_blank\" rel=\"noopener\">Pechenick et al. 2015<\/a> and giving some illustrative examples. It occurred to me this morning that there's a different way to illustrate one of the issues, namely the changing mix of types of books in Google's collection. 
At some point after 2000, that collection shifts fairly abruptly &#8212;\u00a0the earlier material is based on scans of books from cooperating research libraries, while the later material is based on digital texts provided by publishers. This shift produces such a pronounced change in the frequency of nearly all words that the default ngram viewer stops in the year 2000.<\/p>\n<p>But you can ask the viewer to give you data up to 2008 (as far as it's willing to go), and the results almost always show a pronounced change. So I tried it for the items underlying Merritt's argument.<\/p>\n<p><!--more--><\/p>\n<p>Here are the\u00a0<a href=\"https:\/\/books.google.com\/ngrams\/graph?content=virtue%2C%28decency*10%29%2Cconscience%2C%28honesty*5%29%2Cpatience%2C%28compassion*4%29&amp;year_start=1980&amp;year_end=2008&amp;corpus=15&amp;smoothing=3&amp;share=&amp;direct_url=t1%3B%2Cvirtue%3B%2Cc0%3B.t1%3B%2C%28decency%20%2A%2010%29%3B%2Cc0%3B.t1%3B%2Cconscience%3B%2Cc0%3B.t1%3B%2C%28honesty%20%2A%205%29%3B%2Cc0%3B.t1%3B%2Cpatience%3B%2Cc0%3B.t1%3B%2C%28compassion%20%2A%204%29%3B%2Cc0\" target=\"_blank\" rel=\"noopener\">6 words in Kesebir &amp; Kesebir's abstract<\/a>:<\/p>\n<p><a href=\"http:\/\/languagelog.ldc.upenn.edu\/myl\/GNGpost2000_1.png\"><img decoding=\"async\" title=\"Click to embiggen\" src=\"http:\/\/languagelog.ldc.upenn.edu\/myl\/GNGpost2000_1.png\" width=\"490\" \/><\/a><\/p>\n<p>(As usual, I've introduced multipliers to get the various word-frequency estimates into the same range.)<\/p>\n<p>And here are the <a 
href=\"https:\/\/books.google.com\/ngrams\/graph?content=grace%2Cmercy%2Cwisdom%2C%28faith*0.5%29%2Csacrifice%2C%28honesty*7%29%2Crighteousness%2Cevil&amp;year_start=1980&amp;year_end=2008&amp;corpus=15&amp;smoothing=3&amp;share=&amp;direct_url=t1%3B%2Cgrace%3B%2Cc0%3B.t1%3B%2Cmercy%3B%2Cc0%3B.t1%3B%2Cwisdom%3B%2Cc0%3B.t1%3B%2C%28faith%20%2A%200.5%29%3B%2Cc0%3B.t1%3B%2Csacrifice%3B%2Cc0%3B.t1%3B%2C%28honesty%20%2A%207%29%3B%2Cc0%3B.t1%3B%2Crighteousness%3B%2Cc0%3B.t1%3B%2Cevil%3B%2Cc0\" target=\"_blank\" rel=\"noopener\">8 words from Kesebir &amp; Kesebir that Merritt cites<\/a> as evidence of the decline of \"God talk\":<\/p>\n<p><a href=\"http:\/\/languagelog.ldc.upenn.edu\/myl\/GNGpost2000_2.png\"><img decoding=\"async\" title=\"Click to embiggen\" src=\"http:\/\/languagelog.ldc.upenn.edu\/myl\/GNGpost2000_2.png\" width=\"490\" \/><\/a><\/p>\n<p>Every single one of these 14 poster-child examples for the decline of \"moral character and virtue\" (Kesebir &amp; Kesebir) and \"sacred speech\" (Merritt) actually rises in frequency over the last few years of the Google ngrams dataset. Is this because the first decade of the 21st century saw a new <a href=\"https:\/\/en.wikipedia.org\/wiki\/Great_Awakening\" target=\"_blank\" rel=\"noopener\">Great Awakening<\/a>? I doubt it &#8212; I'm pretty sure that those graphs just reflect a changing mix of publications in the underlying collection.<\/p>\n<p>And this pattern is a serious problem for Merritt's argument. Either his personal impression that \"sacred speech and spiritual conversation are in decline\" is wrong, or his reliance on Google ngram data to show a similar trend across the 20th century is wrong.<\/p>\n<p>Different American subcultures have always had very different norms and trends in talk (and writing) about spiritual issues. These are worth exploring &#8212; and have been explored from many different perspectives over the years. 
But Jonathan Merritt seems to be trying to project <a href=\"https:\/\/www.salon.com\/2012\/08\/12\/why_i_outed_a_christian_star_2\/\" target=\"_blank\" rel=\"noopener\">his own cultural and geographical journey over the past decade<\/a> onto the past century of American life.<\/p>\n<p>\"<a href=\"http:\/\/languagelog.ldc.upenn.edu\/nll\/?p=40222\" target=\"_blank\" rel=\"noopener\">Lexico-cultural decay<\/a>\", 10\/9\/2018<br \/>\n\"<a href=\"http:\/\/languagelog.ldc.upenn.edu\/nll\/?p=40261\" target=\"_blank\" rel=\"noopener\">Lexical orientation<\/a>\", 10\/12\/2018<br \/>\n\"<a href=\"http:\/\/languagelog.ldc.upenn.edu\/nll\/?p=40325\" target=\"_blank\" rel=\"noopener\">Why it's harder for him to 'speak God'<\/a>\", 10\/14\/2018<\/p>\n<p>[Note: I'm not arguing that Google ngram frequencies are worthless in studies of \"culturomics\", just that (like all other evidence) they need to be interpreted carefully and with appropriate control comparisons.]<\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In \"Lexico-cultural decay?\", 10\/9\/2018, I called into question Jonathan Merritt's evidence for the view that \"most of the central terms in the Christian vocabulary are rapidly declining\". 
Merritt cites Kesebir &amp; Kesebir 2012, who argue on the basis of Google ngram-viewer data that Study 1 showed a decline in the use of general moral terms [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_exactmetrics_skip_tracking":false,"_exactmetrics_sitenote_active":false,"_exactmetrics_sitenote_note":"","_exactmetrics_sitenote_category":0,"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[39],"tags":[],"class_list":["post-40349","post","type-post","status-publish","format-standard","hentry","category-language-and-culture"],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=\/wp\/v2\/posts\/40349","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=40349"}],"version-history":[{"count":5,"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=\/wp\/v2\/posts\/40349\/revisions"}],"predecessor-version":[{"id":40354,"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=\/wp\/v2\/posts\/40349\/revisions\/40354"}],"wp:attachment":[{"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=40349"}],"wp:term":[{"taxon
omy":"category","embeddable":true,"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=40349"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=40349"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}