Language meets literature; rationality vs. experience; fiction vis-à-vis nonfiction

New article in PNAS (Proceedings of the National Academy of Sciences of the United States of America), "The rise and fall of rationality in language", Marten Scheffer, Ingrid van de Leemput, Els Weinans, and Johan Bollen (12/21/21)

Significance

The post-truth era has taken many by surprise. Here, we use massive language analysis to demonstrate that the rise of fact-free argumentation may perhaps be understood as part of a deeper change. After the year 1850, the use of sentiment-laden words in Google Books declined systematically, while the use of words associated with fact-based argumentation rose steadily. This pattern reversed in the 1980s, and this change accelerated around 2007, when across languages, the frequency of fact-related words dropped while emotion-laden language surged, a trend paralleled by a shift from collectivistic to individualistic language.

Abstract

The surge of post-truth political argumentation suggests that we are living in a special historical period when it comes to the balance between emotion and reasoning. To explore if this is indeed the case, we analyze language in millions of books covering the period from 1850 to 2019 represented in Google nGram data. We show that the use of words associated with rationality, such as “determine” and “conclusion,” rose systematically after 1850, while words related to human experience such as “feel” and “believe” declined. This pattern reversed over the past decades, paralleled by a shift from a collectivistic to an individualistic focus as reflected, among other things, by the ratio of singular to plural pronouns such as “I”/“we” and “he”/“they.” Interpreting this synchronous sea change in book language remains challenging. However, as we show, the nature of this reversal occurs in fiction as well as nonfiction. Moreover, the pattern of change in the ratio between sentiment and rationality flag words since 1850 also occurs in New York Times articles, suggesting that it is not an artifact of the book corpora we analyzed. Finally, we show that word trends in books parallel trends in corresponding Google search terms, supporting the idea that changes in book language do in part reflect changes in interest. All in all, our results suggest that over the past decades, there has been a marked shift in public interest from the collective to the individual, and from rationality toward emotion.

The post-truth era where “feelings trump facts” (1) may seem special when it comes to the historical balance between emotion and reasoning. However, quantifying this intuitive notion remains difficult as systematic surveys of public sentiment and worldviews do not have a very long history. We address this gap by systematically analyzing word use in millions of books in English and Spanish covering the period from 1850 to 2019 (2). Reading this amount of text would take a single person millennia, but computational analyses of trends in relative word frequencies may hint at aspects of cultural change (2–4). Print culture is selective and cannot be interpreted as a straightforward reflection of culture in a broader sense (5). Also, the popularity of particular words and phrases in a language can change for many reasons including technological context (e.g., carriage or computer), and the meaning of some words can change profoundly over time (e.g., gay) (6). Nonetheless, across large amounts of words, patterns of change in frequencies may to some degree reflect changes in the way people feel and see the world (2–4), assuming that concepts that are more abundantly referred to in books in part represent concepts that readers at that time were more interested in. Here, we systematically analyze long-term dynamics in the frequency of the 5,000 most used words in English and Spanish (7) in search of indicators of changing world views. We also analyze patterns in fiction and nonfiction separately. Moreover, we compare patterns for selected key words in other languages to gauge the robustness and generalizability of our results. To see if results might be specific to the corpora of book language we used, we analyzed how word use changed in the New York Times since 1850. In addition, to probe whether changes in the frequency of words used in books does indeed reflect interest in the corresponding concepts we analyzed how change in Google word searches relates to the recent change in words used in books. Following best-practice guidelines (8) we standardized word frequencies by dividing them by the frequency of the word “an,” which is indicative of total text volume, and subsequently taking z-scores (SI Appendix, sections 1, 5, and 8).
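
For readers who want to see what that last standardization step amounts to, here is a minimal sketch in Python. It is not the authors' code: the function name `standardize`, the toy counts, and the choice of the population standard deviation are all assumptions made for illustration. The only parts taken from the passage are the division by the frequency of "an" and the conversion to z-scores.

```python
from statistics import fmean, pstdev


def standardize(counts, reference="an"):
    """Divide each word's yearly counts by the reference word's counts,
    then turn each word's series into z-scores across years.

    `counts` maps a word to a list of yearly counts; all lists must have
    the same length and cover the same years (hypothetical layout).
    """
    ref = counts[reference]
    result = {}
    for word, series in counts.items():
        if word == reference:
            continue  # the reference word only serves as the denominator
        relative = [c / r for c, r in zip(series, ref)]
        mean = fmean(relative)
        sd = pstdev(relative)  # assumption: population SD, not sample SD
        # A flat series has sd == 0; report zeros rather than dividing by zero.
        result[word] = [(x - mean) / sd if sd else 0.0 for x in relative]
    return result


# Invented toy counts for three consecutive years, for illustration only.
toy_counts = {
    "an": [1000, 2000, 4000],
    "feel": [10, 30, 80],
    "determine": [20, 30, 40],
}
z = standardize(toy_counts)
```

In this toy data the raw count of "determine" doubles, yet its z-scores fall, because the total volume of text (as proxied by "an") quadruples over the same years. That is the point of normalizing before comparing trends.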

VHM:  Under "Figures and SI", I found this table to be of particular interest:

Contrasting classes of concepts related to a personal (top row) vs. societal view of the world (bottom row) emerge by ranking words according to their correlation with principal components, overall sentiment, and the hockeystick pattern

Words scoring highest on surging PCA axis (PC2):
angry, look, walk, unexpected, sleep, voice, imagine, embarrassed, tortured, heal, struggling, knowing, potion, ambush, incredible, looking, greedy, terrified, looks, how, torture, learn, anger, invisible, mother, comfortable, drunk, fade, like, brutal, harsh, yourself, pain, sofa, could, dream, distracted, crying, what, thanks, her, eat, walking, shower, helmet, warn, suspected, sense, luckily, smell
Words correlating most positively to sentiment:
dressed, nights, beating, mad, forget, perfume, wore, delicious, crowd, dinner, took, sister, whispering, saw, hung, next, shut, bad, together, suddenly, slept, beside, thought, away, stood, another, awake, spoke, alive, drank, me, down, broke, dark, blame, inviting, whisper, drown, too, polite, moment, dragged, life, hang, quietly, forgot, glow, silence, footsteps, surprised
Words declining before 1980 and rising after 1980:
perfect, understood, throw, them, embrace, sight, comfort, nothing, rushing, place, trusting, awful, beautiful, ever, hearts, never, awake, throwing, when, sweet, promise, fallen, threw, cheer, brother, so, spirit, breathe, every, owe, believing, thankful, footsteps, him, rest, stranger, gorgeous, seeing, supposed, ashes, surprised, joy, cheering, disappoint, stood, thrown, dare, who, shine, appetite
Words scoring lowest on surging PCA axis (PC2):
secretary, state, report, year, sec, council, order, authorized, district, west, eastern, behalf, northern, president, office, statement, under, January, vice, attorney, east, committee, resident, October, south, reference, officer, branch, annual, interest, prepared, following, commonwealth, August, counsel, exclusive, further, board, April, collected, November, February, July, jersey, September, jurisdiction, general, contract, permanent, remaining
Words correlating most negatively to sentiment:
deputy, separate, annual, surface, applied, report, joint, contain, sub, marine, effect, determined, counsel, established, foreign, reasonable, congress, qualified, gross, number, direct, violation, assigned, tables, increase, request, section, savings, remaining, temperature, library, permit, construction, funds, reference, chemistry, transportation, manual, provided, volume, capital, chemical, assist, public, member, retarded, demonstration, affected, department, rate
Words rising before 1980 and declining after 1980:
area, program, indicate, available, development, basis, determine, initial, technical, million, addition, final, range, replacement, personnel, control, unit, involved, percent, eliminate, limited, rate, concentration, increase, result, test, staff, included, tested, transfer, maximum, zone, plus, sample, recent, congressman, level, funds, data, responsible, basic, laboratory, equipment, budget, procedure, breakdown, effective, activity, tape, review

VHM:  What does this all boil down to?  The authors review their findings in "Outlook":

It seems unlikely that we will ever be able to accurately quantify the role of different mechanisms driving language change. However, the universal and robust shift that we observe does suggest a historical rearrangement of the balance between collectivism and individualism and—inextricably linked—between the rational and the emotional or framed otherwise. As the market for books, the content of the New York Times, and Google search queries must somehow reflect interest of the public, it seems plausible that the change we find is indeed linked to a change in interest, but does this indeed correspond to a profound change in attitudes and thinking? Clearly, the surge of post-truth discourse does suggest such a shift (44–48), and our results are consistent with the interpretation that the post-truth phenomenon is linked to a historical seesaw in the balance between our two fundamental modes of thinking. If true, it may well be impossible to reverse the sea change we signal. Instead, societies may need to find a new balance, explicitly recognizing the importance of intuition and emotion, while at the same time making best use of the much needed power of rationality and science to deal with topics in their full complexity. Striking this balance right is urgent as rational, fact-based approaches may well be essential for maintaining functional democracies and addressing global challenges such as global warming, poverty, and the loss of nature.

It is no wonder that the first three authors are environmental scientists, while the fourth works in informatics.

[h.t. Paul Midler]



3 Comments

  1. Jerry Friedman said,

    December 30, 2021 @ 3:07 pm

    The trends are interesting, to the extent that I understand what the authors did, but I don't see anything more than speculation about a connection between those trends and an alleged transition into post-truthiness. Also, I was interested to see that an increase in a collection of words containing "true" and "truth" is supposed to indicate a move away from concern for the truth. I can see how that might work—maybe people who are sure they know the truth resist correction of their false beliefs—but I wonder if it really works. Maybe the collection of words would be more indicative without those two.

  2. D.O. said,

    December 30, 2021 @ 3:34 pm

    This reeks a bit of amateurism. Why use words as tokens? At least group them under lemmas.

  3. AntC said,

    December 30, 2021 @ 5:09 pm

    … millions of books …

    Wait. What? Haven't they heard of Twitter?

    I'd suspect those most prominent in The surge of post-truth political argumentation haven't opened a book in decades.

    (Also has the 'research' controlled for specific words changing their meaning over that stretch of time? Or for other stylistic trends, such as shorter sentence length.)
