Sabeti on NYT bias


Barbara Partee asked me to comment on this thread by Arram Sabeti — crucial bit here:


Here are my comments, copied from her Facebook post, and slightly edited:

It's not easy to evaluate Sabeti's NYT-bashing tweet thread. He doesn't tell us where the cited word-frequency data comes from, or how it was calculated, or even what the graphs' axes mean. And the y-axis in each case goes from 0 to 1, apparently scaled to correspond to the minimum and maximum frequency of each word or phrase. (Presumably this is the minimum and maximum of each word's relative frequency, not of each word's raw counts, since the overall NYT word count has increased considerably over the time period considered…)

So a value of 1 might correspond to an arbitrarily low (or high) frequency, different for each case. [Graphic here — if anyone knows who calculated this, and how, or any other details, please let me know.]
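To make the scaling problem concrete, here is a minimal sketch of per-word min-max scaling, which is my guess at what was done to produce the 0-to-1 axes. The two frequency series are invented purely for illustration and are not NYT data; the function name `min_max_scale` is likewise my own.

```python
def min_max_scale(values):
    """Rescale a series so that its minimum maps to 0 and its maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant series has no range to scale; map everything to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical relative frequencies (occurrences per million words).
# These numbers are made up for illustration only.
common_word = [900.0, 750.0, 600.0, 500.0]  # frequent, but declining
rare_phrase = [0.0, 0.1, 0.5, 2.0]          # rare, but rising

scaled_common = min_max_scale(common_word)
scaled_rare = min_max_scale(rare_phrase)

print(scaled_common)
print(scaled_rare)
```

After scaling, both series span the whole range from 0 to 1, so the plots look like mirror images, even though in the final period the invented common word is still 250 times more frequent than the invented rare phrase.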

And the choice of words and phrases to display suggests a strong political motivation: The first row is all about feminism; the second row is all about racism; the third row is all about LGBTQ issues; and so on.

Also, Sabeti doesn't compare the results to similar data from other sources. Would any U.S. news-text source show a similar profile over the period from 1970 to 2018, i.e. increasing frequency of words like "sexist" and "racist" and "transgender" over the past decade or so, reaching a recent peak? I imagine so — and remember that the y-axis of the plots shows changes in each word against itself, not against other words.

We also don't know whether this data comes from news stories or from opinion pieces or from features or from on-line comments (once those became available) or what. (And that matters: the distributions of these words are very different in news stories, opinion pieces, book or media reviews, etc.) Controlling for that question, how would these graphs compare to similar data from the AP Newswire or the Wall Street Journal or the Chicago Tribune or some general news accumulator like Nexis? Or Twitter, for that matter (since 2007, of course)?

And I note again that the choice of words and phrases to display clearly reflects the source's own political or culture-war motivations. This is not to say that the NYT is or has ever been an unbiased source. But this graph doesn't seem to me to show anything at all about that question.

Update — as an example of the bias intrinsic in Sabeti's graphic (or whoever's graphic it is, since it might have come to him from some random discussion group or subreddit), let's take a closer look at two of the subgraphs. (Sorry for the poor resolution, I'll be happy to fix it if someone can point me to the underlying data).

We're obviously intended to infer from this comparison that the NYT is replacing a focus on religion with a focus on (anti-male) feminism:

"church" "toxic masculinity"

But the actual count of stories containing these words in the NYT during the month of July 2020 was

Word or Phrase        Number of Stories
"church"              367
"toxic masculinity"   3

For the full years of 2017 and 2018 — the last two years of Sabeti's graphs:

Word or Phrase        Number of Stories (2017)   Number of Stories (2018)
"church"              3172                       3674
"toxic masculinity"   21                         64

So there are no doubt some trends here, but whether they actually support Sabeti's thesis is entirely unclear. Would you conclude from his graphs that in 2018 the NYT had 57 times more stories containing the word "church" than containing the phrase "toxic masculinity"? (And so far in 2020, the score is 2,165 for "church" vs. 23 for "toxic masculinity", or a ratio of 94 to 1.)
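For anyone who wants to check the arithmetic, the ratios can be recomputed from the story counts quoted above. This is just a sketch; the dictionary layout and variable names are mine.

```python
# Story counts quoted in this post: (stories with "church",
# stories with "toxic masculinity") for each period of NYT search results.
counts = {
    "July 2020": (367, 3),
    "2017": (3172, 21),
    "2018": (3674, 64),
    "2020 so far": (2165, 23),
}

# Ratio of "church" stories to "toxic masculinity" stories per period.
ratios = {
    period: church / toxic
    for period, (church, toxic) in counts.items()
}

for period, ratio in ratios.items():
    print(f"{period}: {ratio:.1f} to 1")
```

The 2018 and 2020 lines come out at roughly 57 to 1 and 94 to 1, matching the figures in the paragraph above.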

One more small set of tallies, comparing the NYT with the Washington Post, the London Times, and Google Scholar, in terms of article counts over the past 12 months (according to their website search results):

Source                           # "church"   # "toxic masculinity"   Ratio
New York Times                   3342         53                      63.1
Washington Post                  4535         58                      78.2
London Times                     3892         46                      84.6
Google Scholar (2019 and 2020)   39300        2880                    13.6

And for what (little) it's worth, a Google site search of reddit.com (presumably across all dates since 2007 or so) produces 5830000/117000 = 49.8 times more hits for "church".  In comparison, the NYT archive search since 2007 gives us 57702/191 = 302.1 times more hits for "church".
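The cross-source comparison can be laid out the same way. The sketch below takes the hit counts quoted above, computes the "church" to "toxic masculinity" ratio for each source, and sorts the sources from highest to lowest ratio. The function name `ratio_table` is my own, and the counts are website search results, so treat them as rough.

```python
# Hit counts quoted in this post: ("church", "toxic masculinity").
story_counts = {
    "New York Times": (3342, 53),
    "Washington Post": (4535, 58),
    "London Times": (3892, 46),
    "Google Scholar (2019 and 2020)": (39300, 2880),
    "reddit.com (Google site search)": (5830000, 117000),
    "NYT archive since 2007": (57702, 191),
}


def ratio_table(counts):
    """Return (source, ratio) pairs sorted from highest to lowest ratio."""
    pairs = [(name, church / toxic) for name, (church, toxic) in counts.items()]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


table = ratio_table(story_counts)
for name, ratio in table:
    print(f"{name:<34} {ratio:6.1f}")
```

Sorting makes it easier to see where each newspaper sits relative to the others and relative to the non-newspaper sources.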

So I'm not finding much evidence for the view that the NYT is straying very far from the Zeitgeist, at least on this particular dimension. And I'd be surprised to find that the other words on Sabeti's page of graphs show a very different pattern.

Overall, in my opinion, Sabeti's tweet thread is a good argument for the principle that quantitative claims ought to be backed up by access to the underlying data and the methods (and indeed the code) used to produce the cited graphs, tables, and so on.

[Update: the data behind those graphs, and perhaps the graphs themselves, come originally from work by David Rozado, who has offered useful information about his sources and methods. More on this when I've had a chance to work through the details.]

 



7 Comments

  1. Adrian Bailey said,

    August 2, 2020 @ 9:49 am

    Graph image taken from this thread https://twitter.com/JohnFMiller86/status/1147130392555610112?s=19

  2. Adrian Bailey said,

    August 2, 2020 @ 9:52 am

    John Miller comments "All of the data compiled in this thread is from a range of different sources. That data comes from LexisNexis (requires a subscription). The purpose of the NY Times graph is to show its increased focus on a very specific subset of ideological rhetoric."

    [(myl) Thanks! Do you have a way to contact John Miller to get the original numbers (actual word counts)?

    I'm a bit puzzled about the LexisNexis reference — Penn's library has a subscription to "Nexis Uni", but it seems to index the NYT only back to 1980, not 1970, and the results e.g. for "church" don't look anything like Miller's:


    ]

  3. Adrian Bailey said,

    August 2, 2020 @ 9:56 am

    Arram has also quoted this tweet https://twitter.com/ZachG932/status/1288265290330058752?s=19

    [(myl) Interesting — I know Peter Dodds at the Vermont lab, so I'll ask him what he can tell us about the data source(s).]

  4. Ben Zimmer said,

    August 2, 2020 @ 12:11 pm

    See also Zach Goldberg's Twitter thread from last year, with more graphs derived from Nexis searches.

  5. arthur said,

    August 2, 2020 @ 2:02 pm

    The New York Times wasn't especially interested in covering churches in the 1975-76 period, where there's a big spike, but it gave a huge amount of coverage to the Church Committee those years.

    [(myl) Good point — except that the actual time series is strikingly different from the one that Miller presents (and Sabeti copies without attribution):


    ]

  6. fev said,

    August 2, 2020 @ 2:29 pm

    The comment about NYT stock price is also unmoored from reality — in summer 2014 it had more than tripled from its low in 2009, and dividends had resumed in fall 2013. It seems a good thing readers were spared his opinion.

  7. David Rozado said,

    August 3, 2020 @ 7:59 am

    Please see my response about the accuracy of the data at
    https://languagelog.ldc.upenn.edu/nll/?p=47954

    It is misleading to crop the figure selectively to artificially bring together charts that have nothing to do with each other, such as "church" and "toxic masculinity". The bottom row of the original figure is intended only to show that the frequency counts are reliable by displaying trends that most reasonable people would agree exist in our culture. One can clearly see three spikes around the times of the three big wars in which the United States has been involved in the analyzed timeframe (Vietnam, the first Gulf War, and the 2002/2003 Iraq and Afghanistan wars), as well as the peak of the AIDS epidemic, the ascendance of China as a global superpower, the decline of General Motors, etc. And yes, "church" has indeed become less relevant in America over the past 50 years. There is nothing controversial or political about that. It is just an observation of a fact. Yet, Mr. Liberman assumes that I am trying to show that this is somehow connected to toxic masculinity!!??

    Just to prevent further misinterpretations. The figure only shows what could be interpreted (there might be alternative interpretations) as a change in moral culture (I'm not sure if this is the right term to describe the phenomena but I can't think of anything better) with an increasing prevalence of prejudice related words and victimization words in media discourse (which is not circumscribed to the NYT by the way since it's also apparent in other outlets I am analyzing, I just happened to look into the NYT first). The bottom row of the figure is just trying to illustrate that the frequency counts are solid and are able to track historical events and societal trends over time.
