[Update: apparently the data for the graphs presented by Sabeti and Miller came originally (without attribution) from work by David Rozado, who has provided useful information about his sources and methods. I therefore withdraw the suggestion that the counts were wrong, pending further study, though I am still not persuaded by the arguments that Sabeti and Miller made using their version of his graphs.]
This is the subgraph for "racism" from the display originally presented in John F Miller's 2019 tweet, reproduced a few days ago by Arram Sabeti, and allegedly representing "New York Times Word Usage Frequency (1970 to 2018)":

Earlier today ("Sabeti on NYT bias"), I lodged some objections to Miller's graphs, especially the way that the y-axis scaling misrepresents the relative frequency of the various words and phrases covered. But after looking into things a little further, I find that it's not just a scaling problem: in every case that I've checked, the underlying number sequences in Miller's graphs are substantially different from what I find in a search of the NYT archive. I don't know whether this is because of some issue with Miller's numbers, or with the counts from the NYT archive, or something else.
And the differences make a difference — Miller's tendentious conclusion that "social liberal media and academia are wilfully gaslighting people" is even less well supported by the Archive's numbers than it was by the original misleadingly-scaled graphs.
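To make the scaling objection concrete, here is a minimal sketch of how giving each word its own y-axis range can hide differences in relative frequency. The counts are invented purely for illustration (they are not from the NYT archive or from Miller's graphs), and `rescale_per_panel` is a hypothetical helper that only approximates what an automatically chosen per-panel axis range does.

```python
def rescale_per_panel(counts):
    """Min-max rescale one series to [0, 1].

    This roughly mimics plotting each word in its own panel with an
    automatically chosen y-axis range (an assumption for illustration).
    """
    lo, hi = min(counts), max(counts)
    if hi == lo:
        # A flat series has no range to rescale; map it to zeros.
        return [0.0 for _ in counts]
    return [(c - lo) / (hi - lo) for c in counts]


# Hypothetical yearly counts, made up for this example only.
common_word = [1000, 1500, 2000]
rare_word = [10, 15, 20]

# After per-panel rescaling the two curves are indistinguishable...
print(rescale_per_panel(common_word))
print(rescale_per_panel(rare_word))

# ...even though one word is 100 times as frequent as the other.
print(max(common_word) / max(rare_word))
```

Plotted on a shared axis, or as counts per fixed number of articles, the two series would look nothing alike, which is the information that per-panel scaling throws away.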