Which-hunting — and relative decline?


In "A quantitative history of which-hunting", I reproduced a plot due to (an anonymous colleague of) Jonathon Owen, showing that texts from the second half of the 20th century saw a decrease in the relative frequency of NOUN which VERB, and an increase in the relative frequency of NOUN that VERB. Jonathon took this as evidence that (usage guides like) Strunk & White's The Elements of Style had succeeded in persuading writers and copy-editors to avoid which in "restrictive" (AKA "defining" or "integrated") relative clauses.

Here are some plots showing the effect, using unsmoothed data from the Google Books ngram corpus. The "British English" dataset shows about the same increase in NOUN that as the "American English" collection does, but a somewhat smaller decrease in NOUN which:

[Plots: "American English" and "British English" datasets]

Note that I have NOT simply plotted the frequencies of the forms, but rather the proportional change over the course of the century, relative to the mean during that period. Thus if WHICH is the vector of yearly frequency values for the pattern NOUN which from 1900 to 2000, the red curves represent \(\mathrm{WHICH}/\operatorname{mean}(\mathrm{WHICH})\).
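The normalization just described is elementwise division by the period mean, so a value of 1.0 means "at the century average" and 1.2 means 20% above it. Here is a minimal Python sketch of that arithmetic; the function name `relative_to_mean` is hypothetical, and the numbers are made up rather than taken from the Google Books counts:

```python
from statistics import mean


def relative_to_mean(freqs):
    """Divide each yearly frequency by the mean over the whole period.

    1.0 means "at the period average"; 1.2 means 20% above it.
    (Hypothetical helper, illustrating WHICH / mean(WHICH).)
    """
    m = mean(freqs)
    if m == 0:
        raise ValueError("mean frequency is zero; the ratio is undefined")
    return [f / m for f in freqs]


# Made-up frequencies for a steadily declining pattern (not real ngram data).
which = [4.0, 3.0, 2.0, 1.0]
print(relative_to_mean(which))  # [1.6, 1.2, 0.8, 0.4]
```

Because every curve is divided by its own mean, patterns with very different raw frequencies can be compared on one set of axes, at the cost of hiding how common each pattern actually is.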

I should also note that both patterns will cover some things that are not relative clauses at all: complement clauses with that (e.g. "the way that", "the idea that", etc.), and question-word uses of which (e.g. "ask John which he prefers", "asked in a quiet voice which road to take"). But it's striking that Strunk & White's publication date of 1959 corresponds so neatly to an apparent inflection point in the plots.

However, if we look at the trends for more specific patterns, the simple which-hunting story is not so clear. At least, it interacts with other trends that may obscure or overwhelm it in particular cases. For example, we find some evidence for an overall decline in the frequency of (some types of) relative clauses, at least from 1900 to 1970 or 1980.

The plots below show the proportional changes in four sets of patterns (also merging upper and lower case):

  - blue, solid line: the thing that / the things that
  - blue, dashed line: the thing which / the things which
  - red, solid line: the man that / the men that / the woman that / the women that
  - red, dashed line: the man who / the men who / the woman who / the women who
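Each curve above pools several ngrams and ignores case. A minimal sketch of that pooling in Python follows, assuming raw per-year match counts keyed by ngram plus per-year total token counts; that data layout and the names `SETS` and `set_frequency` are hypothetical, not the Google Books file format:

```python
from collections import defaultdict

# The four pattern sets listed above (hypothetical variable name).
SETS = {
    "blue, solid": ["the thing that", "the things that"],
    "blue, dashed": ["the thing which", "the things which"],
    "red, solid": ["the man that", "the men that", "the woman that", "the women that"],
    "red, dashed": ["the man who", "the men who", "the woman who", "the women who"],
}


def set_frequency(counts, patterns, totals):
    """Combined per-year frequency of a set of ngram patterns, ignoring case.

    counts:   {ngram: {year: match_count}}, e.g. {"The thing that": {1950: 12}}
    patterns: the members of one set, e.g. SETS["blue, solid"]
    totals:   {year: total_token_count}, used to turn counts into frequencies
    """
    wanted = {p.lower() for p in patterns}
    summed = defaultdict(int)
    for ngram, by_year in counts.items():
        # Lower-casing merges e.g. "The thing that" with "the thing that".
        if ngram.lower() in wanted:
            for year, n in by_year.items():
                summed[year] += n
    return {year: summed[year] / totals[year] for year in sorted(totals)}
```

Summing counts before dividing by the yearly total weights each member of a set by how common it is, so a frequent form like "the thing that" dominates its set's curve.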


[Plots: Google Ngrams "English" corpus; COHA (Corpus of Historical American English)]

And if we look across a range of subject pronouns in a similar set of relative-clause structures, we see some other effects as well. For some of the pronouns (I, you, we, she), the decline in relative-clause frequencies sharply reverses around 1965, while for others (they, he) it levels off, much like the overall pattern shown above. At least, that's what happens for relative clauses introduced by that, and for those that begin directly with the bare pronoun:

[Plots: "the things PRO"; "the things that PRO"]

The same structures with which just keep on declining:
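To make the three structures being compared concrete, here is a small Python sketch that builds the query strings for each subject pronoun mentioned above. The function name is hypothetical, and the strings are plain ngram patterns, not any particular search interface's syntax:

```python
PRONOUNS = ["I", "you", "we", "she", "they", "he"]


def pronoun_queries(pronouns=PRONOUNS):
    """Build the bare, that, and which relative-clause patterns for each pronoun."""
    return {
        pro: {
            "bare": f"the things {pro}",
            "that": f"the things that {pro}",
            "which": f"the things which {pro}",
        }
        for pro in pronouns
    }
```

Only the relativizer slot varies within each pronoun's group, so differences between the three curves for a given pronoun reflect the choice of relativizer rather than the choice of subject.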

This all seems to mean that at least the following things are going on:

  1. An overall 20th-century decline in relative-clause frequency, probably correlated with declining sentence length and complexity (see e.g. "Real trends in word and sentence length", 10/31/2011; "Inaugural embedding", 9/9/2005; "The evolution of disornamentation", 2/21/2005).
  2. A change since the 1960s (in overall writing style, or in the Google Books ngram corpus sample, or both) towards increased use of first- and second-person pronouns.
  3. A change since the 1960s (in overall writing style, or in the Google Books ngram corpus sample, or both) towards increased discussion of women.
  4. Which-hunting — which may have started before 1959, due to the influence of the Fowlers, but seems to have been strongly boosted by The Elements of Style.

I'm sure that further investigation would uncover additional complexities.

Note: If you still think that E.B. White's conversion to the which-hunting faith was a recognition of the Truth, see Geoff Pullum's essay "A Rule Which Will Live in Infamy", Lingua Franca, 12/7/2012.


  1. _NL said,

    December 18, 2015 @ 10:58 am

    I had a boss who regularly struck out "which" and substituted in "that" in documents I wrote for his review. He didn't do it 100% of the time and his explanation of the rule never really made sense (unless the rule is that the word "which" should be mostly avoided), but I didn't really push back on it. He was a really smart guy but very literal and rule-following, so I think he just knew that somehow it was a rule.

    I didn't even point out to him all the times he corrected my non-restrictive clause uses of "which" into restrictive clause uses of "that." Pretty sure he was smarter than me, and I'm certain he was better at my job, so I avoided being smug or condescending in deference to all the times he gave me non-judgmental advice. But man, this correction was really annoying.

    Had another superior at that job who was really big on split infinitives and I think passive voice. Certain professions attract people who are literal, process-oriented, and rule-following. Sometimes those people prefer hard prohibitions to having multiple options, even when the rule doesn't have much independent justification. "Well, it's the rule."

  2. mollymooly said,

    December 18, 2015 @ 12:26 pm

    I choose to ascribe part of the British acceleration to the influence of American word processors' style checkers. I seem to recall a 1990s newspaper review of such checkers where the reviewer had learned of the existence of the that-which "rule" from the software.

  3. M.N. said,

    December 18, 2015 @ 12:43 pm

    I'm one of those weird millennials, so this zombie stuff may have had some effect on my language acquisition. Some examples with 'which' have a "yeah, it's English, but not my dialect" sort of feeling to my ear, similar to the feeling I get with a sentence that has 'whom' where I would more naturally use 'who', or other things of that sort.

    In reading the examples here and in the linked Lingua Franca post, it seems that an example with 'which' sounds a little better to me in an indefinite than a definite environment, but I couldn't say why. Also, if there's another modifier (like a PP or reduced relative) before the relative clause in question, that seems to improve it as well:

    'a rule about usage which many people dispute' vs. '??a rule which many people dispute'

    In such cases, there's a potential syntactic ambiguity: [a [ [rule [about usage]] [which many people dispute]]] vs. [a [rule [about [usage which many people dispute]]]]. (When I was coming up with the example, I intended the first parse.) Whether this has anything to do with the reason for the slight contrast in perceived naturalness, I also don't know.

  4. Jonathon Owen said,

    December 18, 2015 @ 1:12 pm

    Interesting post. It's times like these I really wish there were a large and easily searchable tagged corpus. If you could search for all relative pronouns rather than trying to construct a good search string that will catch a fair number of relative pronouns without catching too many things like complementizers, it would be a lot easier to see the trends.

    PS: My name is Jonathon Owen, not Jonathan Owens.

  5. Jerry Friedman said,

    December 18, 2015 @ 1:20 pm

    Jonathon Owen: Did you know the Google Books American, British, and Spanish ngram corpora are now tagged and searchable with the BYU interface? They're here.

  6. Ralph Hickok said,

    December 18, 2015 @ 1:25 pm

    "Certain professions attract people who are literal, process-oriented, and rule-following."
    I used to be copy chief for an ad agency that handled several high-tech accounts, and I often dealt with engineers in executive positions. When reviewing my copy, they were prone to ask questions like, "Why did you put a comma here?" It was never a criticism, but a genuine search for the rule behind my choice.

  7. Jerry Friedman said,

    December 18, 2015 @ 1:26 pm

    Sorry, that might have been obvious from Prof. Liberman's graphs. And I'm not clear at all on how to search for relative pronouns with the BYU interface, if there's a way.

  8. Jerry Friedman said,

    December 18, 2015 @ 1:30 pm

    Or actually, I guess you can search for "the NOUN that" at GB too. Is that new? I was just looking at the "about" page and saw various things with parts of speech that I don't remember from the last time I looked.

  9. Jonathon Owen said,

    December 18, 2015 @ 2:57 pm

    @Jerry Friedman:

    Sorry, I think I meant to say a parsed corpus, not a tagged one. I know Google Books is tagged (both through the original interface and BYU's), but it doesn't have the level of syntactic detail that would let you search specifically for relative clauses.

  10. Ben Zimmer said,

    December 18, 2015 @ 3:46 pm

    Coincidentally, the December 2015 issue of Language includes the article, "Which-hunting and the Standard English Relative Clause" by Lars Hinrichs, Benedikt Szmrecsanyi, and Axel Bohmann. Abstract:

    Alternation among restrictive relativizers in written Standard English is undergoing a massive shift from which to that. In corpora of written-edited-published British and American English covering the period from 1961–1992, American English spearheads this change. We study 16,868 restrictive relative clauses with inanimate antecedents from the Brown quartet of corpora. Predictors include additional areas of variation regulated by prescriptivism. We show that: (i) relativizer deletion follows different constraints from the selection of either that or which; (ii) this change is a case of institutionally backed colloquialization-cum-Americanization; and (iii) uptake of the precept correlates with avoidance of the passive voice at the text level but not with other prescriptive rules.

    Pre-print here, press release here.

  11. Pflaumbaum said,

    December 18, 2015 @ 5:15 pm

    I'd be interested to know how gerund-participles were faring against the all-devouring that.

    I write mostly for Americans, and they are forever nixing my -ings.

  12. Bill Benzon said,

    December 19, 2015 @ 8:24 am

    What about the coercive efforts of MSWord's grammar checker?
