A quantitative history of which-hunting


In a comment yesterday, Jonathon Owen pointed us to a fascinating post at Arrant Pedantry on Which Hunting (12/23/2011). You should read the whole thing, but as a teaser, here's the key graph:

And here's (part of) what Jonathon says about it:

For one of our projects in the corpus class, we were instructed to choose a prescriptive rule and then examine it using corpus data, determining whether the rule was followed in actual usage and whether it varied over time, among genres, or between the American and British dialects. One of my classmates (and former coworkers) chose the that/which rule for her project, and I found the results enlightening.

She searched for the sequences “[noun] that [verb]” and “[noun] which [verb],” which aren’t perfect—they obviously won’t find every relative clause, and they’ll pull in a few non-relatives—but the results serve as a rough measurement of their relative frequencies. What she found is that before about the 1920s, the two were used with nearly equal frequency. That is, the distinction did not exist. After that, though, which takes a dive and that surges.
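The search described above can be approximated in a few lines of code. This is a minimal sketch, not the classmate's actual procedure. It assumes the corpus is already part-of-speech tagged and stored as (word, tag) pairs with Penn Treebank tags (noun tags start with "NN", verb tags with "VB"). The function names `count_relativizers` and `that_share` are hypothetical. Like the original search, it counts only the literal trigrams, so it has the same blind spots.

```python
from collections import Counter

# Assumed input format: one tagged sentence as (word, tag) pairs using
# Penn Treebank tags, where noun tags start with "NN" and verb tags with "VB".
TaggedToken = tuple[str, str]


def count_relativizers(tagged: list[TaggedToken]) -> Counter:
    """Count '[noun] that [verb]' and '[noun] which [verb]' trigrams.

    A rough proxy only: it misses relative clauses that don't fit the
    pattern and picks up some non-relatives that do.
    """
    counts: Counter = Counter()
    # Slide a three-token window over the sentence.
    for (_, tag1), (word2, _), (_, tag3) in zip(tagged, tagged[1:], tagged[2:]):
        rel = word2.lower()
        if tag1.startswith("NN") and rel in ("that", "which") and tag3.startswith("VB"):
            counts[rel] += 1
    return counts


def that_share(counts: Counter) -> float:
    """Proportion of matched trigrams using 'that' (0.5 means equal use)."""
    total = counts["that"] + counts["which"]
    return counts["that"] / total if total else 0.0


sentence = [
    ("the", "DT"), ("rule", "NN"), ("that", "WDT"), ("bothers", "VBZ"),
    ("editors", "NNS"), ("is", "VBZ"), ("a", "DT"), ("rule", "NN"),
    ("which", "WDT"), ("confuses", "VBZ"), ("writers", "NNS"),
]
counts = count_relativizers(sentence)
print(counts, that_share(counts))
```

If the counts were run separately over each decade of a historical corpus, a `that_share` near 0.5 would correspond to the "nearly equal frequency" described for the period before the 1920s. A rising value afterward would reflect the dive in which and the surge in that.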

Something that Jonathon doesn't say: it looks plausible to me that the trend accelerates around 1959, when E.B. White's Elements of Style (his revision of Strunk's book) was first published.


  1. amanda chen said,

    September 5, 2012 @ 8:49 am

    Has anyone ever calculated the frequencies for spoken English?

  2. Joe said,

    September 5, 2012 @ 9:16 am

    Jonathon's got a post on relative what and its almost non-existent usage. Pretty fascinating post as well.

  3. Pflaumbaum said,

    September 5, 2012 @ 9:55 am

    The linked article raises an issue I've been wanting to ask about for a while:

    Is there consensus among linguists about the status of prescriptive rules which are influential to the point of being generally observed in relatively formal speech and writing? If the that/which 'rule' has had a dramatic effect on usage – but still has to be consciously learnt – at what point is 'violating' it regarded as non-standard?

    Take co-ordinated accusative pronouns in subject position. CGEL categorises them as 'unquestionably non-standard', noting that they are 'strongly stigmatised'. Yet there's evidence that the accusative is the default form, used as a disjunctive pronoun; that children include co-ordination in its domain until it's drummed out of them by prescriptive pressure; and that it resurfaces in the unguarded speech of Standard English speakers. If co-ordinated subject accusatives are really non-standard, does that mean that, in light of the evidence in this article, sentences like my second one are also non-standard (or well on the way to being)?

    Or are there linguists who contend that a syntactic rule ought to be unconscious and apply in informal speech – not have to be enforced by parents, teachers, copy-editors etc. – before it's regarded as operational in the language?

  4. RP said,

    September 5, 2012 @ 10:12 am

    In the UK, relative "what" is certainly not "almost non-existent" or even remotely close – although it is considered non-standard.

  5. Coby Lubliner said,

    September 5, 2012 @ 10:44 am

    I learned the American that/which rule in my freshman English class. Because at the time I had been living in the US for only three years, I internalized the rule to such an extent that, now that I know it to be a zombie rule, I have to make a conscious effort to occasionally use a restrictive "which" (and, while I'm at it, to occasionally split an infinitive). But that was in 1953-54, several years before the publication of S&W. So, the "rule" must already have been in the air (and maybe in the text we used — I don't remember what it was) before Elwyn B.W. canonized it.

  6. Steve F said,

    September 5, 2012 @ 11:22 am

    As a Br E speaker, I'd always considered that/which to be a stylistic choice in restrictive clauses, and I was perplexed when I started using a computer and Microsoft Word began putting wavy lines under restrictive 'which' (even when it was set to Br E). Until then, not having read Strunk and White (it doesn't have much currency on this side of the pond), I was unaware that anyone objected to it, and as an EFL teacher I taught what all the textbooks said (and still say): that in restrictive relative clauses both 'that' and 'which' are acceptable (though of course when it is an object pronoun it can be missed out altogether – irrelevant side-note: why can grammatically unsophisticated people identify an object pronoun when omitting it, but get so confused by 'whom'?)

    So British usage would seem to confirm Mark's conjecture that Strunk and White are responsible for the prevalence of the 'rule' in US usage (even if Coby Lubliner's memory of the chronology suggests that the rule predates the publication of White's revision, so it cannot have originated there).

    As for 'what' as a relative pronoun, I can certainly confirm RP's impression that it is very common, though non-standard, in Br E. One of its most memorable uses is the catchphrase 'The play what I wrote', associated with the much-loved comedy act Morecambe and Wise.

  7. J.W. Brewer said,

    September 5, 2012 @ 11:45 am

    @Pflaumbaum – it seems plausible (and more obvious perhaps for some non-English languages) that spoken and written varieties of a language could diverge enough to have, in a descriptivist sense, different grammars. So a native speaker/writer might "naturally" follow a particular pattern in writing that he would not follow in speech. But I think the question would still be whether the rule is sufficiently internalized to become subconscious/automatic. If a high enough percentage of people who believe in the validity of the rule keep systematically violating it in producing rough drafts and keep needing to clean it up in the proofreading/editing stage, that would seem to indicate something about how "real" a rule it is. But once a rule becomes internalized, then whatever one might think about the "politics" of its origin (stupid prescriptivist schoolteachers, hypercorrection by social climbers, navigating around a taboo based on false etymology, invasion and conquest by Vikings speaking some weird dialect of French, whatever . . .) becomes irrelevant.

    The second part of Pflaumbaum's question I'm not sure I understand, because I'm not sure there exist speech communities (even w/o literacy or w/o formal schools) in which young children's "natural" errors in speech are not corrected by their elders often enough that the children generally internalize and operationalize the adult rules. That doesn't mean that the patterns of those "natural" errors aren't interesting, but they also vary. One of my daughters had the default-accusative-pronoun thing quite strongly as a toddler; the other didn't, and I don't think it's because the other one was corrected by adults more aggressively.

  8. J.W. Brewer said,

    September 5, 2012 @ 11:55 am

    Wait, I think I may have misunderstood the point about subject-position accusative pronouns. The rule against coordinated ones ("her and my brother went down to the store") is obviously on a different metaphysical/psychological plane than the one against bare ones ("her went down to the store"), the latter of which the vast majority of adult speakers observe even in informal speech. It's the latter rule one of my daughters systematically violated as a toddler (which I understand is not uncommon, and is an interesting phenomenon), until she grew out of it. Whether she would have grown out of it simply by passive observation/replication of the speech of her elders with no one ever "correcting" her I don't know, because I don't think that's really ever what happens, and I was probably a pretty hands-off parent as these things go partly because I found the patterning of errors so interesting to observe and was pretty sure that one way or another she'd learn to speak the standard variety w/o overly aggressive intervention by me in particular.

  9. Pflaumbaum said,

    September 5, 2012 @ 12:01 pm

    @ J.W. Brewer

    Sorry, I didn't mean toddlers, I'm talking about young people who otherwise speak fully fluent Standard English. I remember being corrected about accusatives well into my late teens.

    I'm also talking about adults. As described in this paper (p. 62), a corpus of phone conversations shows [him/her and X] VERB types at a relative frequency of .353, compared to .571 for [s/he and X] VERB.

    Anyway, I don't mean to derail the comments into a co-ordinated pronouns discussion. The which/that example would do just as well in theory, if the present trend continued.

  10. Pflaumbaum said,

    September 5, 2012 @ 12:21 pm

    @ Steve F

    Re your side-note: if I understand you correctly, doesn't this again relate to the distinction between unconscious syntactic rules and prescriptive 'rules'?

    Many people who routinely use bare relative clauses might struggle if you told them that from now on they shouldn't. I suspect they'd miss many, and perhaps start over-generalising your 'rule' and, say, inserting that into content clauses where they'd ordinarily omit it; or even inserting it where it's currently ungrammatical.

  11. Steve F said,

    September 5, 2012 @ 2:09 pm

    @ Pflaumbaum

    Yes – I'm sure you're right, and I probably knew the answer to my own question as I wrote it: people learn to use bare relative clauses as a collocation, not a rule, whereas 'whom' is too rare to be learnt that way. And of course 'whom' is a marker of formal usage as well as an accusative, and many people seem to use it to indicate the former rather than the latter.

    I should also add to my previous comment that, having now had a closer look at the graph, I realise there is much less difference between British and US usage than I thought. I've also just checked the Guardian's Style Guide as a sample of current practice, and it essentially agrees with Strunk and White (though it is a bit hazy about when 'that' can be omitted). So British usage probably has no bearing on whether E.B. White is to blame.

  12. J.W. Brewer said,

    September 5, 2012 @ 4:07 pm

    Perhaps I am wildly overgeneralizing from my own idiolect, but I think I natively speak (and this has to do with parents / childhood environment etc., so it's not a boast or an achievement or anything) about as "high" or "pure" a version of the prestige/educated/standard sort of AmEng (at least as to syntax/lexicon – maybe a little bit of regional phonological flavor in the vowels and a handful of variant/rustic pronunciations like "acrosst" for "across") as there is outside a museum these days. Even in an informal register I don't say "ain't" other than as a self-conscious affectation, and I do say "whom" (not always, but often even in reasonably informal speech, and not with the sort of odd placement characteristic of hypercorrection). I'm pretty sure there are other shibboleths distinguishing prestige from non-prestige AmEng where I am natively on the prestige side of the line. But I don't at all consistently follow the prescriptivist that/which distinction, either in speech or when I write; I get affirmatively irked when people mark up my drafts to try to impose it (I do or don't accept the edits depending on the politics of the situation . . .). I perhaps arrogantly conclude from this that any rule that does not accurately describe my idiolect is probably not, in descriptivist terms, a real rule of AmEng, even of the standard/prestige/educated written variety. I can imagine that there are many people who believe in the rule enough that, if they are paying attention, they will edit their first drafts to conform to it when proofreading, rather than wait for a teacher or copyeditor to do it for them.

    I am curious as to how many other people are out there (I'm not doubting Coby Lubliner – my own idiolect has a few features that were the result of internalizing what I now know to be bogus prescriptivist advice from a well-intentioned 10th grade teacher) who have internalized the rule to the extent of naturally producing first drafts that consistently conform to it. This of course doesn't mean that the that/which rule could not become better established in the future, such that it was natively part of my (hypothetical future) grandchildren's idiolects.

  13. Doreen said,

    September 5, 2012 @ 5:22 pm

    @Joe, @RP
    And of course there's the iconic headline, IT'S THE SUN WOT WON IT (see e.g. the Wikipedia entry by that title).

  14. Jonathon said,

    September 5, 2012 @ 5:25 pm

    Pflaumbaum: I've wondered the same thing myself (referring to your first comment). It seems that prescriptivists are increasingly pointing to corpus data to justify certain rules, which creates a feedback loop.

    But it's something of an illusion, because published writing is edited writing. How many of those restrictive thats did the authors write themselves, and how many are the result of a copy editor changing all the restrictive whiches to thats? My own experience as a copy editor says that this is one of editors' favorite rules, and there are an awful lot of opportunities to apply it.

    In my opinion, as long as people are using it in significant numbers, whether or not editors change it before it appears on the printed page, it's difficult to call it nonstandard. You might be able to say that restrictive which is disfavored in formal, edited writing, but that's about it. Of course, that passive raises the important questions: disfavored by whom, and why?

  15. Thom said,

    September 5, 2012 @ 11:16 pm

    I conducted similar research for a class last year. In looking at pied-piping vs. preposition stranding, I found that "whom" seems to be mostly preserved in cases of pied-piping. The same is likely true for which. However, it seems that all of the interrogative complements are being replaced with "that". It is a common trend in clause structures.

  16. Adam said,

    September 6, 2012 @ 3:56 am

    With regard to the differences between British & American usage, especially in Steve F's comment, I understand (I'm open to correction on this) that the Fowler brothers made up the that/which rule and Strunk & White got it from them. Apparently it just didn't take in its homeland.

  17. J.W. Brewer said,

    September 6, 2012 @ 9:44 am

    As for Jonathon's point, it is presumably now possible for someone to put together a fairly large/balanced/representative corpus of comparatively unedited writing drawn from blogposts, comments, and the like. (Just as there are corpora out there with various sorts of transcribed speech.) A problem might be that that sort of online unedited writing might often intentionally have been produced in a more informal register than writing-for-print-publication would be. What one would ideally want is a corpus built from unedited first drafts of material intended for old-fashioned print publication, where the author is taking his best shot at a formal written style/register but hasn't yet done his own proofreading and/or had a copyeditor intervene. But I'm not sure how feasible it would be at present to get your hands on enough such material to construct a large/balanced/representative corpus.

  18. Jonathon said,

    September 6, 2012 @ 1:21 pm

    J.W. Brewer: I'm doing something along those lines for my master's thesis. It's not a large corpus, and it's by no means balanced or representative, but I've got 600 pages of academic manuscripts with editorial changes tracked so that I can examine the grammar and usage changes imposed by the editors.

    What I've found so far backs up the idea that authors are still using restrictive which quite a lot and that editors are changing it. It's by far the most frequent usage change made, and it's made in about 75 percent of applicable cases.

    If someone could put together the kind of corpus you're describing, it'd provide a fascinating look into the workings of language standardization. But as you said, getting the material would be a significant challenge.

  19. J.W. Brewer said,

    September 6, 2012 @ 2:29 pm

    Jonathon: good for you! And I take it your raw material is already at a second stage, i.e. (although obviously it varies by author and the context of the particular relationship with the editor) assuming the author doesn't generally just hit send on a rough first draft without doing at least one round of proofreading and revision before giving the MS to the editor, these are "violations" that the authors themselves didn't catch and correct? I suppose another test you could do on the material is see if the issues flagged by the editors would or would not have been flagged by the obviously-still-clunky grammar/syntax equivalent of spellcheckers that many word processing software packages now have built into them. Of course, if they would have been flagged you still don't necessarily know if the author didn't bother to run the program (although I think the default settings on the software I use will run the grammar check along with the spell check unless you've specified otherwise) or ran the program but elected not to accept suggestions that seemed dumb.

  20. Jerry Friedman said,

    September 6, 2012 @ 3:18 pm

    @Adam: The earliest version of a that/which rule I know of is from A Higher English Grammar, by the Scottish philosopher and psychologist Alexander Bain. As you can see here, his proposal was more radical than the Fowlers'.

  21. David Morris said,

    September 6, 2012 @ 6:14 pm

    In my own writing, I *prefer* restrictive "which", on the grounds that "that" already has enough else to do as a demonstrative or complementiser. In practice, though, whenever I write a sentence with a restrictive "which", it never looks quite right, and I get an uneasy feeling that some reader – known, unknown, imaginary – is going to judge me unkindly because of it.

    Even now, despite the pronouncements by some people, "which" can be, and is, used restrictively. On the other hand, "that" is never used non-restrictively. As an ESL teacher, I would have to say "Why teach two rules when you can teach one?"

  22. Jonathon said,

    September 9, 2012 @ 9:41 pm

    J.W. Brewer: They were manuscripts that were being submitted for publication, so they weren't just rough drafts that the authors had dashed off. Of course, it's impossible to say how much time the authors spent on their articles without asking them directly, but they seemed like polished manuscripts to me. And judging from my own experience as a copy editor, I'd say that the editing they received looked pretty comparable to what I usually see at work.
