More on "PRONOUN + VBG" constructions

My post on "Possessive with gerund: Tragic loss or good riddance?" (9/18/2010) has gotten me deeper than is probably wise into a field where I know very little. But having splashed on in, I might as well keep wading forwards a little further.  In particular, a bit of poking around on Google Scholar turned up some relevant recent work, especially Liesbet Heyvaert et al., "Pronominal Determiners in Gerundive Nominalization: A 'Case' Study", English Studies 86(1): 71-88, 2005.

Here's what Heyvaert et al. did:

To determine how far the use of personal pronouns in gerundive nominals is register-bound, we examined corpus data from two subcorpora of the Collins COBUILD corpus. We extracted material from the Times corpus, which is a corpus of 5,763,761 words based on the newspaper The Times. The written, highly formal register which it represents allowed us to check on the claim that possessive pronouns are preferred in formal language. To see whether oblique pronouns predominate in informal language, then, we extracted data from the UKspoken corpus, a collection of 9,272,579 words based on recordings of informal speech. From both corpora, all instances of the pattern PRONOUN + VBG (i.e. verb in -ing form) were extracted and classified. The personal pronouns we searched for were those with a distinct form for the genitive and the oblique, i.e. I, you, he, they, we and it (the pronoun she has her for the genitive as well as for the oblique and has therefore not been taken into account): the genitive forms of these pronouns are my, your, his, their, our and its; their oblique counterparts are me, you, him, them, us and it.

These are smaller samples than I used — 6 million words of Times text compared to 400 million in COCA, and 9 million words of UKspoken transcripts compared to 25 million in the LDC CTS collection.

On the other hand, their search was much broader and more accurate. They looked at "all instances of the pattern PRONOUN + VBG", rather than just the pattern PRONOUN + having. And they classified each hit by hand, rather than relying on simple hit counts as a proxy measure. (This is also why they were able to include it/its and you/your among their pronouns — I omitted these due to a concern about the relatively large number of cases like "what effect was it having" and "are you having a nice time?" in my counts.)
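To make the contrast concrete, here is a minimal sketch of what a raw "PRONOUN + VBG" search amounts to before any hand classification. It assumes the text has already been tagged into (word, tag) pairs with a Penn-Treebank-style "VBG" tag; the function name and the toy sentences are hypothetical, and this is not the procedure Heyvaert et al. actually used.

```python
# Pronoun forms taken from the Heyvaert et al. description quoted above.
GENITIVE = {"my", "your", "his", "their", "our", "its"}
OBLIQUE = {"me", "you", "him", "them", "us", "it"}


def pronoun_vbg_hits(tagged):
    """Return (pronoun, verb, form) for each adjacent PRONOUN + VBG pair.

    `tagged` is a list of (word, tag) pairs; the "VBG" tag is an assumed
    Penn-Treebank-style label for -ing verb forms.
    """
    hits = []
    for (w1, _), (w2, t2) in zip(tagged, tagged[1:]):
        pronoun = w1.lower()
        if t2 != "VBG":
            continue
        if pronoun in GENITIVE:
            hits.append((pronoun, w2.lower(), "genitive"))
        elif pronoun in OBLIQUE:
            hits.append((pronoun, w2.lower(), "oblique"))
    return hits


# A genuine gerundive nominal.
gerund = [("It", "PRP"), ("depends", "VBZ"), ("on", "IN"),
          ("his", "PRP$"), ("arriving", "VBG"), ("on", "IN"), ("time", "NN")]

# A false positive: "you having" here is a progressive question, not a gerund.
question = [("Are", "VBP"), ("you", "PRP"), ("having", "VBG"),
            ("a", "DT"), ("nice", "JJ"), ("time", "NN")]

print(pronoun_vbg_hits(gerund))
print(pronoun_vbg_hits(question))
```

The second sentence is matched even though it is not a gerundive nominal at all, which is exactly why raw hit counts are only a proxy and why classifying each hit by hand gives more accurate numbers.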

Here are their results in tabular form:

In other words, 767:64 = 7.7% genitives in the UKspoken corpus, and 144:75 = 34% genitives in the Times corpus. This compares to 93:7 = 7% genitives for the Fisher English data, and 170:122 = 42% genitives in the last 5 years of the COCA data for "PRONOUN + having".
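The arithmetic is just the genitive count divided by the total of oblique plus genitive. A minimal sketch, reading each ratio as oblique:genitive; the function name `percent_genitive` and the dictionary layout are my own illustrative choices:

```python
def percent_genitive(oblique: int, genitive: int) -> float:
    """Share of genitive pronouns among all PRONOUN + VBG hits, in percent."""
    return 100.0 * genitive / (oblique + genitive)


# (oblique, genitive) counts as quoted in the text.
counts = {
    "UKspoken": (767, 64),
    "Times": (144, 75),
    "Fisher": (93, 7),
}

for corpus, (oblique, genitive) in counts.items():
    print(f"{corpus}: {percent_genitive(oblique, genitive):.1f}% genitive")
```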

If we search COCA for {me|him|us|them having} and break the results out by genre, we get

And for {my|his|our|their having}:

This yields

Genre Percent Genitive
Spoken 27%
Fiction 55%
Magazine 67%
Newspaper 43%
Academic 87%

If we do the same search in the British National Corpus, we get

Genre Percent Genitive
Spoken 13%
Fiction 43%
Magazine 17%
Newspaper 28%
Non-acad 60%
Academic 67%
Misc. 58%

(Note again that my numbers are based on proxy measures, namely hit counts that have not been individually checked for validity, and are thus likely to be somewhat noisy.)

Details aside, this presents a reasonably consistent picture.  On both sides of the Atlantic, the spoken language has a relatively low rate of genitive pronouns in the PRONOUN + VBG construction, while the written language varies (by time and genre) from nearly all genitive pronouns, through various mixtures of genitive and oblique (or "accusative"), down to values in the same range as some spoken collections.

If we believe the historical evidence from COHA, this raises a number of interesting questions, which I'll try to express in terms of alternative hypothetical histories.

Possible History (1): Spoken English has strongly favored the oblique case for hundreds of years, since before the historical split between British and American varieties. The opposite (genitive) preference in written English has been an artificially-maintained stylistic difference for an equally long time. This stylistic split began to erode around 1950, on both sides of the Atlantic, and written English is now moving fairly rapidly towards spoken-English norms with respect to this particular issue (with some genres of writing of course ahead of others). The process will be complete, given present trends, within a few decades.

Possible History (2): Both spoken and written English have been shifting away from the genitive (in these PRONOUN + VBG constructions) for the past century or so. However, spoken English shifted first and/or more rapidly, and written English is still catching up.

There are plenty of other possible histories consistent with the evidence so far, but each of them is interestingly problematic. (There are people out there who probably know the truth, or more of it than I do; for example, I still have to check out David Denison's chapter in vol. 4 of the Cambridge History of English. But let's go forward for the moment as if all the options were still on the table.)

If some version of (1) is true, for example, then it suggests a rather non-standard picture of how language change is reflected in written records. On this view, it's not that the English language gradually changed, and documents at different dates track its progress. Rather, two fully-formed styles or registers existed from the beginning (or at least from the 18th century or so), and the historical record shows us one of them gradually infiltrating the other.

This is not a novel idea — thus Tony Kroch, "Syntactic Change", in Mark Baltin & Chris Collins, Eds. Handbook of Syntax, 1999:

We are limited to the written language, often of societies with a low rate of literacy and sharp class distinctions in language. In these circumstances, it could easily be the case that the forms in competition in syntactic diglossia represent an opposition between an innovative vernacular and a conservative literary language. Since the former would have both a psycholinguistic advantage and the advantage of numbers, it should win out over time, even in written texts. Under this model, the gradualism found in texts might not reflect any basic mechanism of language change, but rather the psycho- and sociolinguistics of bilingualism. The actual (sudden) change in parameter setting would have occurred unobserved in the vernacular and only its competition with conservative educated usage would be accessible to study in the texts.

However, the change under discussion here has been taking place in societies with a relatively high rate of literacy; and it's not clear that the variable in question was ever strongly class-bound or stigmatized. Furthermore, though I don't have any evidence to offer on this point, I would be surprised to learn that the change in the vernacular (whenever it occurred) was more rapid than the change in the written register has been.

Histories like (2) pose their own puzzles, for example requiring us to explain how a change of this kind could occur in parallel in two speech communities that are as generally disjoint as the UK and the United States are.  Histories of type (1) require a similar parallel change in writing, of course, but there's considerably more mutual interaction in that domain.

Finally, how good is the evidence for a historical change in this variable?  More needs to be done on this, but I was somewhat reassured by this note from Mark Davies about the genre composition in COHA:

It is true that it changes more than I'd like in the 1800s (since no newspapers from 1810s-1850s), but honestly, in the 1900s it's very, very similar from decade to decade. There aren't any (spoken) transcripts in COHA, so no worry about the introduction of spoken texts affecting things for the 1990s-2000s. The balance between the macro-genres stays almost exactly the same throughout the 1900s (see the "composition of the corpus" page at the website), and even at the micro-genre level it's very, very similar (e.g. number of words in science fiction, movie scripts, plays, distribution of NF books (by LOC call number), etc etc) (see the downloadable Excel file at the website).

So this means that the trend I found from 1950 to 2010 is probably not due to genre shift. Someone should extend my counts beyond "PRONOUN + having", with proper checking of the hits by human judges or good automatic classifiers.  But at this point I'll be surprised if the basic result changes a lot.



  1. Pflaumbaum said,

    September 20, 2010 @ 7:31 am

    Doesn't the register of the possessive version depend on the verb?

    With the verb 'be' – 'your being', 'his being' etc. – it feels no less colloquial than 'you/him being'. I reckon my Yorkshire grandma, for instance, would drop her aitches in a statement like, "what with 'is being from 'ull [Hull]", where she probably wouldn't if she was making an effort to speak in a higher register.

    Whereas a sentence like, "It depends on his arriving on time" sounds formal to the point of being stilted.

    [(myl) You're probably right. And this is just the sort of thing that you can investigate in depth using the search capabilities at the BYU sites for COCA and for the BNC!]

  2. iching said,

    September 20, 2010 @ 8:07 am

    So "it depends on his arriving on time" sounds different to "it depends on his being on time" to you? I would be hard pressed to detect any difference myself. Or does the difference only extend to other usages of being such as the one you gave? Can you think of any more examples of different verbs exhibiting different registers with the poss.+gerund construction?

  3. Henning Makholm said,

    September 20, 2010 @ 8:37 am

    Whereas a sentence like, "It depends on his arriving on time" sounds formal to the point of being stilted.

    I think that is because the possessive option does not go well with hypotheticals. It implicitly assumes that he has an "arriving on time" that we can speak about.

  4. Pflaumbaum said,

    September 20, 2010 @ 8:54 am


    You're right, I should have used a single sentence as an example and substituted the different verbs. The colloquial feel of my grandma's sentence might be attributable to the 'what with…' construction, and though 'his being' fits well enough with that, it's true that it sounds more formal to me in 'it depends on his being on time' (though still somewhat more natural than 'it depends on his arriving on time' does).

  5. Alexander said,

    September 20, 2010 @ 10:29 am

    Ideally, one would want to know the syntactic contexts of the nominals in these corpus studies. The observed effects of register could be, in part, on the frequency of contexts which favor, syntactically or semantically, one or the other type of nominal. The place to start, if anywhere, might be subject vs direct object vs object of preposition. Patterns of verb choice might also matter.

    [(myl) Not only ideally, but also really, this information is available to you, in whatever detail you like, for the COCA, COHA, BNC, and Switchboard searches. You'll have to put in a bit of time to classify and record whichever aspects of the syntactic context may interest you, but the numbers of hits are not enormous. If you can classify and record a context in (say) 10 seconds, you should be able to handle 360 hits per hour. My Breakfast Experiment on COCA generated a total of 1166 hits, which would thus take you about 3 hours fifteen minutes. Or you could limit yourself to the last five years of data, which would give 292 hits, or about 50 minutes of work, which would fit within your breakfast (or whatever) hour.]

  6. John Lawler said,

    September 20, 2010 @ 10:37 am

    That's a pretty impressive match with the results of the Breakfast Experiment™, Mark. Congratulations.

  7. Spell Me Jeff said,

    September 20, 2010 @ 12:28 pm

    It is interesting that the rate for fiction is so high. I wonder how much of this is authorial, and how much a result of editing/proofing. Depending on the publisher, proofreaders can be pretty brutal and very conservative. (Personal experience.)

  8. Karen said,

    September 20, 2010 @ 12:54 pm

    I'm wondering why this is all about pronouns, now… We see the same difference with nouns: I know pronouns behave differently from nouns in some constructions; is there a difference here? "It all depends on John('s) getting that signature".

    [(myl) It's obvious that these constructions exist with full noun phrases as subjects, as well as with pronouns: the two earlier posts in this series gave many such examples. As for why one might look at pronouns separately, in my case it was because it's possible to search for a determinate list of them. But also, previous descriptions often suggest that pronouns behave somewhat differently from nouns -- which may be part of why Heyvaert et al. focused on examples with pronouns.

    The COCA, COHA & BNC search engines will let you look for POS sequences, and the door is open for you to replicate this work with nouns or other noun phrase instances. The trouble you'll run into is that e.g. "John's having" will often be part of a sequence like "John's having trouble with his new puppy", where 's is not the possessive marker but rather a contracted form of 'is'. So I think you're going to have to look at each hit and classify it, which I aimed to avoid.]

  9. Army1987 said,

    September 20, 2010 @ 2:44 pm

    Histories like (2) pose their own puzzles, for example requiring us to explain how a change of this kind could occur in parallel in two speech communities that are as generally disjoint as the UK and the United States are.
    Well, there are quite a few changes in speech which are going in the same direction in both BrE and AmE (though in some of them one of the dialects is ahead of the other): CURE-FORCE merger, GOOSE fronting, HAPPY tensing…

  10. Alexander said,

    September 20, 2010 @ 11:19 pm

    In the earlier "Possessive with Gerund" discussion, Michael Farris said that the gerunds with objective form 'subjects' seemed to be about "potential situations". I shared a bit of this feeling.

    In response I followed Mark's advice, and briefly consulted the COCA. I searched for: "my|your|his|our|their leaving me|you|him|us|them" and then "me|you|him|us|them [DITTO]". I left out "her" because of the syncretism; used a transitive gerund to filter out "-ing/of" event nominals; and used a transitive gerund, "leaving", that actually gets some hits.

    The small handful of results that there were seem consonant with the intuition. The cases with possessives are generally about situations considered certain or factual. Two examples: "She tried to blot out the memory of his leaving them," "in the years that followed his leaving them." The cases with objective pronouns were generally about possible and uncertain situations. Two examples: "They hate the prospect of him leaving them," "And there was no question of him leaving them."

    Here's an example from the corpus that shows how the difference might affect interpretation: "I might have resented him leaving you all his money once." To me this suggests that he did not leave you all the money at once, so that the author claims no virtuous equanimity. Change the "him" to "his", and I think it does suggest that, so that the author is now claiming some virtue.

    Even if this is representative data, it nevertheless seems clear that the semantic pattern is only a tendency, and that each form can be used either way. It would be interesting, however, if the tendency (if real) contributes to differences in the distribution of the alternates depending on grammatical function, reported in the Heyvaert et al paper that Mark discussed.

  11. John Walden said,

    September 21, 2010 @ 3:03 am

    I'm no linguist so bear with me. "I watched him arriving" is an abbreviated form of "I watched him as he was arriving" and really boils down to "I watched him". So, "We are depending on him arriving on time" contains a strong suggestion of "We are depending on him" while "We are depending on his arriving on time" leaves 'him' and our opinions about him out of the game. It means "We are depending on the timely arrival of a person, namely him".

    When there's an object/oblique/disjunctive pronoun I believe it could have been arrived at down a route of a contracted progressive/continuous form and we may mentally bracket what comes after:

    "You ((when you are) smoking) annoys me" implies "You annoy me"

    "Your smoking annoys me" suggests "though it's nothing personal"

    "She tried to blot out the memory of him ((when he was) leaving them)"


    "She tried to blot out the memory of his leaving them"

  12. David Fried said,

    September 21, 2010 @ 10:18 pm

    I'm thoroughly confused about a couple of points. Although American and British may be evolving in parallel toward the objective case and away from the possessive, who's ahead? Now? Historically?? In writing? In speech?

    Why I ask: I first learned of this supposed problem or error from Fowler, when I was a teenager, whence I got the idea that this was a British shibboleth. But a lifetime of reading has left me with the strong impression that British authors are far more likely to use the objective case, and American the possessive. Does anyone else share this impression?

    I was also fascinated to learn that writers earlier than Fowler complained that the construction with the possessive cannot be parsed. Fowler's point, as I recall, is that of course a verbal noun should be governed by the possessive; it's the other construction that won't submit to analysis. Of course, I well remember Fowler's "explanation" of the choice between "shall" and "will," which amounted to: Educated Englishmen from the Home Counties (like me) get this right effortlessly; no one else can; and I certainly can't explain it to you!

