## Will vs. going to: a recount

Yesterday, I took a quick poll of a few small English-language texts, to see how often future-time meanings were expressed in various tensed-verb forms ("Alternative futures", 12/11/2008). My conclusion was that by far the commonest method in written American English is to use forms of the modal auxiliary will; but that in spoken American English, other alternatives are closer to even with it. However, my sample was too small to draw any very reliable quantitative conclusions.

So this morning, I'm doing another Breakfast Experiment™ to try to get better numbers, at least for some of the alternatives in the spoken language.

The data comes from the transcripts of two published conversational speech corpora, "Fisher English Training Speech Part 1 Transcripts" and "Fisher English Training Speech Part 2 Transcripts", together comprising 11,699 conversations of up to ten minutes each, recorded in 2003. The speakers come from a broad sample of regions, ages, and socioeconomic levels — full demographic details are available in the associated publications.

Since I don't have time during breakfast to read all these conversations and look for future-time references in tensed verbs, I'm going to come at the question from a different angle.

First, I'll scan the transcripts for instances of the strings "will", "won't", "'ll", "going to", and "gonna", taking appropriate precautions to avoid cases where these strings form part of other words (e.g. "willing" or "Williamsburg").

Second, I'll check a sample of the hits in order to estimate what percentage are genuine modals or semi-modals, as opposed to things like "I believe in absolute free will" or "I hate going to the doctor". (In the case of "won't", I'll also exclude things like "they won't let you smoke anywhere" and "I won't watch that stuff".)

And finally, I'll use those estimated proportions to get corrected counts.
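The first step can be sketched in a few lines of Python. This is only an illustration, not the script actually used for the counts: the pattern definitions, the `count_forms` helper, and the toy `sample` lines are all assumptions made for the example. The lookarounds are one way to take the "appropriate precautions" against hits inside longer words.

```python
import re
from collections import Counter

# Hypothetical patterns: the post does not show the actual search expressions.
# The lookarounds reject matches embedded in longer words ("willing", "Williamsburg").
PATTERNS = {
    "will": re.compile(r"(?<![\w'])will(?![\w'])"),
    "'ll": re.compile(r"(?<=\w)'ll(?![\w'])"),
    "won't": re.compile(r"(?<![\w'])won't(?![\w'])"),
    "going to": re.compile(r"(?<![\w'])going\s+to(?![\w'])"),
    "gonna": re.compile(r"(?<![\w'])gonna(?![\w'])"),
}


def count_forms(lines):
    """Count raw string hits for each form across an iterable of transcript lines."""
    counts = Counter()
    for line in lines:
        text = line.lower()
        for name, pattern in PATTERNS.items():
            counts[name] += len(pattern.findall(text))
    return counts


# Toy stand-in for transcript lines (invented for this example).
sample = [
    "i'll be willing to go but she won't",
    "we were going to visit williamsburg",
    "i believe in absolute free will",
    "he's gonna say he will",
]
counts = count_forms(sample)
print(dict(counts))
```

Note that a raw scan like this still counts "free will" and "going to visit" style hits without distinction, which is why the second and third steps are needed.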

The raw results:

| will | 'll | won't | going to | gonna |
|---|---|---|---|---|
| 15,219 | 32,515 | 4,975 | 29,331 | 22,176 |

This yields 15219+32515+4975 = 52,709 for the various forms of will, versus 29331+22176 = 51,507 for the various forms of going to.

I checked a random sample of 100 instances of will, and found that 3 were things like "then it's their own free will to read it".

Checking a random sample of 100 instances of going to, I found that 13 were things like

all the money's going to transportation and food
instead of going to the drive through
i'm going to a ear nose and throat specialist
i'd be interested in going to see it

And checking a random sample of 100 instances of won't, I found that 24 were things like

they won't really let you stop even for a minute
my husband won't fly
my body won't take it no more

(This is a bit unfair, since I didn't try to exclude modal but non-future instances of the other word-types, but still…)

I didn't try to exclude non-standard stuff like "and she died and oh i know you won't gonna believe this i had her stuffed".

As far as I can tell, "'ll" and "gonna" in the transcripts are essentially always the modal auxiliary or the semi-modal, respectively.

0.97*15219 + 0.76*4975 + 32515 = 51,058

for forms of "will", and

0.87*29331 + 22176 = 47,694

for forms of "going to". Or retaining a more reasonable number of significant digits, about 51 to 48 in favor of "will".
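For anyone who wants to redo the arithmetic, here is a small sketch of the correction. The raw counts and the sampled proportions are the ones given in the post; the variable names and the `corrected` helper are made up for the example, and treating "'ll" and "gonna" as 100% genuine follows the "essentially always" judgment above.

```python
# Raw hit counts from the transcripts, as reported in the post.
raw = {"will": 15219, "'ll": 32515, "won't": 4975, "going to": 29331, "gonna": 22176}

# Estimated share of hits that are genuine (semi-)modals, from the samples of 100.
# "'ll" and "gonna" are taken as 1.0, per the "essentially always" judgment.
keep = {"will": 0.97, "'ll": 1.0, "won't": 0.76, "going to": 0.87, "gonna": 1.0}


def corrected(forms):
    """Sum raw counts scaled by the estimated proportion of genuine hits."""
    return sum(raw[f] * keep[f] for f in forms)


will_total = round(corrected(("will", "'ll", "won't")))
going_total = round(corrected(("going to", "gonna")))
will_share = will_total / (will_total + going_total)

print(will_total, going_total)  # 51058 47694
print(f"{will_share:.1%} of the corrected hits are forms of 'will'")
```

Since each proportion rests on a sample of only 100 hits, the last few digits of these totals should not be taken seriously, which is the point of rounding to "51 to 48".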

Note that I haven't tried to calculate how many of these, in any of the lists, are actually references to future time. Some of them are (for example) predications of generic propensity like these:

only once in a while will i watch the show with my parents

seven six and he plays like six one you know what i mean you go through him shaq will go through you and through somebody else

being a nurse i'm gonna listen to the health more than i wanna listen to say uh financial news

Others are "past future" uses of going to like:

i was gonna say what's it like now

i heard originally when they were gonna make that they wanted to do all three

And of course we've completely ignored all the other ways, standard and non-standard, that English has for expressing future time reference in tensed clauses.

Still, I think it's fair to conclude that in contemporary American speech, forms of will and forms of going to fulfill this function about equally often, with will perhaps slightly in the lead.

1. ### language hat said,

December 12, 2008 @ 9:23 am

Fascinating! This is the kind of thing linguists should be doing, dammit. Facts, give us facts!

2. ### Chris said,

December 12, 2008 @ 9:52 am

Some of them are (for example) predications of generic propensity like these:

only once in a while will i watch the show with my parents

I don't see how this is different from the excluded uses of won't in constructions like "My car won't start" and "That dog won't hunt", aside from not being negative.

[(myl) As I explained, I used a different criterion in the case of won't. The reason? I did it later. Why didn't I go back and re-do the others (since going to has a similar usage, in fact)? Because breakfast time was over.]

3. ### Brett said,

December 12, 2008 @ 10:29 am

I don't have it handy, but the Longman Grammar of Spoken and Written English is full of facts like this, and might even address this particular question.

4. ### Joe said,

December 12, 2008 @ 11:30 am

Very interesting results! Thanks for looking into this.

5. ### g d gustafsson said,

December 12, 2008 @ 2:01 pm

This is similar to Swedish "kommer att" vs. "ska", with "ska" (similar to "shall") being more intentional and "kommer att" similar to English "going to".

6. ### N said,

December 12, 2008 @ 2:03 pm

[(myl) Please don't. If you have something to say about a post that doesn't have open comments, the right thing to do is to send email to the author. In this case, since the author was just passing on an ad that appeared on a mailing list, you might address your remarks to the filmmakers, or to the National Science Foundation, who funded the project.]

As a now-graduated linguistics undergrad and a future linguistics grad student, I am outraged that it would cost $300 to obtain a copy of The Linguists. When we first heard about the movie, everyone in the psycholinguistics lab where I work was massively excited, and wondering where we can see it or get a copy of it. I would love to be able to share the movie with my family and let them see what some intrepid linguists do, but it is priced completely out of a regular person's budget. How does _that_ promote linguistics?

[(myl) Obviously, neither Eric (who wrote the post in question) nor any of the rest of us know anything about this. Speculating wildly, I'd guess that we're looking at a phase of documentary distribution that is more like the initial theater run of a movie — the filmmakers hope to make some money from a relatively small number of purchases by schools and the like. Their goal, of course, is not to promote linguistics. I agree that the goal of promoting the field would be better served by selling the DVD at the more usual price of $15 or so; but I doubt that NSF or David Harrison have any real leverage in deciding what the pricing should be.]

13. ### Mark Liberman said,

December 12, 2008 @ 9:53 pm

Molly: I was under the impression that the present progressive/continuous (example: I'm doing my laundry tomorrow) was the most common form, statistically, used for expressing future time in English.

It's not easy to check this directly, but here's an indirect check that makes it seem almost certain that your impression is false, at least in these transcribed conversations.

I scanned the conversations in the two collections cited above, for instances of am|is|are or 'm|'re|'s followed by some character string ending in -ing, and found 93,755 phrases containing at least one of these.

This underestimates the count of present progressive verb forms, since it omits e.g. cases with an interpolated adverb; but it also overestimates, since it includes things like "that's interesting" or "someone's parking space".
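Both failure modes are easy to see on a few toy lines. The pattern below is only a guess at the kind of search described, not the expression actually used, and the `lines` list is invented for the example.

```python
import re

# Hypothetical pattern: a form of "be" (full or contracted) followed directly
# by a word ending in -ing. Not the expression actually used for the post.
PROGRESSIVE = re.compile(r"(?:\b(?:am|is|are)|'(?:m|re|s))\s+\w+ing\b")

lines = [
    "i'm closing tomorrow",     # true present progressive: matched
    "that's interesting",       # adjective, not a progressive: matched anyway
    "someone's parking space",  # possessive plus noun: matched anyway
    "they are really trying",   # interpolated adverb: missed
]

# One flag per line: does it contain at least one apparent progressive?
hits = [bool(PROGRESSIVE.search(line)) for line in lines]
for line, hit in zip(lines, hits):
    print(f"{hit!s:5}  {line}")
```

The second and third lines are the overcounting cases, and the fourth is the undercounting case, so the raw total is only a starting point for the sampling that follows.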

So I checked a random sample of 100 phrases, and found that 5 of them were things like "he's boring" or "that place is bring your own"; 15 of them were instances of "be going to"; and 80 were other present progressives. Of those 80, none (0) had a future time reference — all were either specifically tied to the current time, or were hypothetical or generic/habitual, for example

… just [because] they have the technology doesn't mean they're willing to use it …
… the thing is at first you are trying to think well what is this what is the catch …
… the different marriages that i've seen um the ones that are communicating even if they're joking you know …

I checked a second random sample of 100 phrases, and found 2 out of 100 with the kind of "future scheduled event" meaning of the example you cited:

… i have an internship that i'm leaving for in a couple of weeks …
… i'm closing tomorrow …

But there were no other examples of future-time reference in that sample, other than instances of "be going to". This suggests that only about 1% of the 93K hits, or around 900, will be future-time reference for present progressive forms other than the semi-modal "be going to". Whatever the true number is, it has little or no chance to be greater than the roughly 50K forms of "will" and the roughly 50K forms of "be going to" in the same collection.
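A rough sketch of that extrapolation, with an interval attached to show how loose the "about 1%" is. Pooling the two samples into 2 hits out of 200 is an assumption made here for illustration, as is the use of a 95% Wilson score interval; neither is part of the original calculation.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


total_hits = 93755               # phrases matched by the be + -ing scan
future_hits, sampled = 2, 200    # assumption: the two samples of 100 are pooled

point = round(total_hits * future_hits / sampled)
low, high = wilson_interval(future_hits, sampled)

print("point estimate:", point)
print("rough range:", round(total_hits * low), "to", round(total_hits * high))
```

Even the upper end of that range is a small fraction of the roughly 50K hits for each of "will" and "be going to", so the comparison does not depend on the exact rate.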

14. ### Freddy said,

December 13, 2008 @ 1:12 am

Not to veer too off-topic, but around these parts (St. Louis), AAVE speakers use "fixing to" which reduces to "fiittin' tuh" (tt=some kind of glottal stop) which reduces yet again to "funna": "He funna go." It is very common here.

15. ### Lance said,

December 13, 2008 @ 3:44 am

Mark:

The thing about your be-going-to offer examples is that, in both cases, the offerer is "going to" do something (give you earnings, throw in a T-shirt) once you've already accepted the offer (by referring someone, by buying a collector's edition). Naturally, Copley's dissertation is more nuanced than what I was expressing; her proposal for "be going to" has no trouble with these examples.

[Specifically, if I recall it correctly, she says that "be going to X" means that all future worlds from this point are worlds in which X, which means that essentially those sentences mean "you getting earnings is inevitable and unavoidable, not from the current state of affairs, but in those worlds where you've referred someone". A standard offer, like the Madera billboard, doesn't have that "in those worlds in which" context, so the result is an assertion that oil-changing is inevitable, period.]

As for the present progressive examples, I meant specifically "the present progressive as a future marker". I mean, obviously I can say "The Red Sox are probably losing right now", and I'm talking about a strong belief; just not one that involves the future.

Anyway, I'm mostly saying these things to stress that I, and not Bridget, should be held responsible for any miscategorization I've made of her work.

But as to the first comment in reply to me…I think that that's more or less exactly what I'm saying, i.e. that the nuances of sense make it hard if not impossible to get meaningful results from the kind of corpus counting you're doing. I mean, are there "sense-analysis" corpora? It doesn't seem to me that such a thing would even be feasible. A lack of taxonomy may be a "problem with a lot of research on semantics and pragmatics" in terms of getting a semantically-tagged corpus, but I think it's not so much a present failing as an inherently nigh-impossible task.

16. ### Mark Liberman said,

December 13, 2008 @ 5:39 am

Lance: the nuances of sense make it hard if not impossible to get meaningful results from the kind of corpus counting you're doing. I mean, are there "sense-analysis" corpora? It doesn't seem to me that such a thing would even be feasible.

Yes, there are plenty. You'll find references here and here, for a start. An especially clear summary of the state of the field ten years ago can be found in Adam Kilgarriff, "Gold Standard Datasets for Evaluating Word Sense Disambiguation Programs", 1998.

It's true, as I said, that it's notoriously difficult to get intersubjective agreement about what the senses are and how to assign them to particular cases. This led Adam Kilgarriff to write a famous paper "I Don't Believe in Word Senses", Computers and the Humanities 31(2), 1997. From his abstract:

An analysis is presented in which word senses are abstractions from clusters of corpus citations, in accordance with current lexicographic practice. The corpus citations, not the word senses, are the basic objects in the ontology. The corpus citations will be clustered into senses according to the purposes of whoever or whatever does the clustering. In the absence of such purposes, word senses do not exist.

On the other hand, one of Adam's own notable contributions has been software that helps lexicographers to cluster citations for the purpose of dictionary-making.

17. ### Mark Liberman said,

December 13, 2008 @ 5:40 am

Lance: A lack of taxonomy may be a "problem with a lot of research on semantics and pragmatics" in terms of getting a semantically-tagged corpus, but I think it's not so much a present failing as an inherently nigh-impossible task.

I'm not sure that I understand. Are you really saying that the current methods for analysis of meaning are such that they can't, as a matter of principle, be applied to specific examples that are not chosen by the investigator? Or that if these methods are applied in an attempt to understand a series of such examples, e.g. those involving the time-reference of the present progressive in English, they don't converge on a set of descriptive types that would cover a monotonically increasing proportion of unseen examples? Or that we shouldn't expect, even in principle, that different investigators would be able to agree about how to do this?

All these things might be true; but if they are, it's a depressing picture of the state of the field.

It's one thing to say that word meanings don't fall naturally into a simple tree-structured taxonomy — the history of semantics since Wilkins gives us plenty of reason to believe this, as I have often argued. But that's different from saying that semantics and pragmatics, in principle, can't give us any intersubjectively valid way to describe and categorize the ways that words are used.

And in fact, things are not so bad. With careful iterative design of the categories and training of annotators, sense-tagging of corpus examples can be done with good levels of inter-annotator agreement.

18. ### Mark Liberman said,

December 13, 2008 @ 8:44 am

Lance: I mean, obviously I can say "The Red Sox are probably losing right now", and I'm talking about a strong belief; just not one that involves the future.

Well, the season is over, and spring training doesn't start for another two months, so at this point any belief about the fortunes of the Red Sox is unavoidably about the future (or the past)…

But more seriously, the examples that I cited seem clearly to state beliefs about possible future events, e.g. "if you buy Cypress you're getting $1.75 billion of SunPower", which (I think) was addressed to an audience of readers who had not previously even considered the option of buying Cypress.

19. ### John Niekrasz said,

December 13, 2008 @ 11:10 am

Mark said: 'This underestimates the count of present progressive verb forms, since it omits e.g. cases with an interpolated adverb; but it also overestimates, since it includes things like "that's interesting" or "someone's parking space".'

If you want a more robust way of automatically picking out and (syntactically) classifying the verb phrases for your manual semantic/pragmatic analysis, I can suggest the LT-TTT2 software. The output of the "chunking" stage in their out-of-the-box processing pipeline will be much more accurate than your regular expression approach. It will give you values for tense, voice, aspect, and modality of verb phrase chunks. And the software is a breeze to use. I use it on transcripts of conversational speech to do something quite similar to what you are doing, and it works quite well. (If you actually get around to trying it, use the -D 128 flag on the call to verbg.gr to get verbose output about the verb chunk rules that were used).

http://www.ltg.ed.ac.uk/software/lt-ttt2

Here's a paper which uses this software for IR on legal texts: http://homepages.inf.ed.ac.uk/bhachey/PUBS/ailaw-egov-preprint.pdf

And why not add other modals/semi-modals to your list? These often express future time too, don't they?

20. ### DM said,

December 13, 2008 @ 11:21 am

Interesting as always. Posted at 0854. It's always a source of amazement to me how long Mark Liberman has for breakfast. Is the average linguist post-grad going to have this much free time? Or will they be under constant pressure to prove their cost effectiveness?

The results of my own breakfast experiments, involving how quickly I can eat a piece of toast and drink a cup of tea, have so far been inconclusive.

21. ### Lance said,

December 13, 2008 @ 9:31 pm

I'll take a look at the semantic corpus links.

As for the present-progressive-future, though: perhaps I wasn't clear when I said "the present progressive can't be used for prediction or mere strong belief". I didn't mean "the present progressive can't be used when embedded in predictive/belief contexts" (the sentence "I think the Red Sox are playing the Yankees tonight" is a pretty clear indication that the present progressive is usable in a sentence that, overall, expresses belief). I meant "the present progressive can't be used to itself indicate prediction or mere strong belief". So in:

But more seriously, the examples that I cited seem clearly to state beliefs about possible future events, e.g. "if you buy Cypress you're getting $1.75 billion of SunPower", which (I think) was addressed to an audience of readers who had not previously even considered the option of buying Cypress.

I'm still not seeing anything future-marked here. Non-past, yes, but basically it's saying: At all world/times where [you buy Cypress], [(simultaneously) you get \$1.75 billion of SunPower]: the present progressive just marks that X happens at the same time as some other Y, not that X happens in the future. And the present progressive is certainly not itself expressing a belief: there's no doubt in the speaker's mind that getting SunPower happens in those circumstances. The future possibility comes from the conditional, not from the use of the present progressive.

22. ### Steve Tripp said,

December 14, 2008 @ 12:19 am

Seven futures
1. The ship is about to leave.
2. The ship will leave at 7 o'clock.
3. The ship is going to leave at 7 o'clock.
4. The ship leaves at 7 o'clock.
5. The ship is leaving at 7 o'clock.
6. The ship will be leaving at 7 o'clock.
7. The ship is to leave at 7 o'clock.
These all have different nuances, which are clearer if we change the subject to "I."
1. I am about to leave. (immediate future, without delay)
2. I will leave at 7 o'clock. (an offer or predication; often if some condition is met)
3. I am going to leave at 7 o'clock. (my intention is to)
4. I leave at 7 o'clock. (according to a schedule, not under my control)
5. I am leaving at 7 o'clock. (according to a schedule, under my control)
6. I will be leaving at 7 o'clock. (after an interval)
7. I am to leave at 7 o'clock. (a formal announcement)

[(myl) See the previously cited post "The Lord which was and is" for a list of other options as well. However, at least according to the counts here, the options other than "will" and "be going to" are rather rare in current spoken American English.]

December 14, 2008 @ 4:38 pm

Here's a good example of non-future "will": http://quotation-marks.blogspot.com/2008/12/those-giant-prize-checks-are-no-good.html

24. ### Merri said,

December 19, 2008 @ 8:16 am

But isn't there simply a difference in meaning?

"Will" and "Shall" are auxiliaries for unlinked future.
"Go to" is an auxiliary for linked future, the exact symmetric to linked past, aka past perfect.

The difference between "I shall do" and "I'm going to do" is the same as between "I did" and "I have done", only it's less visible because of the auxiliaries.