## When a word is redundant enough to be omitted

I am greatly enjoying Steven E. Landsburg's book More Sex Is Safer Sex (Free Press 2007, paperback 2008). Landsburg is a brilliant popularizer of his academic subject, economics. He writes the way popular material should be written, I think. I wish I could do it that well. His sentences are exactly the right length. Mine are too long (this one isn't, of course, or at least it wouldn't have been, except that I went and added this bit… oh, damn…). However, just because someone is a brilliant writer, that doesn't immunize them against unintentional grammar slips. We all make those. And although we on Language Log often defend users of the language against stupid claims of ungrammaticality by prescriptive usage authorities who don't know their facts, we don't deny the existence of flatly ungrammatical sentences that occur anomalously in excellent prose. Take a look at this clearly ungrammatical sentence on page 33 of the paperback of Landsburg's book:

 (1) *This accounts for the fact that family sizes of seven, eight, or nine children were common in the nineteenth century but rare today.

The question is how to say in precise terms why it is ungrammatical. Keep in mind that this alternative would have been perfectly grammatical:

 (2) This accounts for the fact that family sizes of seven, eight, or nine children were common in the nineteenth century but rare in the twentieth.

The answer has to do with conditions on ellipsis. My first crack at the generalization would be that you can omit a tensed occurrence of be when there is another one and the repetition would be redundant, but only if the tense form of the occurrence you omit is the same as the tense form of the one you keep.

In (2), the omitted form of be would have been were. But in (1) it would have been are. Not close enough. Tense inflection marks an important meaning distinction in English. You can't omit something on grounds of its redundancy if it carries semantic information that is not redundant. (In Chinese, I would predict that there would be no sign of this problem, because Chinese verbs don't carry tense inflections.)

Notice that I am supplying a grammatical generalization. I'm saying this is not just something to be decided by using your common sense about what's redundant and what's not. Lots of grammar and usage books seem to imply that everything follows from simply being logical, but in most cases they're quite wrong. Consider this example (of a different construction, but I use it only to illustrate a general point):

 (3) I haven't done it yet, but I will.

What's missing after will? The sentence means "I haven't done it yet, but I will do it." Do it is not the same as done it. And in fact you could say that the do it verb phrase makes reference to future time while the done it verb phrase makes reference to past time. Yet the example seems grammatical. Why? Because ellipsis of verb phrases in English is allowed to obscure differences between inflections on non-finite verbs that are grammatically demanded by the preceding verb (have demands the past participle done, but will demands the plain form do). That isn't a loss of semantically non-redundant information — not even if it is true that do it refers to the future while done it refers to the past.

How do I know there isn't semantically non-redundant information there that's being omitted? It's a judgement based on what the grammatical evidence shows. You can't just close your eyes and assess whether the information seems redundant to you or not, by some sort of internal intuition about information quantity. You have to open your eyes and look at the grammatical as well as semantic evidence, and propose a general hypothesis. A hypothesis that in principle could be shown to be false by further evidence.

My hypothesis (putting it fairly loosely in order to be brief) is that English syntax counts tense inflection as inherently significant, so a tense inflection is only redundant when it is the same as some other tense inflection in the sentence, but it doesn't count participial inflections as inherently significant.

The reason this isn't just an opinion, indistinguishable from other opinions one might have, is that it makes a prediction. The prediction is that if I could find a way to re-express (1) with participial inflection on the two occurrences of be, the result of omitting the second one would be grammatical. Well, I can, and it is:

 (4) This accounts for family sizes of seven, eight, or nine children being common in the nineteenth century but rare today.

Same meaning, as near as makes no difference; but now it's grammatically perfect.[1] Why? Because although the nineteenth century is long ago and today is now, and that difference has been suppressed by the way the sentence is constructed, the issue is not about whether information in some naive semantic sense has been suppressed or masked or concealed or lost. The issue is about what the grammar permits. English grammar doesn't permit coordinations of the form

to be expressed as

unless either (i) Verb1 and Verb2 are both non-tensed or (ii) they both have the same tense inflection. That's what the evidence tells us the grammar demands. It doesn't have its roots in semantics or logic; these are facts of syntax (though they have semantic connections).
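The condition just stated can be written down as a small predicate, which makes it easier to see what the hypothesis predicts for each example. This is only a sketch in Python under stated assumptions: the names `VerbForm` and `may_omit_second_verb` and the inflection labels are hypothetical, invented for illustration. The requirement that the two verbs share a lemma, and the refusal to license a tensed verb paired with a non-tensed one, are also assumptions, since the generalization above does not address those cases.

```python
from dataclasses import dataclass

# Hypothetical inflection labels, for illustration only;
# this is not a full inventory of English verb forms.
TENSED = {"past", "present"}
NON_TENSED = {"plain", "past_participle", "gerund_participle"}


@dataclass(frozen=True)
class VerbForm:
    lemma: str       # dictionary form, e.g. "be"
    inflection: str  # one of the labels in TENSED or NON_TENSED

    @property
    def is_tensed(self) -> bool:
        return self.inflection in TENSED


def may_omit_second_verb(verb1: VerbForm, verb2: VerbForm) -> bool:
    """Say whether the hypothesis licenses leaving out verb2
    when verb1 is retained in the coordination."""
    # Assumption: only a repetition of the same verb is a candidate.
    if verb1.lemma != verb2.lemma:
        return False
    # Condition (i): both verbs are non-tensed.
    if not verb1.is_tensed and not verb2.is_tensed:
        return True
    # Condition (ii): both verbs carry the same tense inflection.
    if verb1.is_tensed and verb2.is_tensed:
        return verb1.inflection == verb2.inflection
    # Assumption: a tensed/non-tensed pair is not licensed.
    return False


were = VerbForm("be", "past")
are = VerbForm("be", "present")
being = VerbForm("be", "gerund_participle")

print(may_omit_second_verb(were, are))      # sentence (1)
print(may_omit_second_verb(were, were))     # sentence (2)
print(may_omit_second_verb(being, being))   # sentence (4)
```

On this encoding the predicate rejects (1) and accepts (2) and (4), which is the pattern of judgements the hypothesis is meant to capture. Speakers who accept (1) would need a different predicate, not a different way of running this one.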

FOOTNOTE

1 There are people who would say sentence (4) is actually ungrammatical because it needs genitive case on the subject of the gerund-participle, thus: This accounts for family sizes of seven, eight, or nine children's being common in the nineteenth century but rare today. Those people are wrong, and they should read the eye-opening article in Merriam-Webster's Dictionary of English Usage on the point (look for the entry headed POSSESSIVE WITH GERUND); but stick the genitive 's (inaccurately called the "possessive" 's) on the subject if you want; I don't care, and it's not relevant here.

[Update, much later: I was surprised to find so many people in the comments below simply refusing to accept that there was any ungrammaticality involved, even in short sentences like Mark's *Duels were common in 1830 but rare today. However, I was able to get in touch with Steven Landsburg himself, and he kindly confirmed that he regards this as ungrammatical, and likewise the longer sentence that he wrote. So I think that should settle the matter: I regard present-tense verbs as not omissible on grounds of an earlier past-tense verb having appeared in the sentence, and so does he. There may be people who simply don't have the same ellipsis rules as Landsburg and me. And there are certainly other constructions with different properties; for example (in answer to a commenter below), it seems much better to say ?Duels were common in 1830, but not today. I'm not completely sure that's perfect, but I see a difference.]

1. ### John Cowan said,

August 25, 2008 @ 11:38 am

This is a prize case of why I don't like to use the word "ungrammatical" except relative to a specific grammar (in the sense that "a//b" is ungrammatical C but grammatical C++, for example), and why I insist that what people (as opposed to parsers) make when they react to natural-language utterances are not grammaticality judgements but acceptability judgements.

I have absolutely no trouble with your original sentence ending "but rare today", and rate it 100% acceptable. When you called it ungrammatical, my reaction was "WTF?!", and I had to go back and scrutinize it to see that you were complaining about the tense mismatch.

Now certainly there are sentences like "I went to Rome yesterday and to Paris tomorrow" that are flatly unacceptable to me, so it isn't that I speak some variety of English hitherto unknown to science. I suspect what is going on is that tense clash in elliptical coordinated that-complements seems acceptable because they feel like mere surface variants of the corresponding infinitive complements, which as you say are entirely acceptable. Yet that can't be the whole story either: "He knows that I went to Rome […]" is just as bad as the original.

(Chinese may have similar strictures about aspect markers, which are about as important in Chinese as tense in English, but it wouldn't surprise me if it does not, given how often Chinese sentences are interpreted by context and common sense rather than by rule: Li & Thompson give us "She is also [an example of someone married to] an American husband", and La Polla says that "George dropped the watermelon and [it] burst" (ergative coordination) and "George dropped the watermelon and [he] was embarrassed" (accusative coordination) are equally acceptable.)

2. ### Mark Liberman said,

August 25, 2008 @ 11:48 am

John Cowan: I have absolutely no trouble with your original sentence …, and rate it 100% acceptable.

I had the same first reaction, but I think we're getting confused by the length and complexity of the subject phrase. Try instead:

Duels were common in 1830 but rare today.

Still 100% acceptable? I don't think so.

3. ### Shimon Edelman said,

August 25, 2008 @ 11:53 am

I share John Cowan's aversion to the word "ungrammatical". In Geoff Pullum's posting, the root of the problem is nicely exposed for all to see in this passage: "It doesn't have its roots in semantics or logic; these are facts of syntax (though they have semantic connections)." The ontological status of "facts of syntax" (or grammaticality that's independent from acceptability) is the same as that of the tooth fairy: there is no independent empirical evidence for it, and phenomena attributed to it can be better explained by other means.

-Shimon

4. ### GAC said,

August 25, 2008 @ 11:58 am

I found (1) completely grammatical even after your explanation until Mark made his substitution. And even then, I still find it only questionable, not flat out ungrammatical.

5. ### Geoffrey K. Pullum said,

August 25, 2008 @ 12:00 pm

My friend Shimon Edelman believes that the notion "grammatical" has tooth-fairy status, and that the concept to replace it is that of probability. There are utterances with high probability features and utterances with low probability features, and that's all. He really does appear to believe this. It is a view that I have urged him to jettison. It will not be possible (so I predict) to differentiate my (1) and (2) in probability terms. As to whether there simply isn't a grammaticality issue with (1), as John Cowan thinks, we need to have his answer to Mark's question. (If there is no ungrammaticality in (1), by the way, we are fine, and formulating the rules of grammar becomes easier, not harder. Nothing about Shimon Edelman's view, that the distinction between grammatical and ungrammatical doesn't exist, would get any support from that purely empirical finding, if it were indeed a finding.)

6. ### Dan Milton said,

August 25, 2008 @ 12:44 pm

Is anyone else bothered by the apposition of "sizes" and "children"? I'd prefer "This accounts for families of seven, eight, or nine children being common in the nineteenth century but rare today".

7. ### Julia Hockenmaier said,

August 25, 2008 @ 12:45 pm

Well, "were rare today" only makes sense if the day (or an event you're referring to that happens during the day) is over and you're talking about something that happened during the day/event. One example I found online was from a birdwatcher who said "I was looking for sparrows which were rare today". Of course, he means observations of sparrows, and isn't necessarily talking about the population size of the birds (perhaps it was rainy, and the birds didn't come out, or whatever…).

8. ### Shimon Edelman said,

August 25, 2008 @ 12:51 pm

In his reply to my comment, Geoff kindly offered a framework within which to interpret my repudiation of "grammaticality." I feel obliged to clarify it, however briefly. If grammaticality is dissociated from acceptability, it becomes empirically vacuous (joining the club that has "facts of syntax" and "competence" among its members); if it is defined in terms of acceptability, it loses its claim for a separate existence. Probability enters the picture as follows: utterances whose various features — in the given context and given the sum total of the listener's experience — are more probable will be deemed by the listener more acceptable. One of the features in question is utterance length (hence the difference in acceptability between long and short versions of Geoff's example with which the present thread started). Importantly, however, the probabilities in question are always conditioned on context (including extralinguistic context); this is the crucial component of the framework to which I subscribe that has been left out of Geoff's summary of my views.

9. ### Bill Walderman said,

August 25, 2008 @ 1:20 pm

What's wrong with me? I just can't seem to find any problem with the sentence about large families and I don't have a problem with the sentence about duels, either.

Large families/duels were common in the nineteenth century but not today.

Are these sentences acceptable?

10. ### Mark Liberman said,

August 25, 2008 @ 1:49 pm

Bill Walderman: What's wrong with me?

Nothing.

At least, the evidence available so far suggests that you've got just a different (implicit) idea about when certain kinds of ellipsis are OK.

This diagnosis comes with no warranty, however, beyond our normal promise to refund twice your subscription price in case of less than full satisfaction.

11. ### Ellen Seebacher said,

August 25, 2008 @ 1:49 pm

I'm Steven's copyeditor for that book (and a longtime fan of yours, Arnold's, and Language Log's), and you're quite right. I missed it. :}

12. ### Sridhar Ramesh said,

August 25, 2008 @ 2:00 pm

Is the reason for preferring the term "genitive 's" over "possessive 's" simply that it can be used for many purposes other than indicating possession? Or is there something else to it?

13. ### Patrick Schulz said,

August 25, 2008 @ 3:24 pm

I'm German and I also know such constructions:
Das war früher mal modern, aber heute nicht mehr
this BE.pst formerly once modern, but today not more

I think, a more interesting question than "why is it ungrammatical?" is: "why do (or can) people say it, anyway?"…

14. ### J said,

August 25, 2008 @ 3:41 pm

Instead of discussing grammaticality in terms of low and high probability, might it make sense to use the cloud concept that (I think) has been discussed here before in other contexts (ie, we evaluate a given object as being 'a table', for example, based on a cloud of criteria of which the object might fit some strongly but others weakly, etc)?

In terms of the sample statements, Mark Liberman's shorter sentence about duels does seem ungrammatical while the original longer sentences (1) and (2) seem perfectly fine, so I think that for me the length is a factor. It's a dissonance thing – when the second, mismatching "be" is omitted or separated from the original by a significant enough space, my brain fills in what's needed to make the match.

15. ### Ivan said,

August 25, 2008 @ 4:44 pm

I also find the original sentence ungrammatical, but I'm a non-native English speaker, so it's probably due to influences of my native language. I just realized that in a situation like this, I would probably end up writing something like this:

This accounts for the fact that family sizes of seven, eight, or nine children were common in the nineteenth century but are rare today.

Could someone maybe comment on how this sentence sounds to a native English speaker's ear? Is it awkward, or even totally ungrammatical? Sorry for the slight straying off topic.

16. ### mike said,

August 25, 2008 @ 4:48 pm

w/r/t point (3) here, I'd like to put in a request (as opposed to hijacking the comments thread, which of course is streng verboten) on a possible related difference between Am and Br English — in my experience, British English is a lot less comfortable with modals that have only implied verbs, e.g.:

Q: Can you do that?
A (Am): I can.
A (Br): I can do.

Thx.

17. ### Bill Walderman said,

August 25, 2008 @ 5:31 pm

"This accounts for the fact that family sizes of seven, eight, or nine children were common in the nineteenth century but are rare today.

"Could someone maybe comment on how this sentence sounds to a native English speaker's ear? Is it awkward, or even totally ungrammatical?"

Sounds good to my ear and not the least awkward, but then the sentence about duels didn't shock my intuitions about well-formed English sentences.

18. ### Lee Morgan said,

August 25, 2008 @ 6:17 pm

"This accounts for the fact that family sizes of seven, eight, or nine children were common in the nineteenth century but are rare today."

This was my (native English speaker's) first instinct to correct the sentence, as well.

19. ### S Onosson said,

August 25, 2008 @ 6:41 pm

Bill Walderman, nothing's wrong with you; I share your intuitions completely.

20. ### kyle said,

August 25, 2008 @ 6:48 pm

is the merriam-webster usage dictionary a reliable resource?

21. ### Marc Hamann said,

August 25, 2008 @ 7:07 pm

I had to read this post very carefully twice to convince myself it wasn't intended as a parody of Chomskyan reasoning. ;-)

22. ### mollymooly said,

August 25, 2008 @ 7:08 pm

Like others, I agree that Ivan's rewording corrects the grammatical error. However, to me, the extra word is an intrusion, which draws attention to itself and seems to have no function other than satisfying the formal requirement of grammaticality. It is reminiscent of the verbose formulae of legal texts. If I were a copy editor, then I would simply let the "error" stand, happy that the disease was milder than the cure. My intuition is that, although "grammaticality" and "acceptability" are closely correlated, neither need imply the other.

23. ### Steve Harris said,

August 25, 2008 @ 8:44 pm

@Ivan:

I was immediately put off by the original (1) for the reason cited: what I tend to call broken parallelism (though that seems to be a poor terminology, given the evidence of (3)–just as broken in parallelism, but perfectly fine to my ears). And my solution was exactly the same as Ivan's: I want to put "are" in there.

But is that, as mollymooly says, too wordy?

(5) Duels were common in 1830 but are rare today.

No, it's not wordy in my aesthetic. And the original ("were common but rare today") grates very badly on my ear.

24. ### David said,

August 25, 2008 @ 11:45 pm

Another native English speaker, all through reading Geoffrey’s post I was distracted by the urge to cry out, “Why not just ' . . but are rare today'?” So I agree entirely with the comment from Steve Harris, and am thus also supporting Ivan.
However I find mollymooly’s objection interesting. It strikes me this could be seen as a ‘The Two Cultures’ thing. For, despite Geoffrey’s, “Same meaning, as near as makes no difference; . .”, (1) does carry at least one subtly different nuance from (4).
Having a scientific background, I am often guilty of using the fussily meticulous “the fact that” structure – present in (1) but not in (4), and significantly also not in Steve’s 'streamlined' example (5). It surely serves the purpose of reminding the reader that this 'given' is an established fact rather than a glib assertion? In a technical paper it would probably be justified with a reference, unless perhaps it happened to be a well-understood fundamental truth of the discipline concerned.
However mollymooly criticizes the addition of ‘are’ from the stance of “a copy editor". To the scientist’s eye this may appear a slick ‘arty-farty’ attitude of, “Don’t let the truth stand in the way of a good story"; so that the ‘fact’ is taken for granted, with the principal aim being to retain the reader’s attention with a story that keeps moving. Humanities v. Sciences?
In point of fact, of course, there is no reason why (4) could not also carry a reference. If it is to do so, it would probably be preferable to the ‘are’ version for the scientific paper too, precisely because it is less verbose and does read more smoothly.
So I wonder if it is having already suffered the clumsiness of the “the fact that” construction that contributes to mollymooly’s instinct to shy away from yet another word near the end of the sentence. Understandably so, I might add.

25. ### Bill Walderman said,

August 26, 2008 @ 8:27 am

As I mentioned earlier, I'm not troubled by the two sentences that were held up as paradigmatic of the ungrammatical ellipsis of a finite verb form that differs in tense inflection from the coordinate verb form.

Enormous families/duels were common in the nineteenth century but rare today.

However, I'm happy to report that my internal grammatical intuitions object vociferously to sentences in which the coordinate verbs are not forms of "be."

*Idiots often fought duels in the nineteenth century but rarely today.

*Couples often raised large families in the nineteenth century but rarely today.

These aren't strictly speaking parallel to the earlier sentences because the contrasting element is an adverb, not an adjective phrase.

For me these sentences can be reshaped to be grammatical by using the pro-verb "do."

Idiots often fought duels in the nineteenth century but rarely do [so] today.

Couples often raised large families in the nineteenth century but rarely do [so] today.

For me "so" is optional in these sentences.

26. ### dr pepper said,

August 26, 2008 @ 8:44 am

I would have passed over the first sentence without comment. And the second one, with `duels', seems a little ungainly but if i encountered it in the middle of a larger text, i'd probably not notice.

When i stop and examine either of those sentences i do realize that i am supplying a verb. However it seems so natural and automatic that it would never occur to me to consider such a sentence to be deficient in any way.

I believe that that is because the verb in question is `to be', which is handled differently from other verbs. That is, `to be' expresses a state of being set in time and the act of transposing time periods is so frequent as to be unremarkable.

But it is different with an action verb.

"People commonly drove horses in the nineteenth century, but cars today."

Now that sentence would trip me up.

27. ### Jorge said,

August 26, 2008 @ 9:33 am

"You can't just close your eyes and assess […]"

28. ### Mark Seidenberg said,

August 26, 2008 @ 11:23 am

The issue that Edelman raises–about the status of "grammaticality" vs. "acceptability," has a long, fraught history and I don't think it's ever been resolved adequately. Carson Schutze's book (The empirical base of linguistics, 1996) is a good place to start. Two brief points on what has been an extensive discussion since the beginnings of generative linguistics.

1. about the "lack of independent evidence" for grammaticality that Edelman mentions: many theoretical linguists think it isn't required, for basic, principled reasons. Chomsky's argument from the 1970s went something like this (I can't find the quotes but a better web searcher like Mark probably can). The simple sentences of the language aren't informative, every theory can account for them, and so we need to look at edge constructions that are unusual and infrequently used ("Which pot is soup easy to cook in," and so on). That's where the information is. Then it turned out that grammaticality judgments for such sentences usually turned out to be inconsistent even among experts (as in the sentence Geoff discussed; BTW both short and long versions are just fine for me). At that point, Chomsky made a very interesting move, which had enormous influence: he said we should look to the theory to help adjudicate the unclear cases.

This means that the critical evidence for grammaticality is theory-internal, based on criteria such as whether the analysis of a particular borderline sentence conformed to principles developed in connection with other sentences. A borderline construction might be judged grammatical if doing so avoided complicating the grammar, violating some other elegant bit of formal analysis, etc. Of course, this becomes highly circular. The complex sentences provide the data for the theory of what's grammatical; the theory decides the cases where grammaticality judgments are unclear.

2. Grammaticality has a fuzzy structure: there are clear cases at the extremes, which seems encouraging, but then there are sentences that fill in the rest of the continuum. What to do with them? People maintain the idea that there should be a boundary in there someplace, and differ where they place it and why, which generates endless discussion, prescriptivist fury, etc. Some recent theoretical work has introduced graded notions such as degrees of grammaticality. But I say, why bother? The important issue is what's going on in the black box that is comprehending and producing utterances; how the outputs of that box are sorted and judged seems less important, to me. Which is why I am a psycholinguist, not a syntactician.

Grammars such as the one that Geoff co-authored are hugely useful, of course; I just don't think "grammaticality" has much theoretical force or gets us far in understanding language acquisition and processing and their brain bases. Or should have as much as it has been assigned.

I don't mean to be polemical here; there is a lot more to be said on the issues on both sides, but this is a comment in a blog not a thesis.

29. ### Chas Belov said,

August 27, 2008 @ 12:43 am

My reactions as a former western Pennsylvanian:

1) *This accounts for the fact that family sizes of seven, eight, or nine children were common in the nineteenth century but rare today.

Sounds perfectly acceptable to me.

4) This accounts for family sizes of seven, eight, or nine children being common in the nineteenth century but rare today.

Acceptable but stilted.

Duels were common in 1830 but rare today.

Acceptable if a bit odd. Yes, I think it's the distance between the used past verb and the omitted present verb.

Large families were common in the nineteenth century but not today.

Ditto.

This accounts for the fact that family sizes of seven, eight, or nine children were common in the nineteenth century but are rare today.

Acceptable but stilted.

Given that it does seem to be the distance between the verb and the verb-null, I wonder where the crossover might occur.

30. ### ajay said,

August 27, 2008 @ 11:53 am

(1) *This accounts for the fact that family sizes of seven, eight, or nine children were common in the nineteenth century but rare today.

The question is how to say in precise terms why it is ungrammatical. Keep in mind that this alternative would have been perfectly grammatical:
(2) This accounts for the fact that family sizes of seven, eight, or nine children were common in the nineteenth century but rare in the twentieth.

A minor point: "today" is not in the twentieth century. It is, in fact, in the twenty-first century.

31. ### jamessal said,

August 27, 2008 @ 12:37 pm

I for one am eager to read a response from either Mark or Geoff to Mark Seidenberg's post — which, when I close my amateur eyes and assess, sounds right to me.

32. ### blahedo said,

August 27, 2008 @ 4:45 pm

On first reading, (1) sounded fine, but only because I was supplying the extra "are"—careful reading marks it as definitely ungrammatical in my idiolect (as is Mark's shortened substitute about the duels).

Sentence (3), however, reads just fine to me. At first I thought this might be some property of "do" or something related to the AmE/BrE difference noted also by mike, but that can't be it as other verbs and phrases work too:

I haven't eaten it yet, but I will.
I haven't slept yet, but I will.
I haven't given Chris the book yet, but I will.

So it starts to look like ellipsis of whole phrases may be okay, even if differently tensed, as long as it's the whole phrase. I also suspected it was hinging on the haven't/will pairing, but I can construct other pairs (including modals as well as tensed forms) that work fine:

I have given Chris candy before, but I won't again.
I gave Chris candy once, and might again.

I think the Standard English way to handle these constructions is with an insertion of a "do so" ("might do so again"), but that's not needed (for me) in these sentences. If it is *just* the verb being elided, the tenses have to match up, though.

33. ### nominalize said,

August 30, 2008 @ 1:34 am

Let's be sure not to confuse 'grammatical' acceptance with accommodating sentences that are not well-formed, but whose meaning can be reasonably figured out. This happens often in fieldwork, where consultants will accept a sentence that has been established as ungrammatical, because they know what you meant by it. They might say "that's fine, but it sounds better if you say …," and this is a type of comment that has popped up in the comments about the original examples of this post.

I was able to accommodate (1) when I read it silently, but when I spoke it out, it sounded bad unless an "are" is explicitly inserted. I think a lot depends on the natures of the predicates and their modifiers— the time-spans involved in (1) are very large and somewhat abstract.

Here's a similar example with just the verb removed, that is more episodic and 'concrete'. I can't accommodate this.

(i) *Bill was at the park this morning, but at the mall right now.

34. ### John Cowan said,

September 3, 2008 @ 12:03 pm

I do find the "duel" version problematic, but I conjecture (and nobody has picked up on this element of my comment) that that's because the failure to match tenses is in the main clause, rather than (as originally) in a that-complement. I think looseness of tense is more acceptable in such complements because they are closely matched semantically to infinitive complements, which carry no tense information. And yet.