Why definiteness is decreasing, part 1


I ended yesterday's post ("Decreasing Definiteness") with a promise to say more about why the frequency of the has decreased so much over the past century or so, and this morning's post will start to redeem that promise.

As several commenters observed, there are probably several different things going on here. But I think that one relevant factor is decreasing formality of style.

I'll leave for another day the question of what formality really is, and why a decrease in formality correlates with a decrease in the frequency of the. In this post, I'll try to establish two simpler points:

  1. In English text that's more formal, in common-sense terms, the is more common;
  2. The formality of (various genres of) English writing has been decreasing over the past century or so.

Here are frequencies from the 1991 Switchboard corpus of conversational telephone speech (about 3.3 million words overall), from the 2003 Fisher corpus of conversational telephone speech (about 23 million words overall), and from the various genres of the 1990-2012 BYU Corpus of Contemporary American English (90-95 million words per genre):

      SWB    Fisher  COCA    COCA     COCA      COCA       COCA
                     Spoken  Fiction  Magazine  Newspaper  Academic
the   2.98%  2.47%   4.65%   5.27%    5.35%     5.34%      6.42%
a/an  2.39%  2.04%   2.57%   2.43%    2.54%     2.76%      2.31%

Note that

  • the is less common in speech than in writing;
  • the is much less common in informal speech (SWB and Fisher) than in formal speech – the COCA "spoken" genre is drawn from "All Things Considered (NPR), Newshour (PBS), Good Morning America (ABC), Today Show (NBC), 60 Minutes (CBS)", etc.;
  • the is most common in the most formal (here "Academic") writing.

The distribution of a/an is both more even and less tied to formality.
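For readers who want to try this on their own texts: a percentage of this kind is just the count of a word divided by the total number of word tokens. Here is a minimal sketch, assuming a naive regex tokenizer, which is almost certainly not the tokenization used for Switchboard, Fisher, or COCA. The function name and the sample sentence are made up for illustration.

```python
import re
from collections import Counter

def word_frequencies(text, targets=("the", "a", "an")):
    """Return each target word's share of all word tokens, in percent."""
    # Assumed tokenizer: lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    total = len(tokens)
    if total == 0:
        return {word: 0.0 for word in targets}
    counts = Counter(tokens)
    return {word: 100.0 * counts[word] / total for word in targets}

# Made-up sample sentence: 12 tokens, 3 of which are "the".
sample = "The cat sat on the mat, and a dog watched the cat."
freqs = word_frequencies(sample)
```

Different tokenization choices (how to treat contractions, hyphens, numbers, or transcription markup) will shift the denominator, so percentages from different corpora are only roughly comparable.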

We should note in passing that the decline in the's frequency seems to be continuing in recent years — the overall frequency across genres in COCA is 5.40%, but divided up by time periods, it's

1990-1994 1995-1999 2000-2004 2005-2009 2010-2012
 5.62%  5.43%  5.47%  5.20%  5.13%
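As a rough check on these numbers, here is a small sketch of the arithmetic. It assumes the five COCA genres are about equal in size, so that an unweighted mean of the genre percentages approximates the overall figure. The variable names are mine, and the values are copied from the figures in this post (the last five entries in the row for the, since the first two are SWB and Fisher).

```python
# Percentages of "the" copied from this post.
coca_genre_pct = [4.65, 5.27, 5.35, 5.34, 6.42]  # the five COCA genres
period_pct = {
    "1990-1994": 5.62,
    "1995-1999": 5.43,
    "2000-2004": 5.47,
    "2005-2009": 5.20,
    "2010-2012": 5.13,
}

# Unweighted mean across genres (assumes genres of roughly equal size).
genre_mean = sum(coca_genre_pct) / len(coca_genre_pct)

# Change from the first period to the last one.
periods = list(period_pct)
first = period_pct[periods[0]]
last = period_pct[periods[-1]]
absolute_drop = first - last                   # in percentage points
relative_drop = 100.0 * absolute_drop / first  # as a percent of the starting value
```

The unweighted mean comes out near 5.41, close to the 5.40% overall figure. The drop from the first period to the last is about half a percentage point, or a bit under 9% of the starting value.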

The difference between the's frequency in the 1991 Switchboard collection (2.98%) and the 2003 Fisher collection (2.47%) is also consistent with an on-going historical decline, but the two collections are not balanced as to age, region, educational background, etc., and I think it's likely that there are other explanations for this.

What evidence is there that published written text has been getting less formal, on average, over the past century or two? I doubt that many people will contest this idea, but it's easy to find quantitative support for it. For example, the frequency of not-contractions has increased steadily:

The different lines represent the proportion of contraction in:

1: is not ⇔ isn't
2: were not ⇔ weren't
3: has not ⇔ hasn't
4: was not ⇔ wasn't
5: should not ⇔ shouldn't
6: had not ⇔ hadn't
7: would not ⇔ wouldn't
8: could not ⇔ couldn't
9: have not ⇔ haven't
A: did not ⇔ didn't
B: will not ⇔ won't
C: does not ⇔ doesn't
D: can not/cannot ⇔ can't
E: do not ⇔ don't

Thus the line plotted with plum-colored E's gives the proportion

don't/(don't + do not)
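Each plotted value is a ratio of this form: the count of the contracted form divided by the combined count of the contracted and uncontracted forms. A minimal sketch of the calculation follows. The function name and the yearly counts are hypothetical, made up purely to show the arithmetic, and are not taken from the data behind the plot.

```python
def contraction_proportion(contracted_count, full_count):
    """Share of contracted forms among all instances of a not-pair.

    Returns None when neither form occurs, since the ratio is undefined.
    """
    total = contracted_count + full_count
    if total == 0:
        return None
    return contracted_count / total

# Hypothetical (contracted, full) counts per year, for illustration only.
yearly_counts = {
    1900: (12, 388),
    1950: (90, 310),
    2000: (240, 160),
}

proportions = {
    year: contraction_proportion(contracted, full)
    for year, (contracted, full) in yearly_counts.items()
}
```

Using a proportion rather than a raw count factors out changes in how often the negated verb occurs at all, so a rise in the line reflects a shift in the choice between the two forms.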

As discussed in "True Grit isn't true" (12/29/2010), it seems that not-contractions have been a normal part of informal spoken English for a couple of hundred years, but they have infiltrated the written language only gradually, in a process that is still going forward.

For comparison, here's the frequency of n't and 's contractions in the SOTU messages from 1914 to 2014:

Another example would be the increased use of first and second person pronouns in State of the Union messages, as discussed in "The evolution of SOTU pronouns", 1/28/2014:

This change in pronoun frequencies is not a general characteristic of English writing as a whole, but it reflects a change in the style of the SOTU messages over the course of the 20th century.

See also "The genitive of lifeless things", 10/11/2009, "Possessive with gerund: Tragic loss or good riddance?", 9/18/2010, and "Mechanisms for gradual language change", 2/9/2014, for some other examples where writing seems to be moving in the direction of speech.

And as "part 1" in the title of this post suggests, formality is not the only feature that's probably relevant to changes in definite-article frequency — we'll be back.



  1. Ralph Hickok said,

    January 9, 2015 @ 8:51 am

    When I wrote a local history book for a non-profit organization, the organization's director asked a retired English teacher to read it and make suggestions. The only thing she did was to get rid of all the not-contractions.

  2. RaySmi said,

    January 9, 2015 @ 10:29 am

    Interesting! Your analysis is impressive. As a non-native speaker/teacher of English I have always struggled to grapple with the idea of this 'definiteness' of the while teaching. Apart from the journalistic genre of English mentioned in your earlier segment, the rising number of second language speakers of English globally and the increased interaction on social networking, at times with limited characters, might also be contributing to the deletion of 'the'.

  3. J. W. Brewer said,

    January 9, 2015 @ 1:04 pm

    The graphs in the prior post showed a modest increase in frequency of a/an over the same time frame there was a decrease in frequency of the, so the shift in ratio was driven by both sides, whereas here it looks like a/an doesn't systematically vary with formality level, so if we think the a/an increase is substantial enough not to be random noise, some other causal factor(s) need to be identified. So that's at least one open issue for further analysis (and I realize that this post was intended to cover a few but by no means all of the follow-up issues generated by the prior post, so it's not intended as a complaint).

  4. J. W. Brewer said,

    January 9, 2015 @ 1:15 pm

    That "the" is dropping within published edited prose is consistent with the notion that much such prose is in a less formal register than it was a century ago, but doesn't seem imho to be suggestive of any influence from textspeak or headlinese or the potential idiosyncrasies of ESL learners, since those are the sorts of deviations from standard/prestige usage that you would expect the copyeditors of such prose to still be weeding out fairly ruthlessly even if they were more willing than in the old days to accept e.g. -n't contractions as a legitimate stylistic choice.

  5. Mark F. said,

    January 9, 2015 @ 3:00 pm

    But why is the definite article more common in more formal prose? I'm embarrassed to say I don't have any real intuition for that. Is it because there's more anaphora in less formal prose? In other words, I was taught in 9th-grade history class not to say "This is why…", but rather to come up with a summary noun phrase for whatever explanation had been given in the previous paragraph. Such a phrase would often contain "the".

    [(myl) An excellent question. Among the logical possibilities are: a larger number of non-pronominal noun phrases; a higher percentage of definite articles on a similar number of noun phrases; a smaller number of one or more other parts of speech (e.g. adjectives, adverbs, discourse particles, etc.); and so on. Stay tuned.]

  6. Bloix said,

    January 9, 2015 @ 3:14 pm

    Perhaps "the" is less common in speech than in writing because we use more pronouns in speech. I find that people are more tolerant of potential ambiguity as to referents in speech than they are in writing. Pronouns have the advantage of saving time, which is more important in speech (a slow method of communication), so the trade-off between speed and potential ambiguity may favor pronouns in speech and nouns (and the accompanying article) in writing.

  7. Callum said,

    January 10, 2015 @ 11:13 am

    Is it not slightly loaded to claim that formality has decreased when we could interpret the same data as showing that the features that do and do not define formality have merely changed?

    On the face of it, the preponderance in formal contexts of features that have at some time been considered informal (e.g. not-contractions) is suggestive of a decreased formality, but could we not also explain this preponderance in terms of these features having become formality-neutral?

    This explanation might be preferred when taking a view of the totality of features that delineate formal styles because I think few people would deny that there remain some things that clearly distinguish formality from informality. It may just be that the focus has shifted from things like contractions to other types of words, or even from any kind of vocabulary prejudice to sentence structure.

    [(myl) This is certainly a good point. We could avoid it by backing off to a more general theory that features of informal (or spoken) language at time t0 tend in general to have leaked into formal (especially written) language at some later time t1. Leaving out the spoken/written dichotomy, this process is obviously one of the driving forces in language change.

    But I think that there is something else going on in this case as well, though I can't prove it. Ted Underwood and Jordan Sellers ("The Emergence of Literary Diction", Journal of Digital Humanities 2012) argue that

    From the middle of the eighteenth century through the end of the nineteenth, poetry, fiction, and drama acquired a new diction that dramatized the difference between literary cultivation and mere specialized learning.

    In other words, a stylistic gap (here between a "literary" register and other writing) opened up and widened. I believe that you could make an analogous argument that over the past century or so, the stylistic gap between formal and informal genres of writing (and equivalently between writing and speech) has narrowed.

    A more extreme form of this claim was promoted by John McWhorter in his 2003 popular book Doing Our Own Thing (NYT review here).]

  8. Bloix said,

    January 11, 2015 @ 11:50 am

    "the features that do and do not define formality have merely changed?"

    Formality is not merely convention. It's not possible to imagine a world in which flip-flops are more formal than highly polished black lace-up cap-toed oxfords.

  9. Adrian Hundhausen said,

    January 15, 2015 @ 2:55 pm

    English, as spoken by a native speaker not pretending anything, conveys a relatively high proportion of its meaning with verbs and adjectives. Other languages, such as Latin, French, and Italian (as well as German, I suppose), load more meaning onto nouns, in part because declined nouns (as in Latin or German) are more powerful than our nouns.

    Formal English prose of the 18th century imitated Latin and French prose in an attempt to sound learned, and thus favored nouns. Look, for example, at this sentence from Washington's address:

    "In resuming your consultations for the general good you can not but derive encouragement from the reflection that the measures of the last session have been as satisfactory to your constituents as the novelty and difficulty of the work allowed you to hope."

    "To derive encouragement from the reflection that…." is ridiculous because you are trying to derive an abstract noun (encouragement) from another abstract noun (reflection) using a very metaphorical verb (derive). This kind of pretension must have sounded learned at the time, whereas modern English dispenses with the pretension and writes (or at least should write) that "if we look at what bills were passed in the last session, we should feel encouraged." Look (verb) instead of the reflection (the + noun), and feel encouraged (ordinary verb + adjective) instead of derive encouragement (metaphorical verb + noun). And as the abstract nouns disappear, so do the definite articles. Thank God for that.

    Adrian Hundhausen

  10. Use of definite article shows ‘radical decline’ in last century, research shows | ATIF News - A voice for Florida T&I professionals said,

    January 16, 2015 @ 6:45 am

    […] Liberman speculates on his blog that one reason for the change could be “decreasing formality of style”, as writing becomes […]

  11. Adrian Wallwork said,

    January 17, 2015 @ 8:44 am

    The use of the definite article (and indefinite article) has declined because the general trend of the English language has always been towards using verbs rather than nouns whenever possible. Other languages, such as Italian, use 'the' far more frequently because they are noun-driven rather than verb-driven. For example, an Italian will say: We made the comparison of x and y, in order to do z. Whereas an English speaker might prefer: We compared x and y …

    This, in my opinion, has nothing to do with formality, but just with ease of writing / speaking, conciseness and clarity – features which English speakers frequently strive for but which are not innately part of the English language.

    In conclusion, the decline in the use of 'the' is yet another sign of how English is evolving and is in no way surprising.

  12. Adrian Wallwork said,

    January 17, 2015 @ 8:46 am

    Sorry, I have just realised that the Adrian before me wrote virtually the same thing!
