The mystery of the decay

A recent email from a colleague reminded me of a series of posts documenting a general tendency for the relative frequency of the English word "the" to decline over the past couple of centuries:

"SOTU evolution", 1/26/2014
"Decreasing definiteness", 1/8/2015
"Why definiteness is decreasing, part 1", 1/9/2015
"Why definiteness is decreasing, part 2", 1/10/2015
"Why definiteness is decreasing, part 3", 1/18/2015
"Positivity?", 12/21/2015
"Normalizing", 12/31/2015
"The case of the disappearing determiners", 1/3/2016
"Dutch DE", 1/4/2016
"The determiner of the turtle is heard in our land", 1/7/2016
"Correlated lexicometrical decay", 1/9/2016
"Style or artefact or both?", 1/12/2016
"Geolexicography", 1/27/2016
"The accommodation", 3/14/2017
"Decreasing definiteness in crime novels", 1/21/2018

There are several reliable correlations with the frequency of "the" that plausibly played a role in this trend. The most obvious one is "formality", including the difference between writing and speech, as well as formality differences within written and spoken material. Similar effects are common in linguistic history, where changes start as colloquial variants before gradual acceptance into the standard language.

In the case of "the", there are more specific correlated changes in the frequency of relevant constructions, such as the trend towards replacing "the NOUN of X" constructions with "X's NOUN". And the frequency of "the" has reliable correlations with age and gender. This might just be due to correlation with differences in formality/innovation, but perhaps there have also been secular increases in the proportion (and especially the influence) of young and female voices.

See the cited posts for details and documentation.

But in the end, the true causes of the the-frequency effect (as opposed to correlated effects of the causes) remain somewhat mysterious to me.

Ideas?

 



62 Comments

  1. Philip Taylor said,

    February 12, 2022 @ 10:45 am

    Could you possibly explain, Mark, what exactly you are comparing and contrasting in (a) "The most obvious one is "formality", including the difference between writing and speech" and (b) "as well as formality differences within written and spoken material. " ?

    [(myl) See the linked posts for copious examples and explanations.]

  2. Victor Mair said,

    February 12, 2022 @ 11:34 am

    When I correct papers by students from abroad, especially those from China, of whom I have many, the single most frequently misused word is "the" — when it should be inserted and when not. The hardest part of all is that I myself often cannot explain what the rule(s) for its usage are. Much of the time it's just an instinct or feeling that tells me that it should or should not be present.

    At least I feel confident in telling students that it's safe to use "the" when they're talking about a specific, definite item: "the pen on the table in the classroom".

    But watch what happens now: "the pen on the table in the classroom at the University of Pennsylvania"

    BUT "the pen on the table in the classroom at UPenn"

    NOT "the pen on the table in the classroom at the UPenn"

    [(myl) I've previously quoted "the joke about the first lecture of Russian class for English speakers: 'I start with good news! In English language, is necessary to use article! But in Russian language, no article!'". However, the patterns of article usage are oddly different (and hard to learn) across languages that have them, e.g. English and French. It's not surprising that it's hard for you to articulate the "rules" that you nevertheless intuitively follow, though of course that's true for pretty much every aspect of grammar.

    But all that is totally irrelevant to the topic of this post, which is NOT about mistakes in article usage, but about the fact that the frequency of the English definite article has been decreasing steadily for a century or two.]

  3. M. Paul Shore said,

    February 12, 2022 @ 11:35 am

    Philip Taylor: It seems clear to me that Prof. Liberman simply means that people are more likely to omit “the” in speech than in writing, more likely to omit it in informal writing than in formal writing, and more likely to omit it in informal speech than in formal speech.

    Having said that, I must admit that I’m having difficulty bringing to mind any real-life instances of this “the”-omission by native speakers that’s being discussed. When I think of omitting “the”, I immediately think of the defective English of nonnative speakers, especially Russians, as in Boris and Natasha importuning each other to “get moose and squirrel”.

    [(myl) It's NOT a matter of "omission", but rather the effects of differences in phrasing and word choice. And the resulting difference in frequency is not a small one — see e.g. this table (from "Why definiteness is decreasing, Part 1", linked in the post above):

            Spoken   Spoken    Spoken   Fiction  Magazine  Newspaper  Academic
            (SWB)    (Fisher)  (COCA)   (COCA)   (COCA)    (COCA)     (COCA)
    the     2.98%    2.47%     4.65%    5.27%    5.35%     5.34%      6.42%
    a/an    2.39%    2.04%     2.57%    2.43%    2.54%     2.76%      2.31%

    As noted in that post,

    • "the" is less common in speech than in writing;
    • "the" is much less common in informal speech (SWB and Fisher) than in formal speech – the COCA "spoken" genre is drawn from "All Things Considered (NPR), Newshour (PBS), Good Morning America (ABC), Today Show (NBC), 60 Minutes (CBS)", etc.;
    • "the" is most common in the most formal (here "Academic") writing.

    The distribution of a/an is both more even, and also less tied to formality.

    ]

  4. J.W. Brewer said,

    February 12, 2022 @ 11:53 am

    One observation I'll make is that there seem to be plenty of well-known minimal-pair contrasts where some speakers use a definite article and others don't, but the arthrous option does not (to my ear, at least) seem any more formal. Three examples:

    1. The Southern-California practice of referring to a highway like e.g. I-405 as "the 405" whereas in other parts of the U.S. we would never refer to e.g. I-684 as "the 684."

    2. The trans-Atlantic contrast of AmEng "in the hospital" versus BrEng "in hospital."

    3. The generational shift in AmEng where high school students of my generation went "to the prom" but these days our children go "to prom."

    For the first two it seems inherently implausible that Los Angelenos are using a more formal register than other Americans or that Americans are using a more formal register than Brits. For the last one I guess we could always complain that The Young People Today are abandoning formality, but I don't think that's the right analysis here. Although actually if someone wanted to do some corpus linguistics with a case-sensitive searching tool (is it just me, or did the google books n-gram get rid of its case-sensitive option?) it might be the case that there has been an evolution from "going to the [lowercase] prom" toward "going to [uppercase] Prom," with some prosodic emphasis in speech supplying the rough equivalent of capitalization in writing? My point here is that using a definite article is not the only way to mark definiteness, with capitalization or its oral equivalent being another potential such strategy. Thus, "proper nouns" are often easily understood as definite without needing such an article and sometimes fixed phrases can be converted into proper nouns or the functional equivalent.

    I'm not saying there aren't other examples where if you take a sentence and delete non-obligatory instances of "the" you will end up with a more informal feel, but I'm not sure how strong the correlation of the-absence and reduced formality is over the entire range of constructions that might or might not include a "the."

  5. Scott P. said,

    February 12, 2022 @ 12:13 pm

    For the first two it seems inherently implausible that Los Angelenos are using a more formal register than other Americans or that Americans are using a more formal register than Brits

    I find that Americans frequently use a more formal register than the British, stereotypes notwithstanding.

  6. Jerry Friedman said,

    February 12, 2022 @ 12:13 pm

    I tried a few words—magic, judgment, and religion—with and without "the" at Google Books. All had a peak in "the" use within a couple decades of 1900 and a visible decline since. Results here.

    A search with "a" instead of "the" found much less consistent results.

    Looking for those words at the beginnings of sentences (my original guess) doesn't show any pattern I can see.

    I don't know what to make of that except to suggest that it's not a shift in topics and I find it hard to imagine that it's a decrease in formality, since I can't think of situations where you'd put a "the" before those words in formal but not informal style.

    Sorry, I haven't checked all the posts you linked and all the comments to see whether anyone else has looked at specific words that way.

    J. W. Brewer: "Case-insensitive" is the default, but you can switch to case-sensitive.

  7. Jerry Friedman said,

    February 12, 2022 @ 12:20 pm

    Now I see MYL's response to M. Paul Shore, which makes me wonder whether it is formality after all. But what differences in phrasing between formal and informal styles could make a difference in the rate of "the" before the words I looked at? (Other than "the court's judgment" versus "the judgment of the court", etc., since that didn't appear to be the full explanation according to an earlier post.)

    [(myl) The claim is NOT that formality is the only reason for changes in definite-article frequency, just that it's one factor (of many) with a reliable overall correlation. Whether the relationship is causal, and if so, why, is open for discussion.]

  8. Antonio L. Banderas said,

    February 12, 2022 @ 12:39 pm

    @J.W. Brewer

    _The Young People Today_

    Why the capitalization?

  9. J.W. Brewer said,

    February 12, 2022 @ 12:41 pm

    Come to think of it, "in the hospital" may be an instance of a usage where "definiteness" in some sort of semantic sense is not actually what the "definite" article is doing. It would be perfectly idiomatic, for example, for one of my daughters to say when asked about her plans for an upcoming summer Saturday "my friends and I are going to the beach" but then to say "we haven't decided yet" if asked "which beach?" The initial sentence could well have said (and a prescriptivist busybody might well claim it should have said) "going to a beach," but that's not the idiomatic phrasing to use.

    So, again, "the" has a plethora of different uses, not all of which consistently indicate the presence of the same thing, whether that be "formality" or "definiteness" or what have you. Come to think of it, while I would agree that written English is on average likely to be more formal in register than spoken English, I don't know that it follows from that that any feature that is more common in writing than in speech is ipso facto "more formal." I think it's more complicated than that.

  10. Jerry Friedman said,

    February 12, 2022 @ 1:06 pm

    (myl) The claim is NOT that formality is the only reason for changes in definite-article frequency, just that it's one factor (of many) with a reliable overall correlation. Whether the relationship is causal, and if so, why, is open for discussion.

    How annoying, my comment on a href="https://books.google.com/ngrams/graph?content=the+religion%2Freligion%2Cthe+magic%2Fmagic%2Cthe+judgment%2Fjudgment&year_start=1800&year_end=2019&corpus=26&smoothing=3&direct_url=t1%3B%2C%28the%20religion%20/%20religion%29%3B%2Cc0%3B.t1%3B%2C%28the%20magic%20/%20magic%29%3B%2Cc0%3B.t1%3B%2C%28the%20judgment%20/%20judgment%29%3B%2Cc0">declines at Google ngram search in "the religion" versus "religion" and the same with "magic" and "judgment" doesn't seem to have appeared here. I also mentioned that I saw no temporal pattern in "religion", "magic", and "judgment" at the start of a sentence. That was the context of my question about formality.

    I wasn't wondering whether formality was the only factor; I was wondering how, with those specific comparisons, it could be a factor at all. What differences between formal and informal phrasing could make the difference between rates of "the religion" etc. in those registers? Especially if it's not a difference in possessives or in "a religion" etc. or in "Religion" etc. at the start of a sentence?

    I'm not doubting that formality has an effect on "the", of course. But with these specific examples, it's easier to think about what the phrasing might be, and for me, not to come up with anything. Is anything known? Is it some kind of nickel-and-diming, with a variety of phrasings without "the" being a little more common in informal registers?

    Just to make sure about possessives, here are some COHA results:

    religion:
    1900-1909: 1741

    2010-2019: 1098

    the religion:
    1900-1909: 110 (6.3%)

    2010-2019: 43 (3.9%)

    's religion:
    1900-1909: 13 (0.75%)

    2010-2019: 10 (0.91%)

    So obviously the increase in the percent with 's doesn't account for much of the decrease in "the religion" in that corpus.
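As a check on the arithmetic above, here is a minimal Python sketch that recomputes the shares from the COHA counts quoted in this comment. The variable and function names are illustrative assumptions; only the counts come from the comment.

```python
# Counts copied from the COHA figures quoted in the comment above.
counts = {
    "1900-1909": {"religion": 1741, "the religion": 110, "'s religion": 13},
    "2010-2019": {"religion": 1098, "the religion": 43, "'s religion": 10},
}


def share(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Percentage of all "religion" tokens that follow "the" or a possessive 's.
shares = {
    decade: {
        phrase: share(n, row["religion"])
        for phrase, n in row.items()
        if phrase != "religion"
    }
    for decade, row in counts.items()
}

# Change between the two decades, in percentage points.
the_drop = shares["1900-1909"]["the religion"] - shares["2010-2019"]["the religion"]
poss_rise = shares["2010-2019"]["'s religion"] - shares["1900-1909"]["'s religion"]

for decade, row in shares.items():
    for phrase, pct in row.items():
        print(f"{decade}  {phrase:<13} {pct:.2f}%")
print(f"drop in share of 'the religion': {the_drop:.2f} points")
print(f"rise in share of possessive 's:  {poss_rise:.2f} points")
```

On these counts the drop in the "the religion" share is more than ten times the rise in the possessive share, which is the comparison the comment relies on.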

  11. Jerry Friedman said,

    February 12, 2022 @ 1:08 pm

    Dang it. Though at least the URL will work even though I messed the link up.

    Now to grade some homework.

  12. Jeremy said,

    February 12, 2022 @ 1:09 pm

    @J.W. Brewer

    1. The Southern-California practice of referring to a highway like e.g. I-405 as "the 405" whereas in other parts of the U.S. we would never refer to e.g. I-684 as "the 684."

    The pre-interstate highway L.A. freeways originally had names: "The Hollywood Freeway," "The Pasadena Freeway," "The Harbor Freeway." When they later got assigned interstate highway numbers, the "the" stuck around.

  13. Philip Taylor said,

    February 12, 2022 @ 2:02 pm

    JWB — "The initial sentence could well have said (and a prescriptivist busybody might well claim it should have said) "going to a beach," but that's not the idiomatic phrasing to use" — not convinced (about the prescriptivist, that is). I am far to the right of the average prescriptivist, being more of a proscriptivist than anything, but I nonetheless recognise that in British English a child would normally say "going to the seaside" where an adult would say "going to the beach". When the child says "going to the seaside", he is not referring to any particular seaside (at least explicitly) but simply thinking of "the seaside" as the place where the sea meets the land (ideally a place covered in sand, with fascinating rockpools and the like). The adult uses "beach" in place of "seaside" simply because it sounds more adult, but his "beach" not more refers to any particular beach (explicitly) than does the child's "seaside".

  14. Philip Taylor said,

    February 12, 2022 @ 2:04 pm

    P.S. I now understand Mark's "compare and contrast" referred to in my original comment — Mark wrote "between" in (a) and "within" in (b), a most salient fact that somehow failed to impinge on my stream of consciousness.

  15. J.W. Brewer said,

    February 12, 2022 @ 2:17 pm

    @Jeremy: But why, e.g. "The Pasadena Freeway" rather than "Pasadena Freeway"? Perhaps there's a timing issue. Major highways of the pre-interstate era in the area I grew up in had anarthrous names like Baltimore Pike and Philadelphia Pike. But if you dig back far enough in the google books corpus you can find 19th-century instances of "the Baltimore Pike" and instances of "the Philadelphia Pike" as recently as circa 1920. But it's still "the New Jersey Turnpike," because that's a much younger name so there perhaps hasn't been enough time for the article to erode.

    I think there's a semi-predictable process for a lot of toponyms with transparently compositional names where over time they just get treated as a proper name whose etymology (and thus compositional semantics) is irrelevant to its meaning.* But this process does not reflect any change in either formality or definiteness. Of course, maybe new arthrous names are constantly arising to offset older ones eroding to anarthrousness?

    *Non-highway example – an online discussion a few years ago among alumni of varying ages established that the portion of the Yale campus universally known as anarthrous "Old Campus" by the time I got there in the 1980's was more commonly thought of as "the Old Campus" by those who had been enrolled prior to 1970, with those who had been around in the mid-Seventies having inconsistent recollections of usage that confirmed that they had lived through a transition. The period during which the "Old" was genuinely descriptive, because the other buildings those were being contrasted with were so recently-constructed that the fact that they were "new" was actually salient ended by probably the end of WW2, but the definite article that had been part of the transparently-compositional meaning lingered for another few decades before eroding away into idiomaticity.

  16. Terry K. said,

    February 12, 2022 @ 3:03 pm

    The thought I had when I read Victor Mair's comment, related to UPenn vs the University of Pennsylvania, might apply to freeway names as well. Seems to me, with proper nouns, when we think of them as descriptive, we tend to use "the"; when we think of them as just a name, but not a description, we don't. "University of [Place]" lends itself to thinking of it as a description, so add a "the". Abbreviated forms don't, so no "the". I suppose "[Place] University" can be thought of either way, and thus varies.

    If the Pasadena Freeway runs through or to/from Pasadena, then the name is descriptive, thus "the".

    Related, music groups and sport teams with plural names, "the Panthers" because we think of it as each member is a panther, so it is treated as descriptive.

    Of course, that doesn't mean it's always that simple. And idiom comes into play.

  17. Terry K. said,

    February 12, 2022 @ 3:08 pm

    P.S. And J.W. Brewer just above nicely talks about another factor that comes into play sometimes in these sorts of names.

  18. John Swindle said,

    February 12, 2022 @ 4:52 pm

    Is the interplay of different national varieties of English (British, American, Indian, Nigerian, Filipino, Dutch, etc.) somehow involved in the decline? Contact between them could be increasing.

  19. DaveK said,

    February 12, 2022 @ 5:17 pm

    No one seems to have mentioned texting as a factor in the decline of articles. Texting tends to be telegraphic and filled with abbreviations, and a lot of times I’ve seen articles omitted in the interest of saving time and space.

    [(myl) Texting is relatively recent, and didn't start to take off in the U.S. until 2008 or so, a decade after it was widespread in Japan and elsewhere — see the discussion links in "Social Change" (1/15/2015) for some perspective. The steep decline in the frequency of "the" began more than a century before that.

    And again, the change under discussion is NOT omission of words under the pressure of typing time (or for any other reason) — it's choices of wording and phrasing that result in changes in the relative frequency of certain words.]

  20. Jerry Friedman said,

    February 12, 2022 @ 5:37 pm

    I repeated MYL's ngram searches (in "Correlated lexicometrical decay") for "the _NOUN_" as a fraction of "_NOUN_" and "the _ADJ_ _NOUN_" as a fraction of "_ADJ_ _NOUN_". The steady decline he found has reversed in the years that have become searchable since he did his search—there's a remarkably parallel upward trend after the early 1990s, though well short of what would get back to 1900 levels.

    What could explain that?

    By the way, I also did some more individual nouns, namely the allegedly four most common nouns in English, and got more varied results. Specifically, "the year" has declined a lot, "the time" has declined much like the words I looked at before, "the person" has fallen and then risen almost back to its 1900 level, and "the way" has increased quite a bit.

    I think explanations are going to involve such words as these much more than tiny effects such as "in (the) hospital", "going to (the) prom", and "the 10 / (I-)10".

  21. Jerry Friedman said,

    February 12, 2022 @ 5:41 pm

    Still not getting my links right. Here are the results for "the noun" and "the adjective noun".

    DaveK: I don't think any of the corpora searched include text messages.

    Jeremy: But why didn't that happen anywhere else? I-90 in Cleveland was the Shoreway (officially the Memorial Shoreway) before it was I-90, but the "the" didn't stick around. I think the same story has been repeated all over the country.

  22. Jerry Friedman said,

    February 12, 2022 @ 5:48 pm

    I think the results I mentioned in my comment above are quite interesting; the trend has reversed at Google ngrams since MYL did his searches. Both "the _NOUN_" and "the _ADJ_ _NOUN_" are going back up in remarkably correlated fashion, though well short of their 1900 levels. What could cause that?

    Why are my comments disappearing? Maybe because of my HTML errors?

    Anyway, I'm going to try again. Here are the results for the four allegedly most common nouns in English. For "year" and "time", "the" decreases, for "person" it falls and goes back almost to its original level, and for "way" it rises.

    I think words like those will be much more involved in the explanation of "the" decay than "go to (the) prom", "in (the) hospital", and "the 10 / (I-)-10".

  23. Jerry Friedman said,

    February 12, 2022 @ 5:50 pm

    And now my earlier vanished comment has reappeared. I'm sorry if I'm causing trouble. If anyone is intervening, there's no need to rescue the second one that disappeared, within the last ten minutes or so.

  24. David L said,

    February 12, 2022 @ 6:19 pm

    The pre-interstate highway L.A. freeways originally had names: "The Hollywood Freeway," "The Pasadena Freeway," "The Harbor Freeway."

    By way of a counterexample, the John Shirley Highway that runs SW from the Potomac to the Washington Beltway is now known as I-395. Older residents will still sometimes call it the Shirley Highway, but no one says the 395.

    [(myl) And people still call the Schuylkill Expressway the Schuylkill Expressway, as well as I-76 — but nobody calls it "the I-76", at least in my experience. ]

  25. Antonio L. Banderas said,

    February 12, 2022 @ 6:22 pm

    @Phillip
    "his "beach" not more refers to"

    Shouldn't it be either "no more" or "does not anymore" ?

  26. Antonio L. Banderas said,

    February 12, 2022 @ 6:23 pm

    @Phillip
    "his "beach" not more refers to"

    Shouldn't it be either "no more" or "does not anymore" ?
    https://en.wiktionary.org/wiki/no_more#Adverb

  27. Philip Taylor said,

    February 12, 2022 @ 8:02 pm

    Yes, "no more" was intended but I made a typo (not unusual for me) and failed to recognise it as such until it was too late. I too would really appreciate the ability to edit a comment for a finite period after posting.

  28. Rick Rubenstein said,

    February 12, 2022 @ 9:20 pm

    British post-punk band the The released their last studio album in 2000. Case closed.

  29. Vulcan with a Mullet said,

    February 12, 2022 @ 9:35 pm

    This is nothing other than a rough guess by a layman, but I have a feeling it might have to do with the tendency to use other determiners during speech (like possessives, specific numbers, determinatives) since the nature of spontaneous conversation invites those kinds of constructions that specify rather than generalize? As in "Mmm, I love these pancakes" instead of "I love the pancakes" ?

  30. Jerry Friedman said,

    February 12, 2022 @ 10:54 pm

    Vulcan with a Mullet: Here's another ngram search. Looks as if demonstratives and "my" and "your" have the same trajectory as "the": steadily down, and then a comeback in the last couple of decades.

  31. Jerry Friedman said,

    February 12, 2022 @ 11:21 pm

    I'm thinking the comebacks are caused by changes in the corpus.

  32. Chas Belov said,

    February 13, 2022 @ 12:33 am

    I haven't noticed myself using "the" any less. Nor have I noticed "missing" the's in my younger co-workers' writing. It is fairly idiomatic.

    The Central Freeway (SF) is backed up.
    Take Parkway West (Pittsburgh).

    "In the hospital" seems odd upon analysis, since which hospital is only relevant if you want to visit the person. I would have expected "in a hospital". But I do say "in the hospital".

  33. John Swindle said,

    February 13, 2022 @ 1:41 am

    Versus "in prison," "in school," "in jail." Again, though, the question is less why definite article usage is odd than why it declined (and maybe rose again).

  34. Bob Ladd said,

    February 13, 2022 @ 2:56 am

    I very much doubt that cases like in (the) hospital and (the) I-405 have anything to do with the phenomenon MYL is trying to discuss in his post.
    – Many European languages that have definite articles also have item-specific collocations of prepositions+noun in which an article might be expected on semantic grounds but is omitted. Because of dialect variation, we notice differences like Brit in hospital vs Am in the hospital, but there are quite a few others (in school, in church, in court, in jail, etc.). In all these cases, in English, there's a clear sense of not referring to a specific school, church, etc., but rather something like the institution, so in hospital makes sense, but the dialect difference shows that it's also a function of specific nouns. Compare also in prison, in jail with in the hoosegow, in the slammer, and note that in those cases the formality explanation makes exactly the wrong prediction, reinforcing MYL's point that this is NOT about "omitting articles". And finally, comparing across European languages, Italian has a very similar usage but it extends to more nouns, so you can say e.g. in ufficio ('in office') and in bagno ('in bathroom'), while in Romanian în is one of several very common prepositions with which definite articles are essentially never used.
    – Toponyms also vary across languages and across dialects in the same way that has nothing to do with formality. British highway numbers, like California highway numbers, are always preceded by the, and at least some Canadian ones (the 401) are as well. Mountains rarely take the in English (the Matterhorn and a few other Swiss Alps are notable exceptions) but in Italian all the local volcanoes (Etna, Vesuvius, etc.) have to have an article. Again, this is basically about the lexicon, not the grammar, and certainly not about formality.

  35. cliff arroyo said,

    February 13, 2022 @ 3:43 am

    Just a thought… in many European languages with articles, both abstract nouns and generics (maybe not the right word) tend to take the definite article

    Spanish examples:
    Me encanta la naturaleza. ~~ I love nature.

    El perro es un mamífero. ~~ the dog (any dog) is a mammal though 'Dogs are mammals.' is probably more natural now.

    Off the top of my head the second type of example used to be common in English and the first wasn't completely unknown either. I don't know if people would classify the modern forms as 'less formal' or if changing standards of formality play a part.

  36. Richard Belaire said,

    February 13, 2022 @ 8:19 am

    In a previous life I spent quite a bit of time editing technical papers and reports. I found myself removing innumerable instances of the word "the" simply to improve readability of said document. With so many uses of "the" scattered about one quickly lost track of important details while trying to navigate grammar. Simpler (less formal) phrasing made the reports more understandable IMO.

  37. cliff arroyo said,

    February 13, 2022 @ 8:49 am

    " I found myself removing innumerable instances of the word "the" simply to improve readability of said document"

    Try going the other direction…. I was once asked to edit a paper in geography (translated by the author himself from the original Polish ).
    It was about six pages and no more than an article or two per page… it was… really rough going (and took at least three times as long as similar work usually did), and my subsequent advice to Polish authors is to err on the side of articles… it might be clumsy and non-idiomatic but it's still easier to read than article-less English.

  38. bks said,

    February 13, 2022 @ 9:03 am

    @P.Taylor Observing LLogophiles deal with their commenting mistakes is half the fun of LLog.

  39. Jerry Packard said,

    February 13, 2022 @ 9:27 am

    Function (closed-class) words usually come from content (open-class) words, and go on to disappear with surprising frequency over the passage of time. 'The' is a function word par excellence, and so might be expected to become less frequent over time, especially since its function is quite predictable: whether information is 'definite', 'given' or 'old' is usually easily derivable from context.

  40. Jerry Friedman said,

    February 13, 2022 @ 10:42 am

    Back to those strange comebacks. It's different in the COHA data. "The" before the nine most common nouns does seem to have decreased since 1900 and leveled off in the past three decades, but not gone back up. On the other hand, "my|your" before those nouns has risen quite a bit since 1900, unlike the trend at Google ngram search, and then leveled off. Anyway, I suspect the rises in recent decades at ngram search are artifacts of changes in the corpus.

    time|person|year|way|day|thing|man|world|life

    1900-1909: 173,012

    1990-1999: 246,932

    2000-2009: 262,358

    2010-2019: 255,255

    the time|person|year|way|day|thing|man|world|life

    1900-1909: 34,972 (20.2%)

    1990-1999: 46,752 (18.9%)

    2000-2009: 50,480 (19.2%)

    2010-2019: 49,136 (19.2%)

    my|your time|person|year|way|day|thing|man|world|life
    1900-1909: 3,192 (1.84%)

    1990-1999: 7552 (3.05%)

    2000-2009: 8254 (3.15%)

    2010-2019: 7754 (3.03%)
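The percentages above can be recomputed from the raw counts. The following is a minimal Python sketch under that assumption; the names are illustrative, and only the counts come from the COHA figures quoted in this comment. Recomputed values may differ from the quoted ones in the last digit, depending on rounding.

```python
# Combined counts for the nine nouns (time, person, year, way, day, thing,
# man, world, life), copied from the COHA figures quoted in the comment above.
totals = {
    "1900-1909": 173_012,
    "1990-1999": 246_932,
    "2000-2009": 262_358,
    "2010-2019": 255_255,
}
the_counts = {
    "1900-1909": 34_972,
    "1990-1999": 46_752,
    "2000-2009": 50_480,
    "2010-2019": 49_136,
}
my_your_counts = {
    "1900-1909": 3_192,
    "1990-1999": 7_552,
    "2000-2009": 8_254,
    "2010-2019": 7_754,
}


def percent_of_total(counts, totals):
    """Map each decade to counts[decade] as a percentage of totals[decade]."""
    return {decade: 100.0 * counts[decade] / totals[decade] for decade in totals}


the_pct = percent_of_total(the_counts, totals)
my_your_pct = percent_of_total(my_your_counts, totals)

for decade in totals:
    print(f"{decade}  the: {the_pct[decade]:.1f}%  my|your: {my_your_pct[decade]:.2f}%")
```

Laid out side by side, the two series show the pattern described in the comment: the "the" share is lower in the three recent decades than in 1900-1909, while the "my|your" share is higher.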

  41. Jerry Friedman said,

    February 13, 2022 @ 11:07 am

    Richard Belaire: I'm curious to see examples of the kinds of changes you made where simpler or less formal phrasing reduced the incidence of "the".

    Jerry Packard: I'm glad to learn that about function words, but I still wonder what's going on with nouns instead of getting "the" attached. And is the rate of decay of "the" normal, or maybe unusually fast?

  42. Doug said,

    February 13, 2022 @ 1:07 pm

    @Jerry Packard:
    "'The' is a function word par excellence, and so might be expected to become less frequent over time…"

    The opposite development is also attested. A definite article can become more frequent over time and end up as a marker on nearly all nouns regardless of definiteness.

  43. Richard Belaire said,

    February 13, 2022 @ 1:41 pm

    To Jerry Friedman: Here is an example of what I meant:

    Original : "Looking at the test results in Table 1, the brake-specific fuel consumption generally decreased as the engine output torque increased."

    Changed : "Looking at test results in Table 1, brake-specific fuel consumption generally decreased as engine output torque increased."

  44. Jerry Friedman said,

    February 13, 2022 @ 2:06 pm

    Richard Belaire: Thanks. That raises the possibility that some of the decline in "the" may be due to editors.

    I might have taken out the second and third "the"s, but I'd have left the first one in. Maybe I'm old-fashioned.

  45. Bob Ladd said,

    February 13, 2022 @ 2:15 pm

    I agree with Jerry Friedman that Richard Belaire may be onto something. The version with the occurrences of "the" edited out does sound somehow punchier and, perhaps, more contemporary. I can't otherwise put my finger on the difference between the two versions (and, like Jerry Friedman, I would probably have let the first "the" stand), but it seems plausible that such stylistic changes might be part of the source of the "decay" of "the" that the OP is about.

  46. David Morris said,

    February 13, 2022 @ 2:33 pm

    If it wasn't the Chinese students using 'the' too little, it was the South American students using it too much. On average, they were 100% correct.

  47. David Marjanović said,

    February 13, 2022 @ 5:18 pm

    English has an unusually large number of fixed phrases that don't allow articles, it seems to me.

    I'm not sure if that number is increasing. But the number of designations treated as proper names, and therefore losing their articles, is definitely increasing: "the Congress" > "Congress" comes to mind, likely assisted by "in Congress assembled".

    On the flip side are all those American universities with articles in their names: The University of…

  48. Victor Mair said,

    February 13, 2022 @ 8:21 pm

    Going back to my initial comment (second in the thread), I think that it is relevant, and that its relevance is borne out by many of the subsequent comments.

    If native speakers of English are often uncertain about when and whether to use "the", there would likely be a tendency to dispense with it in such cases, especially on the part of skittish editors.

  49. J.W. Brewer said,

    February 14, 2022 @ 11:35 am

    Yet another small phenomenon contributing to the overall anarthrous trend is mentioned in this New Yorker piece on the rise of "Multicultural London English." It is reportedly now idiomatic in M.L.E. to say "I went pub last night" rather than a more traditional locution like "I went down the pub last night." But I remain curious as to whether there are offsetting micro-evolutions in idiomatic phrasing elsewhere in English usage that give rise to definite articles where none would have been used in earlier generations, even if on a net basis the declining usage sectors outweigh them.

    https://web.archive.org/web/20220206111339/https://www.newyorker.com/culture/personal-history/the-common-tongue-of-twenty-first-century-london

    Separately, I was reminded yesterday of how arbitrary and language-specific article usage can be, because it was the Sunday (on my particular church's lectionary cycle) where we heard the gospel lesson of the Pharisee and Publican. The Publican famously says, using two definite articles in the Greek, ὁ Θεός, ἱλάσθητί μοι τῷ ἁμαρτωλῷ, which is traditionally translated (e.g. in the King James Version) with zero definite articles but one indefinite article as "God be merciful to me a sinner." And that seems the correct approach if you want even modestly idiomatic English, even though the KJV is comparatively "literalistic" as such things go. On the other hand, given the plethora of rival translations that have been done since the mid-Nineteenth century, there are unsurprisingly a minority of versions that have opted to render the second article in the Greek more literally, e.g. "God be propitious to me — the sinner" or "God, be merciful to me, the sinner." There are yet others, however, that phrase the sentence to make it completely anarthrous, e.g. "God, be merciful to me, sinner that I am," which indeed is what Wycliffe did in the 14th century ("God be merciful to me, synnere"), although it may be significant that Wycliffe was translating from the anarthrous Latin Vulgate instead of the over-arthrous Greek original.

    Notably, however, no English translation seems to have tried to explicitly represent the Greek definite article in the initial vocative phrase, because "The God" is too grossly unidiomatic a way to address God (or a god) directly in English.

  50. Rachael Churchill said,

    February 14, 2022 @ 12:55 pm

    Richard Belaire, I'm astonished by that example. I edit papers written by non-native speakers, and if I received your "changed" example I'd put all the "the"s back in, especially the first. "Looking at test results in Table 1" is not grammatical for me at all – it reads like Russian English, or some kind of abbreviated register like headline, telegram, or txtspk.

    Would you say "Looking at graph in Figure 1" or "Looking at standard deviation of these readings"? If not, what's the relevant difference?

    (Could it be a US/UK difference? I'm in the UK.)

  51. Andy Stow said,

    February 14, 2022 @ 3:05 pm

    I actually work testing things like brake specific fuel consumption. I like the removal of the second two instances of "the," but I'd probably rewrite the beginning.

    The test results in Table 1 show that brake-specific fuel consumption generally decreased as engine output torque increased.

    What purpose do the words "looking at" serve? I already know that I can only see what the table shows by looking at it.

  52. J.W. Brewer said,

    February 14, 2022 @ 3:34 pm

    Count me as another vote for the emerging (non-UK?) consensus that the omission of the first "the" in Richard Belaire's example yields an unidiomatic-sounding result but the omission of the other two instances of "the" sounds fine. And even Rachael Churchill, who balks at the other omissions, seems to find the first the most grievous. Maybe the more general point is that the "omit needless words" style of editing is hard to automate because the perceived "needfulness" of a potentially superfluous word is not strictly a matter of semantics and logic but of idiomaticity, which is complex and nuanced and maybe hard to model.

  53. Philip Taylor said,

    February 15, 2022 @ 3:44 am

    As a Briton, I too could not do without the first "the" and am ambivalent considering the need for the second two. But were I copy-editing, I would leave them in. Had I been the author, I would probably have written something like "Looking at the test results shewn in Table 1, it can be seen that, in general, brake-specific fuel consumption decreased as engine output torque increased".

  54. Rachael Churchill said,

    February 15, 2022 @ 4:37 am

    I agree with Andy Stow: "The test results in Table 1 show…" is better. But it would still have to be "The test results" and not "Test results".

  55. Dara Connolly said,

    February 15, 2022 @ 3:02 pm

    Another minor transatlantic difference I've noticed in the use of "the" is Americans referring to (e.g.) Dublin Airport as "the Dublin Airport".

  56. Philip Anderson said,

    February 15, 2022 @ 5:11 pm

    I don’t actually have a problem with “looking at test results”, although I would probably choose to include a ’the’; this is a plural, so quite different from Rachael’s rejected “looking at graph”, and plurals often don’t need an article where the/a singular does.

    Re ὁ Θεός, all Greek proper names, including Jesus, included an article that is never translated. I remember when Fu Manchu threatened to destroy the Windsor Castle – the Government posted soldiers all around Windsor Castle; he torpedoed the liner.

  57. Terry K. said,

    February 15, 2022 @ 5:15 pm

    If "the Dublin Airport" is in speech, not writing, I'm guessing the Americans are actually saying "the Dublin airport". That is, they don't know the name of the airport, they just know it's the airport in Dublin. It's equivalent to "the London airport". Or, for my local airport, Kansas City International Airport, it's like instead saying "the Kansas City airport". An airport being so straightforwardly named after the city it's in and/or serves is not something we expect. (I just looked up Newark, often called just that, and I see it's actually in full Newark Liberty International Airport.)

  58. Philip Anderson said,

    February 16, 2022 @ 3:16 am

    @Terry K
    I would just say the airport for Dublin or Dublin’s airport. The X airport sounds like a name rather than a descriptor.

  59. Philip Taylor said,

    February 16, 2022 @ 6:44 am

    I think that I (a Briton) could easily imagine myself saying (for example) "Sài Gòn airport" if I could not remember that it is really called the Tân Sơn Nhất International Airport, but it would be unnatural for me to prefix it with "the".

  60. Rachael said,

    February 16, 2022 @ 10:02 am

    @Philip Anderson, I would of course be fine with "looking at test results" in a context like "I've spent all day looking at test results". But for "looking at [the] test results in Table 1", it's some specific test results, so it needs "the" for me.

  61. Philip Anderson said,

    February 16, 2022 @ 5:13 pm

    @Rachael

  62. Philip Anderson said,

    February 16, 2022 @ 5:19 pm

    @Rachael
    (Take two)
    I prefer it with ‘the’, but it doesn’t need it for me. It makes sense for me without.
