Why definiteness is decreasing, part 2


In an earlier post on this topic ("Why definiteness is decreasing, part 1"), I suggested that the decrease in definite-article frequency in published English text, over the course of the past century, might be connected with a decrease in formality.  Roughly, this means that writing has been becoming more like speech (though speech has also been changing, and writing and speech remain very different).

In this post, I want to discuss two other socio-stylistic dimensions: age and sex. If the language is changing, then we expect to see "age grading", where younger people tend to exhibit the innovative pattern, while older people's usage is more old-fashioned. And because women are generally the leaders in language change, we expect to see women at every age being more linguistically innovative and men being more conservative. In other words, "young men talk like old women".  And as the plot on the right illustrates, differences by age and sex in the frequency of "the" seem to confirm this hypothesis.

These numbers come from the Fisher corpus of conversational telephone speech, comprising nearly 12,000 10-minute conversations involving a similar number of callers. Here are the numbers in tabular form, giving the frequency of "the" as a percentage of all words produced by callers in the specified age range:

          Age <28    Age 28-40    Age >40
MALE      2.53%      2.72%        2.97%
FEMALE    2.31%      2.49%        2.62%
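For readers who want to try a tabulation like this on their own transcripts, here is a minimal sketch in Python. It is not the script used for the Fisher numbers: the function name, the (group, text) input format, and the crude regular-expression tokenizer are all assumptions made for illustration, and the two sample "utterances" are toy strings, not corpus data.

```python
import re
from collections import defaultdict


def the_percentage_by_group(utterances):
    """Return {group: percent of word tokens that are "the"}.

    `utterances` is an iterable of (group_label, text) pairs.
    This input format is an assumption made for the sketch.
    """
    the_counts = defaultdict(int)
    word_counts = defaultdict(int)
    for group, text in utterances:
        # Crude tokenizer (an assumption): runs of letters and apostrophes.
        tokens = re.findall(r"[a-z']+", text.lower())
        word_counts[group] += len(tokens)
        the_counts[group] += sum(1 for token in tokens if token == "the")
    # Skip groups with no words to avoid dividing by zero.
    return {
        group: 100.0 * the_counts[group] / total
        for group, total in word_counts.items()
        if total
    }


# Toy input, not corpus data.
sample = [
    ("A", "the cat sat on the mat"),
    ("B", "cats sit on mats all day"),
]
result = the_percentage_by_group(sample)
print(result)
```

Real transcripts would need more careful tokenization (contractions, partial words, transcriber markup), which can shift the percentages slightly.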

And trust me, the numbers are large enough that these differences are statistically significant.
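To illustrate why sample size matters for a claim like this, here is a hedged sketch of a standard two-proportion z-test in Python. It is not necessarily the test behind the statement above. The percentages 2.53% and 2.31% are taken from the table, but the word totals (1,000 and 1,000,000 per group) are hypothetical round numbers chosen for illustration, not the actual Fisher counts.

```python
import math


def two_proportion_z(count_a, n_a, count_b, n_b):
    """Two-proportion z-test with a pooled standard error.

    Returns (z, two_sided_p_value) under the normal approximation.
    """
    p_a = count_a / n_a
    p_b = count_b / n_b
    pooled = (count_a + count_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / std_err
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# HYPOTHETICAL word totals; only the rates (2.53% vs. 2.31%) come from the table.
z_small, p_small = two_proportion_z(25, 1_000, 23, 1_000)
z_large, p_large = two_proportion_z(25_300, 1_000_000, 23_100, 1_000_000)
print("1,000 words per group:     z = %.2f" % z_small)
print("1,000,000 words per group: z = %.2f" % z_large)
```

Under these assumed totals, the same difference in rates falls well short of the conventional 1.96 threshold with a thousand words per group and far exceeds it with a million.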

So one interpretation of yesterday's theory about decreasing formality in written language is apparently wrong.  If this age grading in the spoken language reflects a change in progress, then we can reject the hypothesis that the changes in text are nothing but a gradual approximation to a fixed pattern in speech. Rather, the frequency of "the" is apparently decreasing in both text and speech; it's just that text is a lagging indicator.

Of course, this argument is an indirect one, since we don't have comparable speech samples from very widely separated time periods, and instead we're relying on age-grading to give us an "apparent time" picture. But as I noted yesterday, we do have two samples of conversational transcripts collected about a dozen years apart, which do show a change in the expected direction: the Switchboard corpus, collected in 1990-91, has an overall "the" frequency of 2.98%, while Fisher, collected in 2003, has 2.47%.

In any case, we've already seen age- and sex-linked variation in the frequency of "the", in a corpus of informal text. As I explained in "Sex, age, and pronouns on Facebook" (9/19/2014):

Andy Schwartz and others at the World Well-Being Project have worked with "Facebook posts from over 75,000 volunteers who also took the standard Interpersonal Personality Item Pool (IPIP) personality test to measure the 'Big Five' personality traits", looking for linguistic features that correlate with those aspects of personality measured by that test.

And in "More fun with Facebook: THE" (10/12/2014), I observed that

The script that I used to make that course assignment about Facebook pronouns ("Sex, age, and pronouns on Facebook", 9/19/2014; "More fun with Facebook pronouns", 9/27/2014) can trivially be focused on any other words — so here's "the":

(For comparison purposes, the y-axis frequencies of 20,000 to 30,000 per million in that graph are equivalent to 2.0-3.0 percent.)
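The conversion is simple arithmetic: a rate per million words divided by 10,000 gives a percentage. A minimal sketch, with a function name that is just an illustrative choice:

```python
def per_million_to_percent(rate_per_million):
    """Convert a frequency per million words to a percentage of all words."""
    # percent = rate / 1,000,000 * 100 = rate / 10,000
    return rate_per_million / 10_000


for rate in (20_000, 30_000):
    print(rate, "per million =", per_million_to_percent(rate), "percent")
```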

It remains possible that the age- and sex-linked changes in the usage of "the" are a life cycle effect rather than evidence of overall linguistic change in progress: maybe older people just gradually get more formal (or at least more the-ful) both in speech and in writing.  But I'm guessing that this is mostly a linguistic change in progress.

We still don't know what these changes really are, in detail. What is taking the place of those missing definite articles? In response to a comment on yesterday's post, I listed some possibilities:

[A] smaller number of non-pronominal noun phrases; a lower percentage of definite articles on a similar number of noun phrases; a larger number of one or more other parts of speech (e.g. adjectives, adverbs, discourse particles, etc.); and so on.

And whatever the distributional shifts may be, are they semantically and rhetorically neutral, or do they reflect some larger stylistic shifts?  For an example of what such a shift might be like, consider what Pennebaker et al. have called the "categorical-dynamic index" (CDI), featured in a paper published just about a week ago: James Pennebaker, Cindy Chung, Joey Frazee, Gary Lavergne, and David Beaver, "When Small Words Foretell Academic Success: The Case of College Admissions Essays", PLoS ONE 12/31/2014.  The abstract:

The smallest and most commonly used words in English are pronouns, articles, and other function words. Almost invisible to the reader or writer, function words can reveal ways people think and approach topics. A computerized text analysis of over 50,000 college admissions essays from more than 25,000 entering students found a coherent dimension of language use based on eight standard function word categories. The dimension, which reflected the degree students used categorical versus dynamic language, was analyzed to track college grades over students' four years of college. Higher grades were associated with greater article and preposition use, indicating categorical language (i.e., references to complexly organized objects and concepts). Lower grades were associated with greater use of auxiliary verbs, pronouns, adverbs, conjunctions, and negations, indicating more dynamic language (i.e., personal narratives). The links between the categorical-dynamic index (CDI) and academic performance hint at the cognitive styles rewarded by higher education institutions.

More on this later.



  1. Victor Mair said,

    January 10, 2015 @ 9:08 am

    Referring to the last quoted paragraph, I recall the time a Dartmouth English professor told me that there is almost never a need to use the word "very" in one's writing. Ever after I have avoided "very" as much as possible, and have extended that to all the other (what I call) "empty adverbs". On the other hand, I have noticed to my astonishment that some younger colleagues, even those who were trained and are teaching at America's very best universities, will say, and sometimes even write, something like this: "He's a really, really, really outstanding poet."

    After I finish a paper or, if I have time, anything I write, I'll go through and strike out as much as possible of the empty verbiage that I can spot. Then I read through the piece again with satisfaction that it is much crisper and more precise than before I removed the useless verbiage.

    [(myl) Indeed, needless words are needless. But Pennebaker et al. don't suggest that either end of the categorical-dynamic index is better, nor should we interpret it that way. And as we'll see in a later post, the various components of their CDI show historical trends in various directions (and in some cases, no historical trends at all).]

  2. AB said,

    January 10, 2015 @ 11:36 am

    Does this mean that the Peevist tendency to blame disfavoured innovations on teenage girls might have some basis in fact?

  3. Victor Mair said,

    January 10, 2015 @ 11:55 am

    It's curious that, when we learn Mandarin, we are taught to say HEN3 DA4 ("very big"), HEN3 MEI3 ("very beautiful"), HEN3 HAO3 ("very good"), but at the same time better teachers will tell us to translate these expressions as "big", "beautiful", and "good".

  4. Bloix said,

    January 10, 2015 @ 11:58 am

    No one has yet mentioned the possible influence of Time-speak, the zippy journalese favored by Time magazine in its heyday (parodied by the New Yorker's Wolcott Gibbs as "Backward ran sentences until reeled the mind."). One of Time-speak's characteristics was its omission of definite articles, for example in its use of epithets in place of parentheticals ("economist Galbraith" in place of "Galbraith, the economist").

    Geoff Nunberg wrote about Time-speak here,
    although he didn't mention its omission of articles.

  5. Brett Reynolds said,

    January 10, 2015 @ 3:52 pm

    You've no doubt noticed the precipitous drop of it over the 20th century, which, except perhaps for dummy uses, is also definite. Also, the demonstratives this, that, and their plurals.

  6. Akito said,

    January 10, 2015 @ 5:34 pm

    I was taught that Mandarin monosyllabic adjectives have a contrastive meaning if used without hen3. (Maybe polysyllabics too, but I'm not certain.) So, hen3 is not without its function.

    In English, you can say "Very well," but not just "Well," in response to "How are you?"

  7. J.W. Brewer said,

    January 10, 2015 @ 8:46 pm

    Somewhat parallel to vhm's story about how to translate Mandarin, I believe it is not uncommon when translating liturgical Latin texts into English to render "Sanctissima Trinitas" as merely "Holy Trinity" rather than "Most Holy Trinity," the justification being that Latin (at least in this context) has, as it were, a different and lower semantic-scope threshold than English for the superlative to be used and thus translating "sanctissima" as if it were merely "sancta" avoids creating a jarring effect in the resultant English.

  8. Levantine said,

    January 10, 2015 @ 10:34 pm

    Akito, I often say "I'm well" when asked how I am (though I agree that "Well" by itself would sound a little odd).

  9. Levantine said,

    January 10, 2015 @ 10:37 pm

    In keeping with J. W. Brewer's example, the Arabic "Allah[u] akbar" is usually (and best) translated as "God is great" rather than the more literal "God is greatest".

  10. Perry Dane said,

    January 11, 2015 @ 7:47 pm

    Have you tried to tease out how much of the change in the use of "the" merely reflects changes in grammatical usage, and how much reflects the underlying semantic content of these communications? For example, a President in a State of the Union message might say that "Congress should increase the minimum wage" or that "The Congress should increase the minimum wage." The meaning is the same. In fact, there are obvious regional patterns here. Folks in Philadelphia are likely to say "City Council passed an ordinance today" while folks in New York are, I think, more likely to say "The City Council passed an ordinance today." Similarly, I've noticed that folks in Philadelphia are more likely to say, for example, "Rabbi gave a great sermon today" rather than "The Rabbi gave a great sermon today."

    So the pattern you're identifying might simply reflect a change in grammatical practices. It wouldn't even necessarily have anything to do with "informality."

    On the other hand, the drop in the frequency of the definite article might reflect real changes in the nature of discourse and the sorts of things we find ourselves talking about.

    Of course, some cases are ambiguous. Constitutional scholars, for example, still debate whether the use of the term "the people" in the Second Amendment (and the Fourth) suggests that the text is referring to a collective entity rather than declaring an individual right, or whether it's just a grammatical tic. (There certainly would be a difference in today's usage, I think, between saying, for example, "The people are smarter than you think" and "People are smarter than you think.")

  11. Maneki Nekko said,

    January 12, 2015 @ 2:37 pm

    I have noticed that older people are more likely to call Facebook "the Facebook."

  12. Sean Pollock said,

    January 17, 2015 @ 10:03 am

    @Victor Mair That usage of HEN3 usually isn't literally translated as "very", it's more of a substitute for the copula "to be". I suspect it has to do with the ill-defined distinction between adjectives and verbs in Chinese.
