Why definiteness is decreasing, part 1

I ended yesterday's post ("Decreasing Definiteness") with a promise to say more about why the frequency of "the" has decreased so much over the past century or so, and this morning's post will start to redeem that promise.

As several commenters observed, there are probably several different things going on here. But I think that one relevant factor is decreasing formality of style.

I'll leave for another day the question of what formality really is, and why a decrease in formality correlates with a decrease in the frequency of "the". In this post, I'll try to establish two simpler points:

  1. In English text that's more formal, in common-sense terms, "the" is more common;
  2. The formality of (various genres of) English writing has been decreasing over the past century or so.

Decreasing definiteness

During the course of the 20th century, the frequency of the English definite article "the" decreased gradually and radically. I first noticed this effect about a year ago, in a post about the history of State of the Union addresses ("SOTU evolution", 1/26/2014), where I observed, in reference to the graph on the right, that

The average frequency of "the" in the most recent 10 SOTU addresses (2004-2013) was 47,458 per million words; in the first 10 addresses (1790-1799, all delivered as speeches to Congress) it was 93,201 per million words, almost double the frequency. And the decline during the 20th-century era of oral addresses seems to have been a gradual one.

I speculated that

Maybe the style of speeches has been getting gradually less formal, and therefore gradually less like written style. Or maybe even formal styles have been changing.

And I noted that a corresponding effect can be seen in two other sources, the BYU Corpus of Historical American English (COHA) and the Google Books N-Gram viewer (GNG), though it is considerably smaller in magnitude:

COHA and the Google Books data pretty much agree, which is reassuring; and they both suggest a slight decline in the frequency of "the"; but the change that they show is very modest compared to the change in SOTU frequencies. So I feel that the explanation for the SOTU change remains to be found.

At that point, I turned my attention to other aspects of SOTU evolution. But a student paper recently reminded me of this issue.
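
For readers who want to try this on their own texts, here is a minimal sketch of how a per-million-word frequency like the ones quoted above might be computed. The function name `per_million`, the sample sentence, and the simple regular-expression tokenizer are all assumptions made for illustration; the counts reported in the posts were not necessarily produced with this tokenization, so exact numbers would differ.

```python
import re


def per_million(text: str, word: str = "the") -> float:
    """Occurrences of `word` per million word tokens in `text`.

    Tokenization is a deliberately crude assumption: lowercase the text
    and treat every run of letters and apostrophes as one token.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return 1_000_000 * tokens.count(word) / len(tokens)


# Hypothetical sample sentence, not taken from any SOTU address.
sample = "The state of the union is strong, and the future is ours."
print(round(per_million(sample)))  # 3 of 12 tokens -> 250000

# Ratio of the two SOTU figures quoted above (early vs. recent addresses).
print(round(93_201 / 47_458, 2))  # -> 1.96, i.e. "almost double"
```

On a single sentence the rate is of course meaningless; the measure only becomes informative over texts of many thousands of words.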

Led astray by the corpus of memory: a response to Hendrik Hertzberg

The following is a guest post by Ammon Shea, a researcher for the Oxford English Dictionary's Reading Program and formerly a consulting editor for American Dictionaries for Oxford University Press.

Hendrik Hertzberg has made a series of claims recently on the New Yorker web site ("Nobody Said That Then!") about the ostensible inaccuracy of the language used in the television show Masters of Sex. His main contention is that many of the characters' utterances are improbable, asserting that certain words and phrases were not in use at the time that the show takes place (the mid-1950s). One of the problems with making bold and declarative statements about the origins of specific words is that these words have a nasty habit of first appearing much earlier or later than memory or intuition would attest.

Mechanisms for gradual language change

A few years ago, I wrote about a presentation by Bridget Jankowski on the trend towards increasing use of "'s" as opposed to "of", in phrases like "the government's responsibility" vs. "the responsibility of the government". My post was "The genitive of lifeless things", 10/11/2009, and the slides from her talk are here.

I was reminded of this recently, while looking at usage changes in State of the Union messages over the centuries.  Apostrophe-s has seen a recent radical increase in SOTU frequency, reflecting in amplified form a more gradual increase in the English language as a whole. Such gradual, long-term trends are a puzzle: why and how do linguistic changes keep going for several centuries in the same direction, as they often do? You could ask the same question about other cultural changes, I guess, but for linguistic features that are preserved in the written form of a language with a textual history, like English, we have quantitative evidence over hundreds of years.

The American Dialect Society chose "because" as its Word Of The Year, and thereby provoked an argument, here and elsewhere, about parts of speech. Most dictionaries and grammars see words like "for", "in", "since", etc. as variously prepositions, adverbs, conjunctions, or particles, depending on how they're used. Geoff Pullum argues that they're all always prepositions, just used in different ways. (See "Because syntax", 1/5/2014, and "The promiscuity of prepositions", 1/8/2014, for some of Geoff's reasons.)

It's worth pointing out that the complex patterning of these words in contemporary English is the outcome of an even more complex historical process.

Sentence diagramming

This is a guest post by Dick Hudson, who has promised a later submission about his experience helping to organize the re-introduction of grammatical analysis in the British school curriculum. This post gives some of his reflections on the pre-history of the grammarless state that he played a role in changing.

Okie uptalk

Several times over the past few years, I've speculated that American "uptalk", stereotypically associated with Californian "Valley Girls" in the 1980s, might in fact have originated with the characteristically rising intonational patterns of northern England, Scotland, and Ireland, by way of the Scots-Irish immigrants who migrated to California in the 1930s Dust Bowl exodus.  For example,

It seems plausible to me that "uptalk" in the U.S., Canada, New Zealand, and Australia represents the spread (or in some cases just the observation) of a pattern that's been normal in some regional varieties of English for a thousand years or more, originally representing the results of contact with Celtic and/or Scandinavian languages. In the U.S., the history might involve the people of Scots-Irish background who migrated to California during the Dust Bowl era in the 1930s, who formed a substantial part of the ethnic background of the "valley girl" stereotype.

There's a fair amount of evidence out there about how the "Okies" talked — so for this morning's Breakfast Experiment™ I thought I'd take a first look, starting with Alan Lomax's 1940 interview with Woody Guthrie, in which Woody reminisces about his boyhood in Okemah, Oklahoma.

Hezy Laing, "Examining Edenics, the Theory That English (and Every Other Language) Came From Hebrew", The Tablet 10/31/2013:

What if one day, instead of speaking hundreds of different languages, all of humanity suddenly began speaking the exact same language? More incredibly—what if we already do? A new movement called “Edenics” makes the claim that modern day English is simply a derivative of biblical Hebrew. In fact, the proponents of this theory say that all human languages are simply offshoots of Hebrew and claim to have thousands of examples to back them up.

Linguistic change on a short time scale


American Passivity

This is an illustrative Breakfast Experiment™ for my course at the LSA Institute (on "Corpus-Based Linguistic Research"). It starts from an earlier LL post, "When men were men, and verbs were passive", 8/4/2006, where I observed that Winston Churchill, often cited as a model of forceful eloquence, used the passive voice for 30-50% of his verbs  in various passages from his 1899 memoir The River War — several times the rate noted in statistical usage studies from the 1960s and later.

So I thought I'd do a quick historical survey of passive-voice rates, as an example of what can be done with Mark Davies' COHA corpus.

Potosi miners' language

My roommate here at the LSA Institute is Pieter Muysken, and one of the many things that I've learned from him is that for 450 years or more, miners in Potosí (in what's now Bolivia) have communicated among themselves in a mixed language spoken only by mine-workers in connection with mining operations. Since the existing scholarly literature seems to contain just a few scattered references to this interesting phenomenon, I asked Pieter some questions about it, and I reproduce his answers below.


Little doubt it wouldn't

Some time ago, R.I. sent in this quotation from Golf World, 6/3/2013:

Because Irwin is the oldest U.S. Open champion—45 when he defeated Mike Donald in a playoff at Medinah CC in 1990—and won his last PGA Tour event, the 1994 MCI Heritage, when he was 48, there seemed little doubt his skill set wouldn't make him a formidable senior-tour member if he committed to the 50-and-over circuit.

You probably think that this is going to be about misnegation, and the tendency for negative concord to sneak back into standard English after having been chased out half a millennium ago. That's what I thought too, but along the way to YAMP (Yet Another Misnegation Post) I was waylaid by a curious observation.

Protesting too much

A guest post from Tony Kroch:

The line "The Lady doth protest too much, me thinks" from Hamlet that Mark Liberman blogged about at the end of last month struck me because it encapsulates in one sentence several significant changes that the English language has undergone. We are lucky that the written record is rich enough to let us see how features we take for granted today developed over time.

