SOTU evolution


In preparation for Tuesday's State of the Union address, I thought I'd take a look at the language of these addresses over the years. Texts are available at UCSB's American Presidency Project; I downloaded their texts and removed irrelevant mark-up. (Or rather, I wrote scripts to do all of this automatically. I believe that the results are generally correct, but there are probably a few uncaught errors.)

There are lots of ways to approach this question. In today's post, I'll set the stage and look at a couple of simple word-frequency features, with more (and maybe more interesting) explorations to come later on.

When we see a change in the SOTU messages over the years — and there are plenty of changes to see — we need to consider several different sorts of causes. Maybe the English language itself has changed; maybe style or fashion has changed, at least for a certain sort of political language; maybe the themes or topics of the addresses have shifted, due to changes in the world or at least in the American political landscape; or maybe the individual styles of particular presidents (or their speechwriters) are what's at stake.

No doubt all of these things apply to different degrees in different cases, but we also need to keep in mind something explained at length in Gerhard Peters' "Research Notes" on State of the Union Addresses and Messages, which is that between 1801 and 1913, the SOTU was "a written (and often lengthy) report sent to Congress to coincide with a new Session of Congress". Here's a graph showing the time-periods involved and the consequences for message length:

We've looked at linguistic SOTU trends a couple of times in the past. For example, in "Real Trends in Word and Sentence Length", 10/31/2011, I used the SOTU texts as one source for evidence about how sentence lengths have been getting shorter over the past couple of centuries:

(In the plots above, the red lines track the address-by-address measurements as my scripts calculated them, while the blue lines are smoothed approximations produced by locally-weighted scatterplot smoothing in R.)
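For readers who want to experiment with this kind of smoothing, here is a minimal sketch of the idea in pure Python. It is an assumption-laden simplification, not R's implementation and not the script behind the plots: it fits a tricube-weighted local line at each point and skips the robustness iterations that full LOWESS adds. The function name `lowess`, the `frac` parameter, and the toy numbers are all illustrative.

```python
def lowess(xs, ys, frac=0.3):
    """Simplified LOWESS: tricube-weighted local linear fit at each x.

    Assumption: no robustness iterations, so outliers are not down-weighted.
    """
    n = len(xs)
    k = max(2, int(round(frac * n)))  # number of neighbours that set the bandwidth
    smoothed = []
    for x0 in xs:
        # Bandwidth = distance to the k-th nearest point (x0 itself counts as nearest).
        dists = sorted(abs(x - x0) for x in xs)
        h = dists[k - 1] or 1.0  # guard against a zero bandwidth from duplicate x values
        # Tricube weights: 1 at x0, falling to 0 at distance h and beyond.
        w = [(1 - min(abs(x - x0) / h, 1.0) ** 3) ** 3 for x in xs]
        sw = sum(w)
        mx = sum(wi * x for wi, x in zip(w, xs)) / sw
        my = sum(wi * y for wi, y in zip(w, ys)) / sw
        sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
        sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0
        # Evaluate the weighted least-squares line at x0.
        smoothed.append(my + slope * (x0 - mx))
    return smoothed


# Toy, made-up values purely to show the call; not SOTU measurements.
toy_years = [1800, 1810, 1820, 1830, 1840, 1850, 1860, 1870]
toy_lengths = [40.0, 36.0, 41.0, 33.0, 35.0, 29.0, 31.0, 27.0]
print(lowess(toy_years, toy_lengths, frac=0.5))
```

A larger `frac` uses more neighbours in each local fit and gives a smoother curve; a smaller one follows the address-by-address values more closely.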

There may be some indication of the switch from written reports to oral addresses in the early 20th century, but overall, the secular trend remains clear, and is not remarkably different from the pattern seen in the Inaugurals, all of which (I believe) were speeches delivered orally. And the trend towards shorter sentences is surely a culture-wide stylistic trend, which is mirrored in the SOTU and Inaugural texts. At least in part, the shortening of sentences is the reflex of a more paratactic style, with less clausal embedding, as discussed in "Inaugural Embedding", 9/9/2005, and "Presidential Parataxis", 1/24/2009.

These results are not surprising. It's a commonplace observation that English prose style has been moving, over the past couple of centuries, towards shorter and simpler sentences; and sometimes, commonplace observations are actually true. But here's a trend whose explanation is less obvious:

The written SOTU reports apparently have a higher frequency of "the", but the written/oral distinction can't explain the whole thing. The average frequency of "the" in the most recent 10 SOTU addresses (2004-2013) was 47,458 per million words; in the first 10 addresses (1790-1799, all delivered as speeches to Congress) it was 93,201 per million words, almost double the frequency. And the decline during the 20th-century era of oral addresses seems to have been a gradual one.
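The per-million figures above are simple normalized counts. As a rough sketch of how such a number could be computed (the tokenizing regex and the function name `per_million` are assumptions for illustration, not the scripts actually used for this post):

```python
import re


def per_million(text, word):
    """Occurrences of `word` per million word tokens in `text`, ignoring case.

    Assumption: a token is any run of letters and apostrophes.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return 1_000_000 * tokens.count(word.lower()) / len(tokens)


# Two of the seven tokens in this sample sentence are "the".
sample = "The state of the union is strong."
print(per_million(sample, "the"))

# Ratio of the two averages quoted in the text (first 10 vs. most recent 10 addresses).
print(round(93_201 / 47_458, 2))
```

Different tokenization choices (hyphens, numbers, contractions) shift the denominator slightly, so small differences from other people's counts are to be expected.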

Why is this? Maybe the style of speeches has been getting gradually less formal, and therefore gradually less like written style. Or maybe even formal styles have been changing. We can add to the plot the comparable data from COHA (by decade) and from the Google Books Ngram collection (by year):

COHA and the Google Books data pretty much agree, which is reassuring; and they both suggest a slight decline in the frequency of "the"; but the change that they show is very modest compared to the change in SOTU frequencies. So I feel that the explanation for the SOTU change remains to be found.

Here's an even more striking stylistic change:

In this case the proportional change is much greater. The frequency of "which" in the most recent 10 SOTU addresses (2004-2013) was 742 per million words; in the first 10 addresses it was 12,272 per million words, more than 16 times greater. And again, the changes seem to have been relatively gradual ones, with a decline from 1810 to 1850 or so, a rise for a few years around 1900, and then a long fall through the modern era.

Is this a response to grammar mavens' which-hunting? Or is it an underlying stylistic trend, with which-hunting merely a symptom? Or both?

If we add data from COHA and Google Books, we again find a trend in a similar direction but much weaker in size:

So again, there's something left to explain here.

What about examples of thematic differences? Here are two cases of semi-complementary concepts, with word frequencies as a (no doubt imperfect) proxy. For these cases, since the overall frequencies are lower, I've switched to averages by decade. First, nation vs. states:

Some of this change may be due to swapping "America" for "United States"; in any case, more investigation is needed.
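The switch to averages by decade for these lower-frequency words can be sketched as follows. This is one possible reading, assuming a per-address rate is already available for each year and that the rates are averaged within each decade; pooling raw counts per decade would be an equally plausible alternative, and the function name `decade_averages` is illustrative.

```python
from collections import defaultdict


def decade_averages(rates_by_year):
    """Average a {year: rate} mapping within each decade.

    Keys of the result are the first year of each decade (1790, 1800, ...).
    """
    buckets = defaultdict(list)
    for year, rate in rates_by_year.items():
        buckets[year // 10 * 10].append(rate)
    return {decade: sum(vals) / len(vals) for decade, vals in sorted(buckets.items())}


# Made-up rates, only to show the shape of the input and output.
print(decade_averages({1790: 10.0, 1791: 20.0, 1800: 5.0}))
```

Averaging over a decade trades time resolution for stability, which matters when a word turns up only a handful of times in any single address.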

For a second example, freedom vs. duty:

We might ask again to what extent these changes reflect broader trends in cultural emphasis (or at least word frequency as a perhaps-faulty proxy for it). And indeed, the Google Books Ngram frequency for duty/duties does fall during this period, and the frequency of freedom/freedoms does rise:

But again, if we plot the changes on the same scale as the SOTU frequencies, there is a large difference in the size of the effects:


A plausible interpretation of this last plot is that duty/duties returned to background rates in SOTU messages during the second half of the 20th century, while freedom/freedoms was at background rates up until that point, with the deviations from background rates representing the influence of the political rhetoric of the time.


  1. Mary Apodaca said,

    January 26, 2014 @ 1:34 pm

    Any instances of "libido?"

    [(myl) Not so far. No "lust" either. But a certain amount of (presumably innocent and even laudable) "desire":
    371 desire
    108 desired
    35 desires]


  2. Gregory Kusnick said,

    January 26, 2014 @ 2:00 pm

    Any evidence that "America" is displacing "the United States"? That could account for (at least some of) the decline in "the" and "states".

    [(myl) It's certainly true that in the SOTU messages, america has come up as the united states has gone down (or rather, america has come up about a half-century after the decline of the united states):

    But the effect is barely a pimple on the butt of the decline in the frequency of the:

    So we'll have to look elsewhere for the explanation of the decline in use of the; and we're not going to find it in a decline in the frequency of some particular "the X" phrase. There aren't any that are common enough to start with… ]

  3. Geoff Nunberg said,

    January 26, 2014 @ 3:05 pm

    You probably want to look at "freedom" vs "liberty." I wrote about this shift in usage here.

    [(myl) Nothing very striking happens to liberty in the SOTU messages:

    The effect on freedom, as shown above, is similar in direction to the Google Books change, but much larger in magnitude.]

    GN: Not easy to tell from this how it falls out on partisan lines, but it looks as if the modern spikes in freedom are from Republicans from Reagan on, whereas the big spikes in the 40s are from FDR, who introduced the expanded sense of freedom (e.g., "from want," "from fear") into American political discourse. Is that right?

  4. J.W. Brewer said,

    January 26, 2014 @ 4:23 pm

    "Duty" also has a sense of "money paid in connection with the importation of goods; tariff." Reviewing President Grant's SOTU addresses (which some helpful person on the internet had stuck into a single pdf), it looked on a quick skim like on the order of 20 or 25% of the hits for duty/duties were for that sense. The introduction of the income tax in the early 20th century eventually made such "duties" a smaller piece of federal government finances and thereby perhaps less SOTU-worthy for discussion. (Most but not all of Grant's remaining hits seemed to be referring to his own duties to the public or those of other government officials, not the duties as opposed to freedoms of the general public.)

    [(myl) Indeed. It would be nice to be able to do reliable automatic sense disambiguation on this material — or rather, semantic analysis to the point where we could distinguish not only customs duties from civic duties, but also whose duties (or rights or freedoms) were being referenced.]

  5. D.O. said,

    January 26, 2014 @ 5:46 pm

    I have a theory about "the". What if about 1900 or maybe 1913 presidential speechwriters were under the influence of 2 conflicting impulses. One, to produce a text which is more like a speech and a second one, to write a text which is in the rhetorical tradition of previous SOTU speeches. The question remains though why initial SOTU addresses are so formal.

  6. Mara K said,

    January 26, 2014 @ 5:55 pm

    Can you compare the frequency of "which" to the frequency of "that" in the same syntactic position?

  7. GeorgeW said,

    January 26, 2014 @ 7:08 pm

    The 'which' decline may be related to the shortening of sentences, i.e., fewer relative clauses.

  8. Rubrick said,

    January 26, 2014 @ 7:28 pm

    I find the marked decline in "the", either in Presidential speeches or out of them, rather astonishing. It's not as though the word has a lot of alternatives. If asked to reduce my own "the" usage, I think I'd find it quite a challenge.

    [(myl) If you wrote in the style of the early 19th century (and perhaps you do, in which case accept my apologies), you might find it easier. Consider these phrases from James Monroe's 1821 SOTU: "In the concerns which are exclusively internal there is good cause to be satisfied with the result", which might have been something like "Internal affairs are going well"; or "The receipts into the Treasury from the first of January to the 30th of September last have amounted to $16,219,197.70", which might have been "Treasury receipts from January first to September 30th …"]

  9. Chrisj said,

    January 26, 2014 @ 7:33 pm

    With respect to "Freedom", my immediate suspicion is that it might be significant that the big spike in uses of the word in SOTU addresses runs (roughly) from 1940 to 1990. That is, it started around the beginning of WW2, and lasted until the collapse of the USSR; might it be related to rhetoric about "defending freedom"? (If you'd asked me before today, I wouldn't have said there'd been much decrease in such talk, but now I'm wondering if that's just my memory. Maybe the "War on Terror" has driven it out more than I'd realized.)
    [(Geoff N) "Freedom" really lived three lives over that period, first in connection with the "Four Freedoms" that FDR introduced in a 1940 speech, as well as in "the free world," which up to the end of WWII denoted the allies; then in connection with the Cold War, as you suggest, in opposition to communism, and finally in Reagan's paeans to "economic freedom," which for him denoted purely an absence of government intervention in the market ("we'll make this economy a mighty engine of freedom, hope, and prosperity again," he said; contrast FDR's understanding of the word as personal security: "our determination to achieve an economic freedom for the average man which will give his political freedom reality"). These are just the most recent of a continual line of redefinitions of the word; as Eric Foner puts it in The Story of American Freedom, the word is “deeply embedded in the record of our history and the language of everyday life” and is “fundamental to Americans’ sense of themselves.”]

  10. D.O. said,

    January 26, 2014 @ 10:31 pm

    I've checked two SCOTUS opinions. One is Chisholm v. Georgia, 2 U.S. 419 (1793), which Wikipedia says "is considered the first United States Supreme Court case of significance and impact". The opinion had a 9.7% "the" rate and about a 0.96% "which" rate. The first is on par with contemporary SOTU addresses and the second is a bit lower.

    Another point of comparison is CJ Roberts' opinion in NFIB v. Sebelius aka the Obamacare case. The rate of "the" is 7.3% and the rate of "which" is 0.26%. Not quite as large a downswing as for SOTU, but an appreciable one.

    Average sentence length dropped from 23 to 12 and, of course, the correlation between sentence length and presence of "which" is the first thing to check…

  11. richardelguru said,

    January 27, 2014 @ 11:14 am

    Do you think you should get ahead of the curve and analyze the use of FPSPs?

    [(myl) The pattern of SOTU pronoun use over time is actually somewhat interesting… More on this later.]

  12. JW Mason said,

    January 27, 2014 @ 12:36 pm

    Can't wait to see someone make an argument for national decline based on "the" frequencies.

    "In the early days of the Republic, there was an admirable commitment to definite claims about hard facts, marked by the definite article. But politicians today, raised in an atmosphere of cultural relativism, have forgotten how to make strong, definite claims. Instead they resort to statements about vague abstractions, marked by indefinite articles or no article at all."

    Maybe I should pitch something to the WSJ…

  13. D.O. said,

    January 27, 2014 @ 1:53 pm

    @JW Mason. But you have worked in only 3 "the" in your 58 word proposal for the rate of a bit more than 5%, which actually is lower than the contemporary average. If you now add a paragraph written more in the spirit of the "admirable commitment" you can show the culturally relativistic public how superior the old ways are.

  14. Jonathon Owen said,

    January 27, 2014 @ 2:52 pm

    Mara K.: It's hard to search for syntactic position, because most corpora are not parsed, but you can come up with some frames that narrow things down a bit. I wrote a post on which hunting before, based on some research from one of my classmates. And in the research for my thesis, I found that changing which to that is one of the most frequent corrections that copyeditors make, which is presumably one of the main driving factors behind the decline of which.

  15. Yuval said,

    January 28, 2014 @ 12:37 pm

    Is there a discernible change in the backgrounds of the presidents throughout the years? In level of education, or origin State, or something else which may indicate a dialectal back-wind supporting a nation-wide trend?

  16. David Morris said,

    January 28, 2014 @ 4:52 pm

    The most common context for 'the' to be used is 'the + NOUN'. Other words which can be used in the same pattern are 'this, that, my, your, his, her, its, our, their and N's'. Has there been a corresponding increase in the use of those words?
    Also, 'the' can be omitted before plural countable nouns and uncountable nouns. Is there a long-term trend between 'the' and (null) in those patterns?

  17. J. W. Brewer said,

    January 29, 2014 @ 1:42 pm

    Re Yuval's question, I don't think there's been a huge overall trend, but there certainly can be considerable incumbent-to-incumbent variation. Two pairs that might be worth studying would be FDR/Truman and Kennedy/Johnson, where in each case they are temporally adjacent and presumably dealing with relatively similar issues from a relatively similar political perspective, but where there is a stark difference in personal background on dimensions relevant to language variety, with the first in each pair having a very "elite" background in terms of family wealth, social class, regional origin, and formal education, and the second differing from the first in all four respects. That said, I would not anticipate a whole lot of difference in a formal sort of text like a SOTU address, where you would expect the "outsider" (who in the case of both Truman and Johnson had already spent a considerable amount of time in Congress and thus had opportunity to observe and assimilate to the local linguistic folkways) to adopt the register deemed appropriately solemn to be suitable for the occasion.
