Some clarifications about my Wall Street Journal article, which seems to have led to some misunderstandings among Language Log’s readers (as well as over at Languagehat). Since the readers here are the most well-informed audience that piece will ever reach outside of professional linguists, I thought it’d be useful to clarify what I based the observations in that piece on.
Archive for Linguistic history
Howard Oakley ("Birth of a new English phrase", 1/23/2015) was struck by the phrase "all proper and shit", in the context of a tweet by Christopher Phin noting that "[choice of printing mode] makes my writing seem all proper and shit". So Howard investigated the history of that four-word sequence by means of various web search tools.
I strongly support the combination of linguistic curiosity and empirical methods, but in this case, I'm puzzled by the fact that Howard saw the phrase as novel. As far as I can see, "all proper and shit" is a syntactically, semantically, and pragmatically compositional combination of two constructions that have existed in English for hundreds of years.
Ten days ago, I documented a striking 20th-century decrease in the frequency of the definite article "the" ("Decreasing definiteness", 1/8/2015): from about 6.6% to about 5.4% in the Corpus of Historical American English; from about 6.4% to about 5.2% in the Google Books ngram indices; and from about 9.3% to about 4.7% in U.S. presidents' State of the Union messages.
In two follow-up posts, I offered some additional ideas about this change:
In "Why definiteness is decreasing, part 1", I suggested that it might be connected to an overall decrease in the formality of published English, starting with the observation that in contemporary English, the frequency of "the" varies by a large factor between very formal material (6.42% in the "Academic" genre of the Corpus of American English) and conversational speech (2.47% in the Fisher corpus).
In "Why definiteness is decreasing, part 2", I noted that both in a collection of Facebook posts and in Fisher conversational speech transcripts, older people use "the" more often than younger people, and men use "the" more often than women; and I wondered whether this is a stable life-cycle and gender-identity difference, or the result of a change in progress. (Or both…)
Today, I want to discuss a third idea about the decreasing frequency of "the", suggested to me by Jamie Pennebaker.
In an earlier post on this topic ("Why definiteness is decreasing, part 1"), I suggested that the decrease in definite-article frequency in published English text, over the course of the past century, might be connected with a decrease in formality. Roughly, this means that writing has been becoming more like speech (though speech has also been changing, and writing and speech remain very different).
In this post, I want to discuss two other socio-stylistic dimensions: age and sex. If the language is changing, then we expect to see "age grading", where younger people tend to exhibit the innovative pattern, while older people's usage is more old-fashioned. And because women are generally the leaders in language change, we expect to see women at every age being more linguistically innovative and men being more conservative. In other words, "young men talk like old women". And as the plot on the right illustrates, differences by age and sex in the frequency of "the" seem to confirm this hypothesis.
I ended yesterday's post ("Decreasing Definiteness") with a promise to say more about why the frequency of "the" has decreased so much over the past century or so, and this morning's post will start to redeem that promise.
As several commenters observed, there are probably several different things going on here. But I think that one relevant factor is decreasing formality of style.
I'll leave for another day the question of what formality really is, and why a decrease in formality correlates with a decrease in the frequency of "the". In this post, I'll try to establish two simpler points:
- In English text that's more formal, in common-sense terms, "the" is more common;
- The formality of (various genres of) English writing has been decreasing over the past century or so.
During the course of the 20th century, the frequency of the English definite article "the" decreased gradually and radically. I first noticed this effect about a year ago, in a post about the history of State of the Union addresses ("SOTU evolution", 1/26/2014), where I observed, in reference to the graph on the right, that
The average frequency of "the" in the most recent 10 SOTU addresses (2004-2013) was 47,458 per million words; in the first 10 addresses (1790-1799, all delivered as speeches to Congress) it was 93,201 per million words, almost double the frequency. And the decline during the 20th-century era of oral addresses seems to have been a gradual one.
I speculated that
Maybe the style of speeches has been getting gradually less formal, and therefore gradually less like written style. Or maybe even formal styles have been changing.
And I noted that a corresponding effect can be seen in two other sources, the BYU Corpus of Historical American English (COHA) and the Google Books N-Gram viewer (GNG), though it is considerably smaller in magnitude:
COHA and the Google Books data pretty much agree, which is reassuring; and they both suggest a slight decline in the frequency of "the"; but the change that they show is very modest compared to the change in SOTU frequencies. So I feel that the explanation for the SOTU change remains to be found.
At that point, I turned my attention to other aspects of SOTU evolution. But a student paper recently reminded me of this issue.
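For readers who want to check figures like these against their own texts, here is a minimal sketch of how a per-million-word frequency can be computed. The function name `per_million` and the simple regular-expression tokenizer are assumptions made for illustration; the posts do not say how the SOTU, COHA, or Google Books counts were tokenized, so results from this sketch will not match those numbers exactly.

```python
import re


def per_million(text: str, word: str = "the") -> float:
    """Occurrences of `word` per million tokens of `text`.

    Tokenization is an assumption: lowercase runs of letters and
    apostrophes. Real corpora use more careful tokenizers.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return 1_000_000 * tokens.count(word) / len(tokens)


# The two SOTU figures quoted above, in occurrences per million words.
early, recent = 93_201, 47_458
ratio = early / recent
print(round(ratio, 2))  # 1.96
```

The ratio of about 1.96 is what "almost double the frequency" refers to. The percentages quoted elsewhere are the same measure divided by 10,000, so 47,458 per million is about 4.7%.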
The following is a guest post by Ammon Shea, a researcher for the Oxford English Dictionary's Reading Program and formerly a consulting editor for American Dictionaries for Oxford University Press.
Hendrik Hertzberg has made a series of claims recently on the New Yorker web site ("Nobody Said That Then!") about the ostensible inaccuracy of the language used in the television show Masters of Sex. His main contention is that many of the characters' utterances are improbable: he asserts that certain words and phrases were not in use at the time when the show takes place (the mid-1950s). One of the problems with making bold and declarative statements about the origins of specific words is that these words have a nasty habit of first appearing much earlier or later than memory or intuition would attest. Read the rest of this entry »
A few years ago, I wrote about a presentation by Bridget Jankowski on the trend towards increasing use of "'s" as opposed to "of", in phrases like "the government's responsibility" vs. "the responsibility of the government". My post was "The genitive of lifeless things", 10/11/2009, and the slides from her talk are here.
I was reminded of this recently, while looking at usage changes in State of the Union messages over the centuries. Apostrophe-s has seen a recent radical increase in SOTU frequency, reflecting in amplified form a more gradual increase in the English language as a whole. Such gradual, long-term trends are a puzzle: why and how do linguistic changes keep going for several centuries in the same direction, as they often do? You could ask the same question about other cultural changes, I guess, but for linguistic features that are preserved in the written form of a language with a textual history, like English, we have quantitative evidence over hundreds of years.
The American Dialect Society chose "because" as its Word Of The Year, and thereby provoked an argument, here and elsewhere, about parts of speech. Most dictionaries and grammars see words like "for", "in", "since", etc. as variously prepositions, adverbs, conjunctions, or particles, depending on how they're used. Geoff Pullum argues that they're all always prepositions, just used in different ways. (See "Because syntax", 1/5/2014, and "The promiscuity of prepositions", 1/8/2014, for some of Geoff's reasons.)
It's worth pointing out that the complex patterning of these words in contemporary English is the outcome of an even more complex historical process.
This is a guest post by Dick Hudson, who has promised a later submission about his experience helping to organize the re-introduction of grammatical analysis in the British school curriculum. This post gives some of his reflections on the pre-history of the grammarless state that he played a role in changing. Read the rest of this entry »
Several times over the past few years, I've speculated that American "uptalk", stereotypically associated with Californian "Valley Girls" in the 1980s, might in fact have originated with the characteristically rising intonational patterns of northern England, Scotland, and Ireland, by way of the Scots-Irish immigrants who migrated to California in the 1930s Dust Bowl exodus. For example,
It seems plausible to me that "uptalk" in the U.S., Canada, New Zealand, and Australia represents the spread (or in some cases just the observation) of a pattern that's been normal in some regional varieties of English for a thousand years or more, originally representing the results of contact with Celtic and/or Scandinavian languages. In the U.S., the history might involve the people of Scots-Irish background who migrated to California during the Dust Bowl era in the 1930s, who formed a substantial part of the ethnic background of the "valley girl" stereotype.
There's a fair amount of evidence out there about how the "Okies" talked — so for this morning's Breakfast Experiment™ I thought I'd take a first look, starting with Alan Lomax's 1940 interview with Woody Guthrie, in which Woody reminisces about his boyhood in Okemah, Oklahoma.
Hezy Laing, "Examining Edenics, the Theory That English (and Every Other Language) Came From Hebrew", The Tablet 10/31/2013:
What if one day, instead of speaking hundreds of different languages, all of humanity suddenly began speaking the exact same language? More incredibly—what if we already do? A new movement called “Edenics” makes the claim that modern day English is simply a derivative of biblical Hebrew. In fact, the proponents of this theory say that all human languages are simply offshoots of Hebrew and claim to have thousands of examples to back them up.
From Reuben Fischer-Baum, "Six Decades of the Most Popular Names for Girls, State-by-State", Jezebel 10/19/2013: