Archive for Linguistic history

Style or artefact or both?

In "Correlated lexicometrical decay", I commented on some unexpectedly strong correlations over time of the ratios of word and phrase frequencies in the Google Books English 1gram dataset:

I'm sure that these patterns mean something. But it seems a little weird that OF as a proportion of all prepositions should correlate r=0.953 with the proportion of instances of OF immediately followed by THE, and it seems weirder that OF as a proportion of all prepositions should correlate r=0.913 with the proportion of adjective-noun sequences immediately preceded by THE.

So let's hope that what these patterns mean is that the secular decay of THE has somehow seeped into some but not all of the other counts, or that some other hidden cause is governing all of the correlated decays. The alternative hypothesis is that there's a problem with the way the underlying data was collected and processed, which would be annoying.
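For readers who want to see what a figure like r=0.953 measures, here is a minimal sketch of the Pearson correlation between two proportion series sampled over time. The function name and both series are invented for illustration; the numbers are toy values, not counts from Google Books or COHA.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations, and sums of squared deviations for each series.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Invented per-period proportions (toy data, NOT from any corpus):
# OF as a share of all prepositions, and OF THE as a share of all OF.
of_share_of_prepositions = [0.30, 0.29, 0.27, 0.26, 0.24]
of_the_share_of_of = [0.40, 0.39, 0.36, 0.36, 0.33]

r = pearson_r(of_share_of_prepositions, of_the_share_of_of)
print(f"r = {r:.3f}")
```

Two series that both drift steadily in one direction will tend to give a high r whatever their causal relationship, which is part of why the strong correlations are hard to interpret.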

And in a comment on a comment, I noted that the corresponding data from the Corpus of Historical American English, which is a balanced corpus collected from sources largely or entirely distinct from the Google Books dataset, shows similar unexpected correlations.

So today I'd like to point out that much simpler data — frequencies of a few of the commonest words — shows some equally strong correlations over time in these same datasets.

Read the rest of this entry »

Comments (9)

The determiner of the turtle is heard in our land

One useful way to look at "The case of the disappearing determiners" is to compare Bible translations, because this controls to some extent for variation in the underlying message. So as a first tentative step on that path, I compared the Song of Solomon in the King James Version, first published in 1611, with the Song of Solomon in the Message Bible, published between 1993 and 2002.

The overall statistics for the Song of Solomon in the two sources show a relative fall of about 38% in the share of words that are THE (from 6.57% to 4.06%):

Version  # words  # the  % the
kjv      2663     175    6.57%
msg      2737     111    4.06%
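The percentages and the roughly 38% relative fall can be checked directly from the counts in the table. This is a minimal sketch; the dictionary layout and the names are illustrative only.

```python
# Word and "the" counts copied from the table above.
counts = {
    "kjv": {"words": 2663, "the": 175},
    "msg": {"words": 2737, "the": 111},
}

def the_rate(version):
    """Share of all words in the given version that are 'the'."""
    c = counts[version]
    return c["the"] / c["words"]

kjv_rate = the_rate("kjv")
msg_rate = the_rate("msg")

# Relative fall: the drop in rate as a fraction of the earlier (KJV) rate.
relative_fall = (kjv_rate - msg_rate) / kjv_rate

print(f"kjv: {kjv_rate:.2%}")
print(f"msg: {msg_rate:.2%}")
print(f"relative fall: {relative_fall:.0%}")
```

The absolute difference is only about 2.5 percentage points; it is the ratio to the KJV rate that gives the 38% figure.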

And here are a couple of specific verses to compare:

kjv 2:12: The flowers appear on the earth; the time of the singing of birds is come, and the voice of the turtle is heard in our land;
msg 2:12: Spring flowers are in blossom all over. The whole world's a choir – and singing! Spring warblers are filling the forest with sweet arpeggios.

kjv 2:17: Until the day break, and the shadows flee away, turn, my beloved, and be thou like a roe or a young hart upon the mountains of Bether.
msg 2:17: Until dawn breathes its light and night slips away. Turn to me, dear lover. Come like a gazelle. Leap like a wild stag on delectable mountains!

Read the rest of this entry »

Comments (35)

Dutch DE

Following up on yesterday's post "The case of the disappearing determiners", Gosse Bouma sent me some data from the CGN ("Corpus Gesproken Nederlands"), about determiner use in spoken Dutch by people born between 1914 and 1987. According to the CGN website,

The Spoken Dutch Corpus project was aimed at the construction of a database of contemporary standard Dutch as spoken by adults in The Netherlands and Flanders. […] In version 1.0, the results are presented that have emerged from the project. The total number of words available here is nearly 9 million (800 hours of speech). Some 3.3 million words were collected in Flanders, well over 5.6 million in The Netherlands.

It's not clear to me exactly when the recordings were made, but the project ran from 1998 to 2004.

Gosse sent data focused on the word de, which is the definite article for masculine and feminine ("common") nouns in Dutch, cognate with English the. (The definite article for neuter nouns, het, is less frequent and also can be used as a pronoun.)

The results are similar to those that I reported earlier for English: Older people use the definite article more frequently than younger people (at least for people born in the 1950s onwards), and at every age, men use the definite article more than women.

Read the rest of this entry »

Comments (5)

The case of the disappearing determiners

For the past century or so, the commonest word in English has gradually been getting less common. Depending on data source and counting method, the frequency of the definite article THE has fallen substantially — in some cases at a rate as high as 50% per 100 years.

At every stage, writing that's less formal has fewer THEs, and speech generally has fewer still, so to some extent the decline of THE is part of a more general long-term trend towards greater informality. But THE is apparently getting rarer even in speech, so the change is more than just the (normal) shift of writing style towards the norms of speech.

There appear to be weaker trends in the same direction, at overall lower rates, in German, Italian, Spanish, and French.

I'll lay out some of the evidence for this phenomenon, mostly collected from earlier LLOG posts. And then I'll ask a few questions about what's really going on, and why and how it's happening. [Warning: long and rather wonky.]

Read the rest of this entry »

Comments (54)

Irish DNA and Indo-European origins

"Scientists sequence first ancient Irish human genomes", Press Release from Trinity College Dublin:

A team of geneticists from Trinity College Dublin and archaeologists from Queen's University Belfast has sequenced the first genomes from ancient Irish humans, and the information buried within is already answering pivotal questions about the origins of Ireland's people and their culture.  

The team sequenced the genome of an early farmer woman, who lived near Belfast some 5,200 years ago, and those of three men from a later period, around 4,000 years ago in the Bronze Age, after the introduction of metalworking. […]

These ancient Irish genomes each show unequivocal evidence for massive migration. The early farmer has a majority ancestry originating ultimately in the Middle East, where agriculture was invented. The Bronze Age genomes are different again with about a third of their ancestry coming from ancient sources in the Pontic Steppe.

"There was a great wave of genome change that swept into Europe from above the Black Sea into Bronze Age Europe and we now know it washed all the way to the shores of its most westerly island," said Professor of Population Genetics in Trinity College Dublin, Dan Bradley, who led the study, "and this degree of genetic change invites the possibility of other associated changes, perhaps even the introduction of language ancestral to western Celtic tongues."

Read the rest of this entry »

Comments (14)

Which-hunting — and relative decline?

In "A quantitative history of which-hunting", I reproduced a plot due to (an anonymous colleague of) Jonathan Owen, showing that texts from the last half of the 20th century saw a decrease in the relative frequency of NOUN which VERB, and an increase in the relative frequency of NOUN that VERB. Jonathan took this to indicate the success of (usage guides like) Strunk & White's The Elements of Style in persuading writers and copy-editors to avoid which in "restrictive" (AKA "defining" or "integrated") relative clauses.

Here are some plots showing the effect, for data (without smoothing) from the Google Books ngram corpus. The "British English" dataset shows about the same increase in NOUN that as the "American English" collection does, but somewhat less decrease in NOUN which:

[Plots omitted: American English; British English]

Read the rest of this entry »

Comments (12)

"… to do is (to) VERB …"

Dyami Hayes writes to point out that there has been a change over the past century in the relative popularity (at least in printed text) of constructions like these:

What this book sets out to do is to provide some tools, ideas and suggestions for tackling non-verbal reasoning questions.

What it attempts to do is provide a framework for understanding how local governments are organized.

The Google Books ngram plots for provide, look, tell, and say show similar patterns — or summed for those four verbs (with the to do is VERB version in red and the to do is to VERB version in blue):

Read the rest of this entry »

Comments (12)

New discovery in English historical lexicography

A retired lecturer in medieval history, Dr Paul Booth, has discovered a reference in a 1310 court record to a man named Roger Fuckebythenavele. He believes the name really does mean that the man was known as Roger Fuck-By-The-Navel, with the surname (possibly a nickname given by enemies) meaning "fuck via the belly button". If so, this may be the earliest known use of the verb fuck in its sexual sense.

Read the rest of this entry »

Comments off

A decision entirely

Urgent bipartite action alert for The Economist: First, note that my copy of the July 18 issue did not arrive on my doormat as it should have done on Saturday morning, so I did not have my favorite magazine to read over the weekend; please investigate. And second, the guerilla actions of the person on your staff who enforces the no-split-infinitives rule (you know perfectly well who it is) have gone too far and are making you a laughing stock. Look at this sentence, from an article about Iran (page 21; thanks to Robert Ayers for pointing it out; the underlining is mine):

Nor do such hardliners believe compliance will offer much of a safeguard: Muammar Qaddafi's decision entirely to dismantle Libya's nuclear programme did not stop Western countries from helping his foes to overthrow and kill him.

Read the rest of this entry »

Comments off


One of the small streets near where I'm staying for a couple of months is the Rue Lhomond, which the street signs tell me is named for a grammarian, Charles François Lhomond (1727-1794). Since I pass the intersection every day on my way to the LPP, I've been curious about what this grammarian's grammar was like. And Gallica offers his Élémens de la Grammaire Françoise (1780), which begins like this:

La Grammaire est l'art de parler & d'écrire correctement. Pour parler & pour écrire on emploie des mots : les mots sont composés des lettres.

Il y a deux sortes de lettres, les voyelles et les consonnes.

Les voyelles sont a , e , i , o , u , & y. On les appelle voyelles, parce que, seules, elles forment une voix, un son.

Il y a trois sortes d'e ; e muet, e fermé, e ouvert.

Grammar is the art of speaking and writing correctly. To speak and to write one uses words : words are made up of letters.

There are two kinds of letters, vowels and consonants.

The vowels are a , e , i , o , u , & y. We call them vowels, because, alone, they form a voice, a sound.

There are three kinds of e ; mute e, closed e, open e.

Read the rest of this entry »

Comments (44)

Dilige et quod vis fac

A few weeks ago, Eric Baković organized a "Short 'schrift" in honor of Alan Prince's forthcoming retirement, asking for

– a paean
– a poem
– a story
– a greeting
– an expression of gratitude
– a work of art (whatever that may mean to you)
– a 'classic-style' squib (à la 1970s-era LI)
– a brief analytical argument
– a simple formal proof
– a spoof of any of the above

I contributed a story, "Dilige, et quod vis fac". The result has now been revealed — squibs, greetings and thanks, stories, music, images, poetry & prose, from the archives, family & friends — so I'm reprinting my contribution below.

Read the rest of this entry »

Comments (8)

Solving the mystery of "off the cuff"

Peter Jensen Brown, "Paper Linen and Crib Notes – A Well-Planned History of 'Off the Cuff'", Early Sports and Pop Culture History Blog, 2/20/2015, following up on "The 'off the cuff' mystery", 8/16/2012:

The idiom, “off the cuff,” meaning “without preparation . . . as if from impromptu notes made on one’s shirt cuffs,” dates to the 1930s. Mark Liberman, the Christopher H. Browne Distinguished Professor of Linguistics at the University of Pennsylvania, pushed the earliest known use of “off the cuff” back from 1938 to 1936; but wondered how or why the expression came into being decades after detachable paper cuffs had long fallen out of fashion, and with no apparent immediate impetus. Charlie Chaplin’s film, Modern Times, released in February 1936 (which features a scene in which Chaplin’s Tramp writes notes on his cuffs), notwithstanding; he could not find a satisfactory reason for the decades-long gap between paper-cuff fashion and the “off the cuff” expression; none of the seemingly plausible explanations made sense. “So what happened?”

For the answer, see the rest of Peter's post.

[h/t Peter Reitan]
Comments (15)

John McWhorter responds

Some clarifications about my Wall Street Journal article, which seems to have led to some misunderstandings among Language Log’s readers (as well as over at Languagehat). Since the readers here are the most well-informed audience that piece will ever reach outside of professional linguists, I thought it’d be useful to clarify what I based the observations in that piece on.

Read the rest of this entry »

Comments (21)