Following up on yesterday's post "The case of the disappearing determiners", Gosse Bouma sent me some data from the CGN ("Corpus Gesproken Nederlands"), about determiner use in spoken Dutch by people born between 1914 and 1987. According to the CGN website,
The Spoken Dutch Corpus project was aimed at the construction of a database of contemporary standard Dutch as spoken by adults in The Netherlands and Flanders. […] In version 1.0, the results are presented that have emerged from the project. The total number of words available here is nearly 9 million (800 hours of speech). Some 3.3 million words were collected in Flanders, well over 5.6 million in The Netherlands.
It's not clear to me exactly when the recordings were made, but the project ran from 1998 to 2004.
Gosse sent data focused on the word de, which is the definite article for masculine and feminine ("common") nouns in Dutch, cognate with English the. (The definite article for neuter nouns, het, is less frequent and also can be used as a pronoun.)
The results are similar to those that I reported earlier for English: Older people use the definite article more frequently than younger people (at least for people born in the 1950s onwards), and at every age, men use the definite article more than women.
Here are the English and Dutch results plotted in the same way. The English dataset was collected via telephone recording in 2003; the Dutch dataset was collected via face-to-face interviews between 1998 and 2004, so I've calculated ages from birth years relative to the year 2000:
[Figure panels: English (Results from Fisher) | Dutch (Results from CGN)]
As I observed, this pattern of age and sex effects is generally an indication of a language change in progress.
If we plot the Dutch data by decade of birth, we see a rise from birth decades 1910 to 1950, and then a steeper fall from the 1950s to the 1980s:
This is again similar to what we saw in the Google Books ngram corpus for German:
Data from the KB Historische Kranten corpus (of Dutch newspaper data) shows a generally similar pattern of rise and fall over the course of the 20th century, if we add the counts for DE, DEN, DER, DES (which were collapsed into "de" by the Spelling Reform of 1934, though that was apparently not really carried out in newspapers until after WWII):
(That dataset runs through 1995 — I've left out the point for the year 1995, since it seems anomalous.)
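For readers who want to try the same pooling on their own text, here is a minimal sketch of adding up the counts for DE, DEN, DER and DES before computing a rate. It is an illustration under assumptions, not the procedure behind the plot: the function name `pooled_article_rate` and the toy sentence are hypothetical, and real newspaper scans would need proper tokenization and OCR cleanup first.

```python
from collections import Counter

# Pre-reform spellings pooled with "de", as described in the text.
DE_VARIANTS = {"de", "den", "der", "des"}

def pooled_article_rate(tokens):
    """Return the percentage of tokens that are any of the DE variants."""
    counts = Counter(tok.lower() for tok in tokens)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Counter returns 0 for variants that never occur.
    pooled = sum(counts[v] for v in DE_VARIANTS)
    return 100.0 * pooled / total

# Toy example (made-up tokens, not corpus data).
sample = "De koning der Belgen sprak met den minister over de zaak".split()
rate = pooled_article_rate(sample)
print(f"{rate:.1f}%")
```

Lowercasing before counting is a design choice that treats sentence-initial "De" the same as "de"; it would also sweep in any capitalized proper-name uses, which a more careful count might want to exclude.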
The sex effects in the CGN dataset are quite large. Males in this large collection use DE much more frequently than females do. The male/female ratio is about 1.48/1 across the whole collection (2.721% vs. 1.841%), though the ratio is smaller for younger people (1.64/1.33 = 1.23/1 for people born in the 1980s, vs. 3.07/1.81 = 1.70/1 for people born in the 1950s).
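The ratios quoted above are easy to check. This short sketch just redoes the arithmetic from the percentages given in the text; the dictionary layout and the helper name `mf_ratio` are my own choices for illustration.

```python
# Percentages of tokens that are "de", as quoted in the text.
de_pct = {
    "all":   {"male": 2.721, "female": 1.841},
    "1950s": {"male": 3.07,  "female": 1.81},
    "1980s": {"male": 1.64,  "female": 1.33},
}

def mf_ratio(male_pct, female_pct):
    """Male/female ratio of DE usage rates."""
    return male_pct / female_pct

ratios = {group: round(mf_ratio(p["male"], p["female"]), 2)
          for group, p in de_pct.items()}

for group, r in ratios.items():
    print(f"{group}: {r:.2f}/1")
```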
The newspaper percentages seem more female-like in the early years of the 20th century, rising to a more male-like level in the middle of the century, and then falling again. Could this be due to changes in the formality of newspaper writing in general, or the mix of sources in the KB Kranten corpus in particular?
Breaking the CGN female and male DE-usage numbers down by birth decade:
|Birth Decade||#Male||#Female||M DE%||F DE%|
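A breakdown like this can be computed from per-speaker counts roughly as follows. This is a sketch under assumptions: the record layout `(birth_year, sex, n_tokens, n_de)` and the function names are hypothetical, and the speaker records are made-up numbers for illustration, not CGN data.

```python
from collections import defaultdict

def birth_decade(birth_year):
    """Map a birth year to its decade, e.g. 1957 -> 1950."""
    return (birth_year // 10) * 10

def de_rate_by_decade_and_sex(speakers):
    """speakers: iterable of (birth_year, sex, n_tokens, n_de) tuples.
    Returns {(decade, sex): percentage of tokens that are "de"}."""
    totals = defaultdict(lambda: [0, 0])  # (decade, sex) -> [n_tokens, n_de]
    for birth_year, sex, n_tokens, n_de in speakers:
        key = (birth_decade(birth_year), sex)
        totals[key][0] += n_tokens
        totals[key][1] += n_de
    return {key: 100.0 * n_de / n_tokens
            for key, (n_tokens, n_de) in totals.items() if n_tokens > 0}

# Made-up speaker records, for illustration only (not CGN data).
toy_speakers = [
    (1952, "M", 1000, 30),
    (1958, "M", 1000, 32),
    (1955, "F", 2000, 36),
    (1984, "F", 500, 7),
]
rates = de_rate_by_decade_and_sex(toy_speakers)
for (decade, sex), pct in sorted(rates.items()):
    print(f"{decade}s {sex}: {pct:.2f}%")
```

Note that this version pools all tokens within a group before dividing, so talkative speakers count for more. Averaging per-speaker rates instead is an equally defensible choice and can give somewhat different numbers; the text does not say which was used.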
And here's the same plot for HET (collapsing across the transcriptions "het" and "@t", the reduced form, and limiting the count to those words given the POS tag of determiner):
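The HET count described above amounts to filtering on both the word form and the part-of-speech tag. Here is a minimal sketch, assuming tokens arrive as `(word, tag)` pairs; the tag strings such as `"DET"` are placeholders rather than the actual CGN tag labels, and the toy sentence is invented.

```python
HET_FORMS = {"het", "@t"}  # full and reduced transcriptions, per the text

def het_determiner_rate(tagged_tokens, det_tag="DET"):
    """tagged_tokens: list of (word, pos_tag) pairs.
    Returns the percentage of all tokens that are het/@t tagged as determiner."""
    if not tagged_tokens:
        return 0.0
    hits = sum(1 for word, tag in tagged_tokens
               if word.lower() in HET_FORMS and tag == det_tag)
    return 100.0 * hits / len(tagged_tokens)

# Toy tagged tokens (made up; tag names are placeholders, not CGN tags).
toy = [("het", "DET"), ("huis", "N"), ("is", "V"), ("groot", "ADJ"),
       ("@t", "PRON"), ("regent", "V"), ("in", "P"), ("@t", "DET"),
       ("dorp", "N"), ("vandaag", "ADV")]
rate = het_determiner_rate(toy)
print(f"{rate:.1f}%")
```

The pronoun use of "@t" in the toy data is skipped, which is the point of the tag filter: without it, pronominal "het" would inflate the article count.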
There are many details that don't quite match across the various languages and datasets that we've looked at. But there seems to be strong evidence for a general pattern of substantial decline in the frequency of definite articles in several European languages, at least over the last portion of the 20th century.
This raises the interesting questions of how and why. Yesterday's post (and the valuable comments on it) suggests many ideas to test, and I look forward to seeing how it all comes out.