Um and Uh in Dutch


Below is a guest post by Martijn Wieling, following up on a series of LLOG postings over the years on the effects of sex, age, geography and other factors on the relative frequency of the filler words um and uh: "Young men talk like old women", 11/6/2005; "Fillers: Autism, gender, and age", 7/30/2014; "More on UM and UH", 8/3/2014; "UM UH 3", 8/4/2014; "Educational UM / UH", 8/13/2014; "UM / UH geography", 8/13/2014; "UM / UH: Life-cycle effects vs. language change", 8/15/2014; "Filled pauses in Glasgow", 8/17/2014.

I was surprised to see this effect in the first place; and more surprised to see it robustly replicated in a variety of American English datasets; and even more surprised to see the same pattern in Glasgow. The fact that the same pattern is also found in Dutch raises some interesting questions, about which more later.

After reading the various posts about the uh/um distinction and its relation to gender and age for English speakers, Gosse Bouma, a colleague at the University of Groningen, and I decided to look at this distribution in a series of spontaneous conversations extracted from a corpus of spoken Dutch (Corpus Gesproken Nederlands). While Dutch speakers also use ‘uh’ and ‘um’ as hesitation markers, they generally prefer the vocalic hesitation marker ‘uh’ over the vocalic-nasal hesitation marker ‘um’ (de Leeuw, 2007: “Hesitation markers in English, German, and Dutch”, Journal of Germanic Linguistics). No studies, however, have looked at how this distribution varies with gender and age.

A logistic regression model predicting the relative frequency of ‘um’ clearly revealed that, while ‘uh’ is indeed the preferred marker, the relative frequency of ‘um’ is significantly (p < 0.0001) higher for women than for men, and for younger than for older speakers. The table and figure below illustrate this relationship by showing the relative frequency of ‘um’ in four age groups (each containing approximately 25% of the speakers). (Note that the relative frequency of ‘uh’ can be obtained by subtracting these values from 1.)

Relative frequency of ‘um’ Male Female
Born 1914-1949 0.059 0.095
Born 1950-1963 0.071 0.112
Born 1964-1975 0.098 0.162
Born 1976-1987 0.132 0.182
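
Logistic regression works on odds rather than on raw proportions, so it can help to re-express the table above in those terms. The Python sketch below is purely descriptive arithmetic on the published (rounded) proportions, not the model reported in this post, which was fitted to the underlying data; the names `um_share`, `odds` and `ratios` are illustrative choices, not taken from the analysis code.

```python
import math

# Relative frequency of 'um', i.e. UM / (UM + UH), copied from the table above.
um_share = {
    "1914-1949": {"male": 0.059, "female": 0.095},
    "1950-1963": {"male": 0.071, "female": 0.112},
    "1964-1975": {"male": 0.098, "female": 0.162},
    "1976-1987": {"male": 0.132, "female": 0.182},
}


def odds(p):
    """Convert a proportion p into odds p / (1 - p)."""
    return p / (1.0 - p)


# Female-to-male odds ratio of choosing 'um' over 'uh', per birth cohort.
ratios = {
    cohort: odds(share["female"]) / odds(share["male"])
    for cohort, share in um_share.items()
}

for cohort, ratio in ratios.items():
    # The log of the odds ratio is the scale a logistic model's coefficients live on.
    print(f"Born {cohort}: odds ratio {ratio:.2f}, log-odds difference {math.log(ratio):.2f}")
```

An odds ratio above 1 means that women in that cohort choose ‘um’ over ‘uh’ relatively more often than men do. Because the inputs are rounded cell proportions, these values only approximate what a model fitted to the full data would estimate.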

In graphical form:

The corpus data we used include speakers from two countries, the Netherlands (NL) and Belgium (FL; i.e. Flanders, where Dutch is the native language). The tables and figures below show that this factor plays an important role. Speakers from Flanders show a much larger relative frequency of ‘um’ than speakers from the Netherlands. In addition, the effects of both age and gender are significantly (p < 0.05) stronger for the speakers from Flanders than for those from the Netherlands. In both logistic regression models, however, the effects of both age (with younger speakers showing a greater relative frequency of ‘um’) and gender (with women showing a greater relative frequency of ‘um’) are highly significant (p < 0.0001).

Relative frequency of ‘um’: NL Male Female
Born between 1914 and 1949 0.047 0.084
Born between 1950 and 1963 0.062 0.075
Born between 1964 and 1975 0.065 0.098
Born between 1976 and 1987 0.078 0.103


Relative frequency of ‘um’: FL Male Female
Born between 1914 and 1949 0.085 0.114
Born between 1950 and 1963 0.081 0.154
Born between 1964 and 1975 0.141 0.242
Born between 1976 and 1987 0.208 0.306


Details of the data and analysis code can be found here.

Above is a guest post by Martijn Wieling.


  1. Terry Collmann said,

    September 17, 2014 @ 12:37 pm

    It would be interesting to know what the figures were for the Walloon equivalents of "uh" and "um", if such fillers exist in Belgian French.

  2. P. Olango said,

    September 18, 2014 @ 11:03 am

    I am surprised to see that females use the filler words more than males in all cases.

    [(myl) If you're getting this impression from the plots in the post, you've read them wrong — what's shown is the relative frequency of UM, i.e. UM/(UM+UH).

    You can download the full data table and calculate the frequency by age and sex of UM+UH, which would be what you're asking about. I don't have time to do this at the moment, but I believe that it will show that males on average use somewhat more "filler words" than females.]

  3. vestidos de 15 said,

    September 21, 2014 @ 11:16 pm

    I agree with Terry. It would be interesting to know what the figures are, if such fillers exist in Belgian French. Thanks for the Language Log, Mark.
