Language Log

UM / UH in German

September 29, 2014 @ 7:41 am · Filed by Mark Liberman under Computational linguistics, Language and gender, Sociolinguistics


We've previously observed a surprisingly consistent pattern of age and gender effects on the relative frequency of filled pauses (or "hesitation sounds") with and without final nasals — what we usually write as "um" and "uh" in American English, or often as "er" and "erm" in British English.

Specifically, younger people use the UM form more than older people, while at any age, women use the UM form more than men do. We've seen this same pattern in various varieties of American English and in John Coleman's analysis of the spoken portion of the British National Corpus, and we found the sex effect in the HCRC Map Task Corpus, which involves task-oriented dialogues among college students from Glasgow in Scotland.

It was even more surprising that Martijn Wieling found the same pattern in a collection of Dutch conversational speech. And to make the puzzle more puzzling, Joe Fruehwald's analysis of the Philadelphia Neighborhood Corpus, which includes recordings across several decades of real time, suggests an on-going change in the direction of greater overall UM usage, as well as a life-cycle effect within each cohort of speakers. And Jack Grieve's analysis of Twitter data indicates a pattern of geographical variation within the U.S.

For additional details, see "Young men talk like old women", 11/6/2005; "Fillers: Autism, gender, age", 7/30/2014; "More on UM and UH", 8/3/2014; "UM UH 3", 8/4/2014; "Male and female word usage", 8/7/2014; "UM / UH geography", 8/13/2014; "Educational UM / UH", 8/13/2014; "UM / UH: Lifecycle effects vs. language change", 8/15/2014; "Filled pauses in Glasgow", 8/17/2014; "ER and ERM in the spoken BNC", 8/18/2014; "Um and uh in Dutch", 9/16/2014.

Now Martijn Wieling has found the same pattern in German. His guest post follows.

After conducting the analysis about the uh/um distinction and its relation to gender and age for Dutch speakers, I decided to investigate the same pattern in German. For this purpose I obtained (with help from Thomas Schmidt) frequencies of ‘uh’ and ‘um’ together with age and gender information from the Forschungs- und Lehrkorpus für gesprochenes Deutsch.

The German speakers in this dataset seem to use ‘uh’ (äh or öh) slightly more than ‘um’ (ähm or öhm), 60% versus 40%, but the imbalance is much smaller than for the Dutch data.

A mixed-effects logistic regression model predicting the probability of using ‘um’ (as opposed to ‘uh’) revealed that the relative frequency of ‘um’ is significantly higher for women than for men (p = 0.007) and for younger than for older speakers (p < 0.0001). The table and figure below illustrate this relationship by showing the relative frequency of ‘um’ in four age groups (each containing approximately 25% of the speakers). (Note that the relative frequency of ‘uh’ can be obtained by subtracting these values from 1.)

Relative frequency of ‘um’	Male	Female
Born between 1930 and 1964	0.204	0.139
Born between 1965 and 1981	0.333	0.420
Born between 1982 and 1986	0.463	0.543
Born between 1987 and 2006	0.495	0.626
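
The arithmetic behind the table can be sketched in a few lines of Python. This is only an illustration on the raw group proportions printed above, not the mixed-effects model from the linked analysis, which works from per-speaker counts. The names `um_freq`, `logit`, and `summarize` are made up for this sketch.

```python
import math

# Relative frequency of 'um' per birth cohort, copied from the table above.
# Each value is (male, female).
um_freq = {
    "1930-1964": (0.204, 0.139),
    "1965-1981": (0.333, 0.420),
    "1982-1986": (0.463, 0.543),
    "1987-2006": (0.495, 0.626),
}


def logit(p):
    """Log-odds of a proportion p, with 0 < p < 1."""
    return math.log(p / (1 - p))


def summarize(table):
    """For each cohort, derive the 'uh' frequencies (1 - 'um') and the
    female-minus-male gap in log-odds of 'um', the scale on which a
    logistic regression expresses its effects."""
    rows = []
    for cohort, (male, female) in table.items():
        rows.append({
            "cohort": cohort,
            "uh_male": round(1 - male, 3),
            "uh_female": round(1 - female, 3),
            "logit_gap": round(logit(female) - logit(male), 2),
        })
    return rows


for row in summarize(um_freq):
    print(row)
```

A positive `logit_gap` means women in that cohort use ‘um’ relatively more than men. In the table as printed, the gap is negative for the oldest cohort and positive for the other three. These raw gaps ignore differences between individual speakers, so they should not be read as the model's estimates.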

While the graph suggests an interaction between gender and year of birth, this interaction was not significant (p = 0.33). All results of the analysis can be viewed and replicated here: http://www.let.rug.nl/wieling/ll/analysis-German.html.

Above is a guest post by Martijn Wieling.

Several things about all this are interesting, not to say puzzling:

The pattern (greater UM usage by younger people and females) is robust across many geographical and social varieties of English, and at least two other Germanic languages, despite what appear to be overlaid changes across time and space;
These are very large effects in the distribution of a very common feature, and yet no one seems to be consciously aware of them. I stumbled on the American English pattern in 2005 while looking for something else.

Three sorts of explanation seem to be available:

Hesitation sounds with and without final nasals have some intrinsic properties, e.g. phonetic symbolism, that differentially attract speakers of different ages and genders;
Hesitation sounds with and without final nasals have different functions, retained across Germanic languages and dialects, which are differentially useful to speakers of different ages and genders (like uncertainty about what to say vs. uncertainty about how to say it);
The age and sex associations of hesitation sounds with and without final nasals are purely conventional, like the different lateralization of male and female shirt buttons, but have somehow been retained or reinforced over thousands of years and thousands of miles.

None of these explanations seems very plausible to me — but the facts are clear.


10 Comments

Yerushalmi said,

September 29, 2014 @ 8:01 am

Israeli Hebrew uses "eh" and "em" as filler ("eh" vs. "uh" is actually used as a shibboleth to identify American immigrants). I wonder if the pattern would appear there too, or if it is unique to the Germanic languages.
msH said,

September 29, 2014 @ 8:15 am

I think this must be a status claiming signal. Humans are very precisely, but unconsciously, aware of their own social status in any situation and they signal and claim it. Perhaps this could be tested by looking at the public speeches of female heads of state and government, and of younger women in highly responsible jobs like chief engineer, project scientist, – do they depart from the pattern or conform to it?
Coby Lubliner said,

September 29, 2014 @ 10:44 am

In Spanish (at least in Latin America), the comparable markers are eh and este, and their relative distribution was studied and reported here.
dw said,

September 29, 2014 @ 12:30 pm

Could there be a physiological explanation? Maybe old people's dentition makes the non-nasal form more likely?
Bob Ladd said,

September 30, 2014 @ 10:15 am

@dw: "Old people"?? But all the graphs we've seen in this series of posts start going up in middle age, when most of the people who are being recorded are unlikely to have had major dental problems. And how will dentition explain the sex difference?

On a separate but related point: there's a discrepancy between the table and the figure in Martijn Wieling's segment of this post: the birth years span a different range in the figure than in the table, yet they seem to be plotting the same data. Can MYL (or Martijn himself) clarify what's going on here?
Martijn Wieling said,

September 30, 2014 @ 11:58 am

Thanks, Bob, for noticing that. This is a copy-paste error from my side (I copied the table from the Dutch results, filled in the values for the German data, but forgot to change the labels when sending it to Mark). The labels of the graph are correct and these are also shown correctly in the table in the results file: http://www.let.rug.nl/wieling/ll/analysis-German.html. @Mark, perhaps you can correct this?

[(myl) Fixed now.]
Bob Ladd said,

September 30, 2014 @ 3:41 pm

Here are a couple of further comments on a fascinating series of posts.
(1) I don't think any of the posts yet have mentioned a couple of papers by Herb Clark and Jean Fox Tree, which basically suggested that um is used to announce an upcoming longish delay and uh to announce an upcoming shortish delay. I think this would tie in with Mark's summary above ("uncertainty about what to say" = longer delay = um; "uncertainty about how to say it" = shorter delay = uh). These functions would also be consistent with (a) increasing uh with age (you know what you want to say, but…); (b) greater uh use by males (generally less fluent/verbal than females?); and (c) greater um use by females and younger people (less socially secure and therefore less certain about what to say?). On point (c), see also @msH's comment above.

(2) It's probably true that people are not consciously aware of the different functions of the two filler types, but it's noteworthy that the ordinary English phrase hemming and hawing mentions both, and that most academic papers on filled pauses in English normally give both forms as illustrations in defining what they're researching. So at some level people must be aware that the two are not equivalent.

(3) Despite the apparent implausibility of the West Germanic languages retaining this shared distinction for at least a couple of millennia, I think it's very likely that that's the correct explanation, and that this is not some manifestation of universal sound symbolism. I'm pretty sure there's no nasal-final pause filler in French or Italian (or any Romance language?), only front vowels (euh in French and eh in Italian). This ought to be a fairly straightforward empirical question, at least for any language for which there are large corpora of appropriately transcribed conversational speech.
Rubrick said,

September 30, 2014 @ 4:23 pm

@Bob Ladd: It had never ever occurred to me that "hemming and hawing" referred to those two different pauses. Thank you!
Bob Ladd said,

October 1, 2014 @ 4:30 am

@Rubrick: No thanks to me. An earlier commenter on one of these posts got there first. And I had the idea yesterday when I was looking around on the web and found a reference to hemming and hawing in this rather odd site.
KevinR said,

October 1, 2014 @ 4:16 pm

Rather than an age effect, perhaps this is a time effect. Is there anything in, say, television viewing or telephones that would lead to a preference for UM over UH?
