Nine years ago, I stumbled on an unexpected fact about the filled pauses UM and UH ("Young men talk like old women", 11/6/2005). I found, as I expected, that older people tend to use UH more often than younger people do, and that males tend to use UH more than females. The surprising thing was that UM seemed to work in the opposite way, at least in the (large) American conversational-speech corpus that I looked at — younger people use UM more than older people, and females use UM more than males:
Last summer, some colleagues and I began a study of interviews with adolescents on the autism spectrum compared with neurotypical controls, and one of the features that we looked at was filled pause usage. We found a significant difference in UM vs. UH usage; and subsequently learned that some researchers from OGI had reported a similar finding in a poster at the 2014 International Meeting for Autism Research ("Fillers: Autism, gender, and age", 7/30/2014).
A couple of weeks later, this came up in coffee-break conversation at the Methods in Dialectology meeting in Groningen, and a few of the people sitting around the table in the break room immediately pulled out their laptops and started looking at other datasets. To our surprise, we found essentially the same pattern in the Philadelphia Neighborhood Corpus, in the spoken part of the British National Corpus, in the Edinburgh-Glasgow Map Task Corpus, and in collections of Dutch, German, and Norwegian conversational speech. This work has continued (for a partial progress report, see "UM / UH in Norwegian", 10/8/2014), and we hope to finish a journal paper on the topic over the holiday break. As part of the effort, I've looked a bit more closely at one of the datasets used in my 2005 post, and below I'll show you a few of the resulting pictures.
The dataset in question is the Fisher English corpus, collected in 2003, comprising 11,699 telephone conversations. I've reduced this to the 19,753 call sides attributed to native speakers of English. In this dataset, filled pauses on average make up 1.7% of all 23,149,519 words. UM and UH are the 18th and 19th commonest words, respectively; and added together, they would be the 12th commonest word, after I, you, and, it, the, that, 's, to, know, a, and yeah.
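For readers who want to try the same kind of count on their own transcripts, here is a minimal sketch of the two calculations in the previous paragraph: the overall filled-pause rate, and the rank that UM and UH would have if counted as a single word. It is not the script I used on Fisher. The function names, the lowercase "um"/"uh" spellings, and the whitespace tokenization are all assumptions made for illustration, and the sample sentence is invented.

```python
from collections import Counter

# Assumption: filled pauses are transcribed as the lowercase tokens "um" and "uh".
FILLED_PAUSES = {"um", "uh"}


def filled_pause_rate(tokens):
    """Return the fraction of tokens that are filled pauses."""
    if not tokens:
        return 0.0
    return sum(t in FILLED_PAUSES for t in tokens) / len(tokens)


def rank_if_merged(counts, merged_label="um+uh"):
    """Return the 1-based frequency rank of UM and UH counted as one word.

    `counts` maps word -> frequency. Ties are broken by first appearance,
    which is how Counter.most_common orders equal counts.
    """
    merged = Counter()
    for word, n in counts.items():
        merged[merged_label if word in FILLED_PAUSES else word] += n
    ordered = [word for word, _ in merged.most_common()]
    return ordered.index(merged_label) + 1


# Invented toy input, not corpus data.
tokens = "i um i you know uh i think um yeah i you".split()
rate = filled_pause_rate(tokens)
rank = rank_if_merged(Counter(tokens))
```

On a real corpus you would want to normalize the transcription conventions first, since different projects spell these fillers differently (the spoken BNC uses ER and ERM, for instance).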
As the histogram on the right suggests, individual speakers show quite a wide range of filled-pause rates, from 0% to more than 14%. And it shouldn't be a surprise to see that filled-pause rate tends to increase with age, and that at every age, men tend to have higher rates than women:
The drop in rates for the highest age group (75 to 84) is statistically significant. It may be meaningful as well, perhaps because the selection process (or the demographics of surviving past 75) created a verbally-youthful sample. In any case, here are the counts behind the points in the plot, i.e. the number of speakers in each age range:
However, UM and UH behave rather differently across the age range:
And as a result, there's a gradual decrease in the proportion of filled pauses that are UM:
While at every age, women use a higher proportion of UMs than men do:
And finally, if we look at rates of UM usage and UH usage separately for men and women of different ages, we see the following pattern:
So on average (at least in this dataset), men use UM and UH about equally often until about the age of 40, whereafter their rate of UH usage rises and their rate of UM usage falls.
In contrast, women use UM at a higher rate (than men) at all ages, with some indication of a drop at higher ages, while their use of UH increases steadily, from the 15-24 age bracket all the way up to 75-84, with apparent acceleration up to the 65-74 range.
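The breakdowns above all come from the same simple tabulation: group speakers by sex and ten-year age bracket, then compute the UM rate, the UH rate, and the share of filled pauses that are UM within each group. Here is a rough sketch of that tabulation. The record layout (`sex`, `age`, `um`, `uh`, `words`) is a hypothetical one chosen for the example, and the choice to pool counts within a group rather than average per-speaker rates is an assumption, not a description of how the plots were made. The three sample records are invented.

```python
from collections import defaultdict


def age_bracket(age, width=10, start=15):
    """Return a label such as '15-24' for the bracket containing `age`."""
    lo = start + ((age - start) // width) * width
    return f"{lo}-{lo + width - 1}"


def um_uh_by_group(speakers):
    """Pool counts by (sex, age bracket) and return per-group summaries.

    `speakers` is an iterable of dicts with the hypothetical keys
    'sex', 'age', 'um', 'uh', and 'words' (total word count).
    """
    totals = defaultdict(lambda: {"um": 0, "uh": 0, "words": 0, "n": 0})
    for s in speakers:
        t = totals[(s["sex"], age_bracket(s["age"]))]
        t["um"] += s["um"]
        t["uh"] += s["uh"]
        t["words"] += s["words"]
        t["n"] += 1

    summary = {}
    for key, t in totals.items():
        fillers = t["um"] + t["uh"]
        summary[key] = {
            "n_speakers": t["n"],
            "um_rate": t["um"] / t["words"] if t["words"] else None,
            "uh_rate": t["uh"] / t["words"] if t["words"] else None,
            # Share of filled pauses that are UM; undefined if there are none.
            "um_share": t["um"] / fillers if fillers else None,
        }
    return summary


# Invented toy records, not corpus data.
speakers = [
    {"sex": "F", "age": 22, "um": 30, "uh": 10, "words": 1000},
    {"sex": "F", "age": 24, "um": 10, "uh": 10, "words": 1000},
    {"sex": "M", "age": 70, "um": 5, "uh": 35, "words": 2000},
]
summary = um_uh_by_group(speakers)
```

Pooling weights talkative speakers more heavily than averaging per-speaker rates would, so the two approaches can give somewhat different curves when group sizes are small.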
There are many obvious questions. Is this a life-cycle effect, or a change in progress, or both? Is this limited to Germanic languages, or are there analogous things in other language families? Have the sex and age effects been inherited from proto-Germanic, or are they somehow ideophonically natural, or both? Given people's propensity to form social stereotypes around small or even non-existent statistical differences, and the salience of age and sex as social dimensions, why do these differences (as far as we can tell) not rise to the level of conscious awareness?
You'll find some discussion of these issues in earlier LLOG posts on this topic; see below for a chronological list. We hope to do a better job in the joint paper in preparation.
"Young men talk like old women", 11/6/2005
"Fillers: Autism, gender, and age", 7/30/2014
"More on UM and UH", 8/3/2014
"UM UH 3", 8/4/2014
"Male and female word usage", 8/7/2014
"UM / UH geography", 8/13/2014
"Educational UM / UH", 8/13/2014
"UM / UH: Lifecycle effects vs. language change", 8/15/2014
"Filled pauses in Glasgow", 8/17/2014
"ER and ERM in the spoken BNC", 8/18/2014
"Um and uh in Dutch", 9/16/2014
"UM / UH in German", 9/28/2014
"Um, there's timing information in Switchboard?", 10/5/2014
"Trending in the Media: Um, not exactly…", 10/7/2014
"UH / UM in Norwegian", 10/8/2014
"On thee-yuh fillers uh and um", 11/11/2014
"Labiality and femininity", 12/16/2014
"UM/UH accommodation", 11/24/2015