In previous posts about filled pauses, we've seen a consistent and large sex difference: women use (what's transcribed as) "um" somewhat more than men do, and men use (what's transcribed as) "uh" a lot more than women do. This pattern has been found in two large conversational telephone speech corpora involving a mix of ages and American regions, in a collection of undergraduate speed-dating transcripts, in a collection of undergraduate "tell me about your weekend" interviews, and in a collection of several hundred sociolinguistic interviews collected over a period of four decades in Philadelphia.
There are apparently also effects of age, of region, of time period, of years of education, of autism diagnosis, and so on. Today I'll add one more geographical data point (young adults from the Glasgow area) and one more variable (friends vs. strangers).
For this morning's Breakfast Experiment™, I've analyzed the transcripts from the HCRC Map Task Corpus. You can follow this link to read about the design of this collection — the relevant part is here:
Subjects are necessarily paired for the task, and since the pairing is under the experimenter's control we were able to vary systematically the familiarity between the participants, by asking subjects to attend with a friend. Each pair of familiar subjects was tested in coordination with another pair who were unknown to either member of the first pair. Two pairs formed a quadruple of subjects who used among them a different set of four map-pairs, with maps being assigned to pairs by Latin Square. Each subject participated in four dialogues, twice as Instruction Giver and twice as Instruction Follower, once in each case with a familiar partner, and once with an unfamiliar partner. As Instruction Giver they gave directions on the same map, but when following they used different maps each time. Half of the subjects gave instructions to a familiar partner first, the others to an unfamiliar partner first. […]
All sixty-four subjects who participated were undergraduates at the University of Glasgow. Sixty-one of the 64 subjects were Scottish, 56 of them having been born or brought-up within a thirty mile radius of Glasgow. Half the subjects were male, half were female, and their mean age was 20. […]
The experiment uses a Latin Squares design. Participants were asked to come to the experiment with someone they knew, thus forming familiar pairs.
In addition to other sorts of annotation, the Map Task transcripts are coded for parts of speech. One of the POS tags is "FP", for "Filled Pause", which covers eight letter-strings:
FP FILLED PAUSE: eh, ehm, er, erm, hmm, mm, uh, um
Of these, "ehm" and "eh" are by far the most common. But for purposes of comparison with other transcription schemes that don't make so many distinctions, I've lumped the five m-final filled-pause letter strings (ehm, erm, hmm, mm, um) as UM-type, and the three vowel-final filled-pause letter strings (eh, er, uh) as UH-type.
And in addition to dividing transcript-sides by the sex of the speaker, I've also divided them according to whether the interlocutor was a friend or a stranger.
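For readers who want to try the same lumping, here is a minimal Python sketch of the classification and the four-way split. The two sets come straight from the description above; the function names, the (sex, familiarity, token) record layout, and the toy input are hypothetical illustrations, not the Map Task file format or the script actually used for this post.

```python
from collections import Counter

# Lumping as described in the post: m-final strings are UM-type,
# the remaining ("vowel-final") strings are UH-type.
UM_TYPE = {"ehm", "erm", "hmm", "mm", "um"}
UH_TYPE = {"eh", "er", "uh"}


def fp_type(token):
    """Return 'UM', 'UH', or None for a token tagged FP."""
    t = token.lower()
    if t in UM_TYPE:
        return "UM"
    if t in UH_TYPE:
        return "UH"
    return None


def tally(records):
    """Count UM-type and UH-type tokens by (sex, familiarity).

    `records` is a hypothetical iterable of (sex, familiarity, token)
    tuples; the real corpus files are laid out differently.
    """
    counts = Counter()
    for sex, familiarity, token in records:
        kind = fp_type(token)
        if kind is not None:
            counts[(sex, familiarity, kind)] += 1
    return counts


# Made-up toy input, NOT corpus data.
toy = [
    ("F", "friend", "ehm"),
    ("M", "stranger", "eh"),
    ("F", "friend", "erm"),
    ("M", "friend", "um"),
]
toy_counts = tally(toy)
```

Dividing each count by the word total for the matching speaker category would then give the frequencies reported below.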
The individual and lumped counts are as follows:
Here are the same numbers scaled by overall word counts per speaker category, in frequency per 1,000 words:
- The frequency of UM is more than twice as great for female speakers compared to male speakers: 0.984% vs. 0.486%.
- The frequency of UH is nearly three times as great for male speakers compared to female speakers: 0.832% vs. 0.297%.
- The proportion UM/(UM+UH) is more than twice as great for female speakers compared to male speakers: 76.8% vs. 36.8%
- The frequency of UM-words is somewhat greater between strangers than between friends: 0.88% vs. 0.58%; and the frequency of UH-words is also a little greater: 0.62% vs. 0.54%. As a result, the UM/(UM+UH) proportion is nearly the same: 58.7% vs. 51.7%.
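The ratios and proportions in this list can be rechecked from the rounded percentages alone. The sketch below does that; the dictionary layout and the helper name `um_share` are my own choices for illustration. Because it starts from rounded rates rather than raw counts, the last digit can differ slightly from the figures quoted above (the male share comes out nearer 36.9% than 36.8%).

```python
def um_share(um_rate, uh_rate):
    """Proportion UM/(UM+UH), given the two rates in the same units."""
    return um_rate / (um_rate + uh_rate)


# (UM rate, UH rate) as percentages of all words, copied from the list above.
rates = {
    "female": (0.984, 0.297),
    "male": (0.486, 0.832),
    "strangers": (0.88, 0.62),
    "friends": (0.58, 0.54),
}

shares = {group: um_share(um, uh) for group, (um, uh) in rates.items()}

# Female-to-male ratio for UM, and male-to-female ratio for UH.
um_ratio = rates["female"][0] / rates["male"][0]
uh_ratio = rates["male"][1] / rates["female"][1]

for group, share in shares.items():
    print(f"{group:10s} UM/(UM+UH) = {share:.3f}")
print(f"UM female/male = {um_ratio:.2f}, UH male/female = {uh_ratio:.2f}")
```

The sex difference in the UM share is about 40 percentage points, while the friends vs. strangers difference is about 7.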
I'll also note that here again, the males are on average somewhat more talkative than the females.
I remain puzzled about what is really going on here, and I continue to think that we'll need to look at the range of functions that specific instances of filled pauses serve in order to understand the nature and source of the effects.
Past LLOG posts on UM vs. UH:
"Young men talk like old women", 11/6/2005
"Fillers: Autism, gender, age", 7/30/2014
"More on UM and UH", 8/3/2014
"UM UH 3", 8/4/2014
"Male and female word usage", 8/7/2014
"UM / UH Geography", 8/13/2014
"Educational UM / UH", 8/13/2014
"UM / UH: Life-cycle effects vs. language change", 8/15/2014