Some things that "everybody knows" are refuted repeatedly by the experience of everyday life. A notable example is the function of "filled pauses", whose American English versions are conventionally written "um" and "uh".

Dictionaries all say that these are expressions of hesitation, doubt, uncertainty; ways to fill time or hold the floor. The OED glosses uh as "Expressing hesitation", and um as "Used to indicate hesitation or doubt in replying to another". Wiktionary glosses uh as "Expression of thought, confusion, or uncertainty", or "Space filler or pause during conversation", and um as "Expression of hesitation, uncertainty or space filler in conversation". Merriam-Webster glosses uh as "used to express hesitation", and um as "used to indicate hesitation". Collins glosses uh as "used when hesitating in speaking, as while searching for a word or collecting one's thoughts", and um as "used in writing to represent a sound that people make when they are hesitating, usually while deciding what they want to say next".

So what are we to make of this, the opening phrase of an hour-long video interview?

uh thanks for tuning in today



It's not credible that the interviewer is really hesitant or uncertain about his opening phrase. And in any case, that phrase-initial uh is all of 125 milliseconds long (1/8 of a second), which is not a lot of thought-collection time.

The same puzzle applies to the leading edge of four of the next seven pause groups (stretches of speech separated by silent pauses) in the interviewer's first turn:

um here today with Dr. Anthony Fauci
uh to discuss the recent surge in covid
uh we can all stay safe
uh the country's trajectory going forward

Here's the 20-second opening passage that these examples were taken from, which is the start of Mark Zuckerberg's 7/16/2020 interview with Anthony Fauci:

[Audio clip: the 20-second opening passage of the interview]

If you listen to more of the interview, you'll find that starting speech segments with um or uh is a consistent characteristic of Mr. Zuckerberg's speaking style, at least in this interaction. I transcribed the first two turns for both speakers — 730 words in 218 seconds overall for Zuckerberg — and found 28 instances of uh and 24 instances of um, for a total um/uh rate of 7.1%.

This is a relatively high rate, but not unusually so. What's more unusual is that 45/52 (87%) of these ums and uhs were speech-segment initial, i.e. preceded by a silent pause and followed by speech. (I also observe that the choice of uh vs. um seems to be influenced by the start of the following word, with uh generally preceding consonants and um preceding vowels or /h/ — though this needs to be checked more carefully.)
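The "speech-segment initial" classification can be made concrete with a small sketch. It assumes a tokenized transcript in which silent pauses are marked by a hypothetical `<sil>` token, and it treats the edges of a turn as silence; neither convention comes from the transcription described here. The toy token list simply strings together three of the pause groups quoted above, each treated as its own pause group, and is not a continuous stretch of the interview.

```python
from collections import Counter

FILLED = {"um", "uh"}
SIL = "<sil>"  # hypothetical marker for a silent pause (an assumption)


def context_counts(tokens):
    """Count each filled pause by what precedes and follows it."""
    counts = Counter()
    # Assumption: the start and end of the turn count as silence.
    padded = [SIL] + list(tokens) + [SIL]
    for prev, tok, nxt in zip(padded, padded[1:], padded[2:]):
        if tok in FILLED:
            left = "SILENCE" if prev == SIL else "SPEECH"
            right = "SILENCE" if nxt == SIL else "SPEECH"
            counts[(left, tok.upper(), right)] += 1
    return counts


# Toy input built from three of the quoted pause groups.
tokens = (
    "uh thanks for tuning in today <sil> "
    "um here today with Dr. Anthony Fauci <sil> "
    "uh to discuss the recent surge in covid"
).split()

counts = context_counts(tokens)
for (left, filler, right), n in sorted(counts.items()):
    print(left, filler, right, n)
```

All three filled pauses in this toy input land in the SILENCE-filler-SPEECH category, which is the segment-initial pattern discussed above. A real analysis would need time-aligned audio to decide what counts as a silent pause.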

In an earlier post on the "meaning" of filled pauses I reported some rather different proportions from the Switchboard corpus of conversational telephone speech:

all UM                 21076
SILENCE UM SILENCE      8251   39%
SPEECH UM SILENCE       7358   35%
SILENCE UM SPEECH       2938   14%
SPEECH UM SPEECH        2521   12%

all UH                 68991
SILENCE UH SILENCE      9231   13%
SPEECH UH SILENCE      25150   36%
SILENCE UH SPEECH      12681   18%
SPEECH UH SPEECH       21696   31%

The same arrangement for (this admittedly small sample from) Mark Zuckerberg:

all UM                    24
SILENCE UM SILENCE         4   17%
SPEECH UM SILENCE          2    8%
SILENCE UM SPEECH         18   75%
SPEECH UM SPEECH           0    0%

all UH                    28
SILENCE UH SILENCE         0    0%
SPEECH UH SILENCE          1    4%
SILENCE UH SPEECH         27   96%
SPEECH UH SPEECH           0    0%
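As a check on the arithmetic, here is a minimal sketch that recomputes the percentages for the Zuckerberg sample from the raw counts given in this post. The dictionary layout and the `shares` helper are illustrative choices, not part of the original analysis.

```python
# Counts for the Zuckerberg sample, as reported in the post.
zuckerberg = {
    "UM": {"SILENCE UM SILENCE": 4, "SPEECH UM SILENCE": 2,
           "SILENCE UM SPEECH": 18, "SPEECH UM SPEECH": 0},
    "UH": {"SILENCE UH SILENCE": 0, "SPEECH UH SILENCE": 1,
           "SILENCE UH SPEECH": 27, "SPEECH UH SPEECH": 0},
}
TOTAL_WORDS = 730  # words in Zuckerberg's first two turns


def shares(counts):
    """Return each context's share of the total, as a whole percent."""
    total = sum(counts.values())
    return {ctx: round(100 * n / total) for ctx, n in counts.items()}


um_shares = shares(zuckerberg["UM"])
uh_shares = shares(zuckerberg["UH"])

n_filled = sum(sum(c.values()) for c in zuckerberg.values())
n_initial = (zuckerberg["UM"]["SILENCE UM SPEECH"]
             + zuckerberg["UH"]["SILENCE UH SPEECH"])

rate_pct = round(100 * n_filled / TOTAL_WORDS, 1)   # overall um/uh rate
initial_pct = round(100 * n_initial / n_filled)     # segment-initial share

print(um_shares)
print(uh_shares)
print(n_filled, rate_pct, n_initial, initial_pct)
```

The results agree with the figures in the text: 52 filled pauses in 730 words (7.1%), of which 45 (87%) are segment-initial.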

As a point of comparison, Anthony Fauci's first two turns have five uhs and no ums in 1058 words, for an overall rate of 0.5%. This underlines the fact that there's a lot of individual (and contextual) variation in um/uh rates.

But more interestingly, Mark Zuckerberg's um/uh pattern reinforces the idea that um and uh have a variety of meanings (functions? sources? interpretations? etiologies?), and that there's a lot of individual (and contextual) variation in this dimension as well. The linguistic and psycholinguistic literature on this is better than the dictionaries, but still quite incomplete.

