Disfluency stylings: On beyond hesitation

Some things that "everybody knows" are refuted repeatedly by the experience of everyday life. A notable example is the function of "filled pauses", whose American English versions are conventionally written "um" and "uh".

Dictionaries all say that these are expressions of hesitation, doubt, uncertainty; ways to fill time or hold the floor. The OED glosses uh as "Expressing hesitation", and um as "Used to indicate hesitation or doubt in replying to another". Wiktionary glosses uh as "Expression of thought, confusion, or uncertainty", or "Space filler or pause during conversation", and um as "Expression of hesitation, uncertainty or space filler in conversation". Merriam-Webster glosses uh as "used to express hesitation", and um as "used to indicate hesitation". Collins glosses uh as "used when hesitating in speaking, as while searching for a word or collecting one's thoughts", and um as "used in writing to represent a sound that people make when they are hesitating, usually while deciding what they want to say next".

So what are we to make of this, the opening phrase of an hour-long video interview?

uh thanks for tuning in today

It's not credible that the interviewer is really hesitant or uncertain about his opening phrase. And in any case, that phrase-initial uh is all of 125 milliseconds long (1/8 of a second), which is not a lot of thought-collection time.

The same puzzle applies to the leading edge of four of the next seven pause groups (stretches of speech separated by silent pauses) in the interviewer's first turn:

um here today with Dr. Anthony Fauci
uh to discuss the recent surge in covid
uh we can all stay safe
uh the country's trajectory going forward

Here's the 20-second opening passage that these examples were taken from, which is the start of Mark Zuckerberg's 7/16/2020 interview with Anthony Fauci:

If you listen to more of the interview, you'll find that starting speech segments with um or uh is a consistent characteristic of Mr. Zuckerberg's speaking style, at least in this interaction. I transcribed the first two turns for both speakers (730 words in 218 seconds overall for Zuckerberg) and found 28 instances of uh and 24 instances of um in Zuckerberg's turns, for a total um/uh rate of 52 out of 730, or 7.1%.

This is a relatively high rate, but not unusually so. What's more unusual is that 45/52 (87%) of these ums and uhs were speech-segment initial, i.e. preceded by a silent pause and followed immediately by speech. (I also observe that the choice of uh vs. um seems to be influenced by the start of the following word, with uh generally preceding consonants and um preceding vowels or /h/ — though this needs to be checked more carefully.)
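The segment-initial criterion can be made concrete with a short Python sketch. This is only an illustration, not the procedure used for the counts above: it assumes a transcript already split into pause groups (one string per group), the function name `count_fillers` is hypothetical, and the sample data are just the five pause groups quoted earlier in this post, not the full 730-word transcript.

```python
FILLERS = {"um", "uh"}

def count_fillers(pause_groups):
    """Return (total, initial) filler counts over a list of pause groups.

    A filler counts as segment-initial when it is the first token of a
    pause group that contains at least one further token, i.e. it follows
    a silent pause and is followed immediately by more speech.
    """
    total = initial = 0
    for group in pause_groups:
        tokens = group.lower().split()
        for i, tok in enumerate(tokens):
            if tok.strip(".,?!") in FILLERS:
                total += 1
                if i == 0 and len(tokens) > 1:
                    initial += 1
    return total, initial

# The five pause groups quoted in the post (not the full transcript).
groups = [
    "uh thanks for tuning in today",
    "um here today with Dr. Anthony Fauci",
    "uh to discuss the recent surge in covid",
    "uh we can all stay safe",
    "uh the country's trajectory going forward",
]

total, initial = count_fillers(groups)
print(f"{initial} of {total} fillers are segment-initial")
```

Because the quoted examples were chosen for their initial fillers, every filler in this small sample is segment-initial; a full transcript would also contain the mid-segment cases.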

In an earlier post on the "meaning" of filled pauses I reported some rather different proportions from the Switchboard corpus of conversational telephone speech:

all UM 21076
all UH 68991

The same arrangement for (this admittedly small sample from) Mark Zuckerberg:

all UM 24
all UH 28

As a point of comparison, Anthony Fauci's first two turns have five uhs and no ums in 1058 words, for an overall rate of 0.5%. This underlines the fact that there's a lot of individual (and contextual) variation in um/uh rates.
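For readers who want to check the arithmetic, here is a minimal Python sketch that recomputes the rates from the counts given in this post. The helper name `filler_rate` is hypothetical, and the calculation simply follows the post's formula of fillers divided by transcribed words.

```python
def filler_rate(fillers, words):
    """Fillers as a percentage of transcribed words."""
    return 100 * fillers / words

zuckerberg = filler_rate(28 + 24, 730)  # 28 uh + 24 um in 730 words
fauci = filler_rate(5 + 0, 1058)        # 5 uh, no um, in 1058 words
initial_share = 100 * 45 / 52           # segment-initial share of Zuckerberg's fillers

print(f"Zuckerberg um/uh rate: {zuckerberg:.1f}%")
print(f"Fauci um/uh rate: {fauci:.1f}%")
print(f"Zuckerberg segment-initial share: {initial_share:.0f}%")
```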

But more interestingly, Mark Zuckerberg's um/uh pattern reinforces the idea that um and uh have a variety of meanings (functions? sources? interpretations? etiologies?), and that there's a lot of individual (and contextual) variation in this dimension as well. The linguistic and psycholinguistic literature on this is better than the dictionaries, but still quite incomplete.




  1. Philip Taylor said,

    July 20, 2020 @ 6:22 am

    I have not passed the first audio clip through PRAAT (tho' I clearly should), but it sounds to me as if the introductory "er…" is pitched somewhat lower than the "thanks …", and I wonder whether that partially explains its presence. If the speaker (with whom I have no prior familiarity) normally speaks at the pitch of his "er…", but unconsciously or intentionally raises his pitch when broadcasting, might the "er…" allow him to "get started", so to speak, before switching to his higher broadcasting register ?

    [(myl) It's a good point that Zuckerberg's segment-initial ums and uhs are generally produced at about 50-60 Hz, which is an octave below the range of the rest of his productions. But I would be very surprised if his normal speaking range was anything like that low — he's not starting in a "normal" voice and then transitioning quickly into "broadcasting" voice, he's just producing very-low-pitched segment-initial fillers.]

  2. Bloix said,

    July 20, 2020 @ 7:54 am

    "It's not credible that the interviewer is really hesitant or uncertain about his opening phrase."
    It's credible to me. I've got years of experience speaking to judges and juries and in meetings, and in questioning witnesses in deposition and at trial, and I find that if I don't write out the first words I'm going to say, I almost always have a brief moment of uncertainty before I can start talking.

    Except in set-piece presentations, though ("Good morning, Your Honor. As the Court is aware, Plaintiff Jones Corp asserts three grounds for dismissal of defendant's counterclaim …"), a written opening sounds stilted, so I just wing it. I can suppress the almost involuntary "uh," but when I do, the resulting pause lasts for a full second or even two, during which I literally have to hold my breath, so in anything other than most adversarial situations I generally just say "uh" and get started.

    [(myl) I think you're right that there's a sort of "getting started uh" that (some) people find helps them produce spontaneous phrases, and you're also right that such getting-started fillers would not be produced in fluent reading. And if by "uncertainty" you mean "the state of being about to produce a spontaneous phrase", then maybe this is "a brief moment of uncertainty".

    But this is very different from the kind of uncertainty that applies when someone is trying to decide how to frame their next sentence, or what word to use in a given part of it. I don't have the data at hand, but the distribution of "uncertainty" durations is very different in the two cases. An eighth of a second just isn't that much help in putting a sentence together, and an extra eighth of a second of silence wouldn't be noticed.

    Maybe there's a special kind of "uncertainty" involving short-term memory for a recently composed phrase? Anyhow, it seems to me that there really is something to explain here. ]

  3. Bill Benzon said,

    July 20, 2020 @ 7:59 am

    Wallace Chafe discusses this sort of thing in Discourse, Consciousness, and Time (1994), though it's not obvious to me how he adds to your discussion. In any event, I no longer have the book, so I can't check. But I can quote a passage from an article I published in 2003, "Kubla Khan" and the Embodied Mind, where I discuss Chafe a bit. It sounds like you're talking about regulatory units and perhaps fragmentary units, to use his terms. Here's the passage:

    Nonetheless, the linguist Wallace Chafe has quite a bit to say about what he calls an intonation unit, and that seems germane to any consideration of the poetic line. In Discourse, Consciousness, and Time Chafe asserts that the intonation unit is “a unit of mental and linguistic processing” (Chafe 1994, pp. 55 ff. 290 ff.). He begins developing the notion by discussing breathing and speech (p. 57): “Anyone who listens objectively to speech will quickly notice that it is not produced in a continuous, uninterrupted flow but in spurts. This quality of language is, among other things, a biological necessity.” He goes on to observe that “this physiological requirement operates in happy synchrony with some basic functional segmentations of discourse,” namely “that each intonation unit verbalizes the information active in the speaker’s mind at its onset” (p. 63).

    While it is not obvious to me just what Chafe means here, I offer a crude analogy to indicate what I understand to be the case. Speaking is a bit like fishing; you toss the line in expectation of catching a fish. But you do not really know what you will hook. Sometimes you get a fish, but you may also get nothing, or an old rubber boot. In this analogy, syntax is like tossing the line while semantics is reeling in the fish, or the boot. The syntactic toss is made with respect to your current position in the discourse (i.e. the current state of the system). You are seeking a certain kind of meaning in relation to where you are now.

    Chafe identifies three different kinds of intonation units. Substantive units tend to be roughly five words long on average and, as the term suggests, present the substance of one’s thought. Regulatory units are generally a word or so long (e.g. and then, maybe, mhm, oh, and so forth), and serve to regulate the flow of ideas, rather than to present their substance. Given these durations, a single line of poetry can readily encompass a substantive unit or both a substantive and a regulatory unit.

    The third kind of unit, fragmentary, results when one of the other types is aborted in mid-execution. That is to say, one is always listening to one’s own speech and is never quite sure, at the outset of a phrase, whether or not one’s toss of the syntactic line will reel-in the right fish. If things do not go as intended, the phrase may be aborted. Fragments do not concern us, as we are dealing with a text that has been thought-out and, presumably, edited, rather than with free speech, which is what Chafe studied.

    [(myl) All of the cited examples are "substantive units", in Chafe's terminology — they just happen to start with a brief um or uh.]

  4. Abbey Road said,

    July 20, 2020 @ 9:48 am

  5. D.O. said,

    July 20, 2020 @ 10:09 am

    Maybe it's a "pay attention, I am about to speak" um, which got so ingrained that it is produced by the speaker just before their turn whether or not there is any need to call attention. Sort of throat clearing.

  6. jfruh said,

    July 20, 2020 @ 10:24 am

    I actually wonder about another category of "ums" and "uhs": their use in pre-written speech meant to convey the experience of spontaneous spoken speech. Like, if I'm writing dialogue in a movie or a novel, I might pepper in "ums" and "uhs" to make it seem more "natural" — but I'd probably be more likely to use them in line with the dictionary definitions you list at the top of this post, rather than as appropriate for the actual mechanics of ordinary speech.

  7. david said,

    July 20, 2020 @ 11:10 am

    Uh . . . In elementary school (Baltimore – 1950s) I was taught and learned to suppress ums and uhs during public speaking. Also to keep my mouth shut when I wasn’t intentionally vocalizing.

    In the 90s, in internet chat rooms, I learned to insert uhs and ums to indicate I was thinking about the previous remark and was perhaps dubious or reflective. Maybe it is short for “Excuse me” or “Hello”. There were also emoticons to indicate non-verbal communication .oO(thinks about it).

    [(myl) There are some chatrooms where starting a response with "Um" is grounds for banning, based on the belief that it means something like "This is so stupidly wrong that I'm temporarily at a loss for words to explain how stupidly wrong it is…"]

  8. Michael Watts said,

    July 20, 2020 @ 12:42 pm

    Actually, since "throat clearing" was mentioned, and it was also remarked that Mark Zuckerberg's sentence-initial fillers have anomalously low pitch…

    I think when you begin talking from silence, you may do so in a way that is not actually audible. (At least, it happens to me!) Starting with an "uh" might be a way to shift that inaudibility on to a word that doesn't matter anyway, as opposed to deleting the beginning of a word that your listener would be better off hearing than not hearing.

    On this analysis, you can move on from "uh" to whatever you were going to say as soon as you can actually hear your own "uh". This would also nicely explain why you measure very short "uh"s for the radio personality — according to this theory, the "uh" has served its purpose as soon as it becomes audible.

    [(myl) That's a theory that makes sense. However, it seems to predict that the same thing should happen in reading out loud — but it doesn't.]

  9. Duncan said,

    July 20, 2020 @ 6:25 pm

    Here's a possibly better youtube link, with less of the long "waiting to start silence" intro commonly seen on live feed recordings, and starting 6:25 in to seek past all but the last couple seconds of the remainder of it, so it should start pretty quickly. The interview ends up being a bit over 53 minutes (with the 6:25 wait time that I skip, the video's almost an hour).[1]


    My theory builds on Michael Watts' post, which inspired it, and I believe it explains the reading difference, at least for this particular case.

    I think Zuckerberg started the interview with voice-activation, the first 1/8 second "uh" got initially-truncated as the voice activation kicked in (initial-truncation was my initial intuitive reaction as I played the first sample, too, before I read the article and MW's thoughts, but I hadn't formulated a why until I read MW), and during the first few seconds Zuckerberg had no way of knowing exactly when software/production switched to continuous-feed, so he continued to pre-trigger for a bit. But that initial pre-trigger did what it was supposed to do and we heard the first full phrase in its entirety.

    As the interview gets going, Fauci starts his first reply with "Well, uh…", which arguably functions as both pre-trigger and follow-on signifier, and could well become habitual for the frequently interviewed such as he. (Study question: does he exhibit a similar pattern in other interviews?) As both participants then relax into the flow, Zuckerberg's filled-pauses appear (without any statistical analysis as done in the article) to me to lengthen and perhaps become somewhat less frequent (tho still not infrequent), apparently filling the more traditional role.

    [1] There's a third version of the full interview with the wait-time prelude eliminated, but the connection quality wasn't the best and it froze too often to safely use for the purpose here. So a start-time-offset link on a somewhat wait-trimmed version seemed the best option of those available.

  10. Robert T McQuaid said,

    July 20, 2020 @ 6:30 pm

    In computer communications some protocols start with a sync pattern, necessary to allow the receiver to align to the start of eight-bit packets. Without it, the message might be misinterpreted.

    In human communication, speech that starts with um… (or hey) may assist the listener in properly syncing on the meaningful words. For example, if we don't know whether a man or woman's voice is coming, the um… tells which register to listen for.

  11. Andrew Usher said,

    July 20, 2020 @ 8:14 pm

    As you have noted this seems rather to be a quirk of Zuckerberg's (of which the origin has not been explored) than a general tendency in speech.

    I don't think many uses of 'uh'/'um' with any function are really conscious, so although it's possible some may use it as an "I'm going to start talking" signal as others use 'well', 'hey', or whatever, it seems more likely that they just sometimes happen in this context, for most of the cases you counted.

    k_over_hbarc at yahoo.com

    [(myl) As you suggest, Zuckerberg's "quirk" seems to be not that he does something nobody else does, but rather that he does it a lot.]

  12. Michael Watts said,

    July 21, 2020 @ 3:25 am

    it seems to predict that the same thing should happen in reading out loud — but it doesn't.

    It's hard to distinguish between "this phenomenon should happen in cases A and B, but something about case B suppresses it" and "this phenomenon shouldn't happen in case A or B, but something about case A provokes it".

    That said, my personal subjective experience of utterance-initial "uh" is exactly the standard view, that it fills time while I get my sentence in order. I am especially likely to produce it if asked to draw on my memory.

  13. Lane Greene said,

    July 30, 2020 @ 4:58 am

    In a weekly work meeting in which certain people are expected to state their plans for the week, I once counted, and if I remember rightly about 11 of 12 started their submission with "uh" or "um" (or, being Britain, er and erm…) They were even called on in a predictable order.

