Pre-filled-pause lengthening


It's well known that syllables and words are longer before silent pauses, other things equal. It makes sense that syllables and words would also be longer before filled pauses (UH and UM), but I haven't seen this explicitly noted or quantified. For a course assignment, I recently prepared an R-accessible version of Joe Picone's manually-corrected word alignments for the Switchboard corpus (done when he was at the Institute for Signal and Information Processing at Mississippi State), and so for this morning's Breakfast Experiment™, I thought I'd take a quick look at pre-filled-pause lengthening.

For a quick sketch of what "pre-pausal lengthening" means, take a look at this plot of average word duration by phrase position for 8-word-long phrases (the modal phrase length) in the Switchboard corpus:

(For this purpose, "phrase" is defined simply as a sequence of words between silent pauses — see "The shape of a spoken phrase", 4/12/2006, for additional details.)

How does pre-pausal duration compare to pre-filled-pause duration? Well, in the case of the specific word "and", for example, we have a mean duration of 180 milliseconds when neither a silence nor a filled pause follows, compared to a mean of 305 milliseconds before silent pauses and a mean of 400 milliseconds before UM/UH:

            Before [silence]   Before UM/UH   Neither
Mean        305 msec.          400 msec.      180 msec.
Std. Err.   1.2 msec.          1.3 msec.      0.4 msec.
N           14,837             9,717          84,092
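
A minimal sketch of how a table like this could be computed, written in Python rather than the R used for the post. It assumes the alignments are available as an ordered list of (word, duration in msec) pairs in which silent pauses appear as the token "[silence]" and filled pauses as "uh" or "um"; that layout, the function names, and the toy data are all hypothetical and are not taken from the actual alignment files.

```python
import math
from collections import defaultdict

# Assumed token labels; the real alignment files may mark these differently.
FILLED_PAUSES = {"uh", "um"}
SILENCE = "[silence]"

def context_of(next_token):
    """Classify a token by what immediately follows it."""
    if next_token == SILENCE:
        return "before_silence"
    if next_token in FILLED_PAUSES:
        return "before_filled_pause"
    return "neither"

def duration_stats(tokens, target):
    """Return N, mean and standard error of the target word's duration
    for each following context. tokens is a list of (word, duration_ms)
    in utterance order; the final token has no successor and is skipped."""
    groups = defaultdict(list)
    for (word, dur), (next_word, _) in zip(tokens, tokens[1:]):
        if word == target:
            groups[context_of(next_word)].append(dur)
    stats = {}
    for ctx, durs in groups.items():
        n = len(durs)
        mean = sum(durs) / n
        if n > 1:
            # Sample variance (n - 1 denominator), then SE = sqrt(var / n).
            var = sum((d - mean) ** 2 for d in durs) / (n - 1)
            se = math.sqrt(var / n)
        else:
            se = float("nan")  # standard error is undefined for one token
        stats[ctx] = {"n": n, "mean": mean, "se": se}
    return stats

# Invented toy data for illustration only; these are not Switchboard values.
tokens = [
    ("and", 150), ("we", 120), ("went", 300),
    ("and", 320), ("[silence]", 500),
    ("and", 410), ("uh", 350), ("then", 200),
    ("and", 170), ("i", 90),
    ("and", 390), ("um", 400), ("yeah", 250),
]

stats = duration_stats(tokens, "and")
for ctx, s in sorted(stats.items()):
    print(ctx, s["n"], round(s["mean"], 1), s["se"])
```

The sketch ignores turn and conversation boundaries, so a real version would need to avoid pairing the last word of one speaker's turn with the first token of the next.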

(Since the counts are quite large in all cases, the standard errors and corresponding confidence intervals are rather small, guaranteeing massive statistical significance for the differences. More important, the differences are large enough to be of practical and communicative significance.)
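
To make the size of that significance concrete, here is a rough back-of-the-envelope calculation from the means and standard errors reported for "and". It assumes the pre-silence and pre-UM/UH tokens can be treated as independent samples and uses a normal approximation, which is an assumption of this sketch rather than something the post states.

```python
import math

# Reported values for "and" (msec.), taken from the table in the post.
mean_silence, se_silence = 305.0, 1.2
mean_filled, se_filled = 400.0, 1.3

# Difference of means and its standard error under independence.
diff = mean_filled - mean_silence
se_diff = math.sqrt(se_silence ** 2 + se_filled ** 2)

# z statistic and an approximate 95% confidence interval for the difference.
z = diff / se_diff
ci_low = diff - 1.96 * se_diff
ci_high = diff + 1.96 * se_diff

print(f"difference = {diff:.0f} msec., SE = {se_diff:.2f}, z = {z:.1f}")
print(f"approx. 95% CI: {ci_low:.1f} to {ci_high:.1f} msec.")
```

Under these assumptions the 95-msec. difference is more than 50 standard errors away from zero, and the interval spans only a few milliseconds on either side of it.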

If we look in a similar way at mean durations by position for the 20 commonest words in this corpus (which collectively constitute 35% of all lexical tokens), we can see that pre-filled-pause tokens are (like pre-silent-pause tokens) reliably longer, on average, than tokens that are neither pre-silence nor pre-filled-pause:

And in most cases, the pre-filled-pause tokens of each of these 20 words are even longer, on average, than the pre-silent-pause tokens:

If we look at the mean duration of all words, we see strong pre-filled-pause lengthening, but tokens in pre-filled-pause position are not longer on average than tokens in pre-silent-pause position:

        Before [silence]   Before UM/UH   Neither
Mean    388 msec.          374 msec.      224 msec.
N       496,886            55,743         2,519,684

Obviously the mix of words in each category is quite different, so this last set of numbers needs to be taken with an appropriately-sized grain of salt. Still, it's clear that pre-filled-pause lengthening is a fact, just as pre-silent-pause lengthening is.

Update 11/11/2014: In tune with Herb Clark and Jean Fox Tree's guest post ("On thee-yuh fillers uh and um", 11/11/2014), I should break out pre-UH and pre-UM lengthening separately. For all words:

        Before [silence]   Before UH   Before UM   Neither
Mean    388 msec.          369 msec.   400 msec.   224 msec.
N       496,886            46,235      9,508       2,519,684
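
As a quick consistency check, the separate pre-UH and pre-UM figures should recombine into the earlier pooled "Before UM/UH" figures (N = 55,743, mean 374 msec.). The sketch below does that arithmetic; because the published means are rounded to the nearest millisecond, the match can only be approximate.

```python
# Reported all-word values from the post (means rounded to whole msec.).
n_uh, mean_uh = 46_235, 369.0
n_um, mean_um = 9_508, 400.0

# The pooled count is the sum; the pooled mean is the count-weighted average.
pooled_n = n_uh + n_um
pooled_mean = (n_uh * mean_uh + n_um * mean_um) / pooled_n

print(pooled_n, round(pooled_mean, 1))
```

The counts add up exactly, and the weighted mean rounds to the 374 msec. given in the pooled table.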

And for the 20 commonest words taken individually:



  1. D.O. said,

    November 10, 2014 @ 4:57 pm

    We can probably go a step deeper and distinguish situations [word]/[word], [word]/[pure silence], [word]/[um/uh], and [word]/[filled silence], where [filled silence] is silence interrupted by UM or UH.
    The tails are extremely long, so it makes sense to go by quartiles.
    Here are the results for [word] = and
    type                     N       25%     50%     75%
    and/[word]:              84092   110ms   146ms   210ms
    and/[pure silence]:      14078   200ms   265ms   372ms
    and/[um/uh]:             9717    317ms   393ms   473ms
    and/[filled silence]:    759     226ms   300ms   441ms
    So it seems that pauses that start with silence but are then filled with UM or UH make the preceding "and" slightly longer than pure pauses do. Its distribution also has much larger skewness, which may mean that a speaker sort of hesitates between and/[um/uh] and and/[pure silence] when producing and/[filled silence], with shorter "and"s built on the [pure silence] model and longer ones built on the [um/uh] model.

    If I have more time today (not likely), I will try some other words.

  2. Lawrence Clayton said,

    November 15, 2014 @ 12:36 pm

    I'm reminded of Biblical Hebrew and Classical Arabic reading norms. Also of the announcements I hear on NPR (but I think that's a robot).
