Pre-filled-pause lengthening
It's well known that syllables and words are longer before silent pauses, other things equal. It makes sense that syllables and words would also be longer before filled pauses (UH and UM), but I haven't seen this explicitly noted or quantified. For a course assignment, I recently prepared an R-accessible version of Joe Picone's manually-corrected word alignments for the Switchboard corpus (done when he was at the Institute for Signal and Information Processing at Mississippi State) — and so for this morning's Breakfast Experiment™, I thought I'd take a quick look at pre-filled-pause lengthening.
For a quick sketch of what "pre-pausal lengthening" means, take a look at this plot of average word duration by phrase position for 8-word-long phrases (the modal phrase length) in the Switchboard corpus:
(For this purpose, "phrase" is defined simply as a sequence of words between silent pauses — see "The shape of a spoken phrase", 4/12/2006, for additional details.)
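For readers who want to try something similar, here is a minimal Python sketch of the phrase definition above. It is not the code behind the plot. The token format (a list of `(label, duration_ms)` pairs), the `[silence]` label, and the toy durations are all assumptions made up for illustration, and the sketch treats every non-silence token (including UH and UM) as a word.

```python
from collections import defaultdict
from statistics import mean

SILENCE = "[silence]"  # assumed label for a silent pause in the alignment


def split_phrases(tokens):
    """Split tokens into phrases: runs of words between silent pauses."""
    phrases, current = [], []
    for label, dur in tokens:
        if label == SILENCE:
            if current:  # a pause closes the phrase in progress
                phrases.append(current)
                current = []
        else:
            current.append((label, dur))
    if current:  # keep a trailing phrase with no final pause
        phrases.append(current)
    return phrases


def mean_duration_by_position(tokens, phrase_len):
    """Mean word duration at each position, using only phrases
    that are exactly phrase_len words long."""
    by_pos = defaultdict(list)
    for phrase in split_phrases(tokens):
        if len(phrase) == phrase_len:
            for pos, (_, dur) in enumerate(phrase, start=1):
                by_pos[pos].append(dur)
    return {pos: mean(durs) for pos, durs in sorted(by_pos.items())}


# Toy data (invented, not from Switchboard): two 3-word phrases and one 1-word phrase.
toy_tokens = [
    ("so", 150), ("i", 100), ("went", 300), (SILENCE, 500),
    ("and", 200), ("he", 120), ("left", 340), (SILENCE, 400),
    ("ok", 250),
]
print(mean_duration_by_position(toy_tokens, 3))
```

For the plot described above, the call would use `phrase_len=8` on the full set of aligned tokens.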
How does pre-pausal duration compare to pre-filled-pause duration? Well, in the case of the specific word "and", for example, the mean duration is 180 milliseconds when neither a silence nor a filled pause follows, compared to a mean of 305 milliseconds before silent pauses and a mean of 400 milliseconds before UM/UH:
| | Before [silence] | Before UM/UH | Neither |
| Mean | 305 msec. | 400 msec. | 180 msec. |
| Std. Err. | 1.2 msec. | 1.3 msec. | 0.4 msec. |
| N | 14,837 | 9,717 | 84,092 |
(Since the counts are quite large in all cases, the standard errors and corresponding confidence intervals are rather small, guaranteeing massive statistical significance for the differences. More important, the differences are large enough to be of practical and communicative significance.)
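A table like the one above could be computed along the following lines. This is a hedged sketch rather than the analysis actually used: the token format, the `[silence]` label, the lowercase `uh`/`um` labels, and the toy durations are assumptions for illustration. The standard error is taken as the sample standard deviation divided by the square root of N.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, stdev

SILENCE = "[silence]"  # assumed label for a silent pause
FILLED = {"uh", "um"}  # assumed labels for filled pauses


def following_context(next_label):
    """Classify a token by what immediately follows it."""
    if next_label == SILENCE:
        return "before_silence"
    if next_label.lower() in FILLED:
        return "before_filled"
    return "neither"


def summarize(tokens, target):
    """N, mean, and standard error of the target word's duration,
    grouped by following context. The final token is skipped
    because nothing follows it."""
    groups = defaultdict(list)
    for (label, dur), (next_label, _) in zip(tokens, tokens[1:]):
        if label == target:
            groups[following_context(next_label)].append(dur)
    summary = {}
    for ctx, durs in groups.items():
        # The standard error is undefined for a single observation.
        se = stdev(durs) / sqrt(len(durs)) if len(durs) > 1 else float("nan")
        summary[ctx] = {"n": len(durs), "mean": mean(durs), "se": se}
    return summary


# Toy data (invented, not from Switchboard).
toy_tokens = [
    ("and", 180), ("i", 100), ("and", 300), (SILENCE, 400),
    ("and", 320), (SILENCE, 200), ("and", 400), ("um", 350), ("so", 150),
]
for ctx, stats in summarize(toy_tokens, "and").items():
    print(ctx, stats)
```

As a rough check on the figures in the table, and assuming the groups are independent, the 95 msec difference between the pre-UM/UH and pre-silence means has a standard error of about sqrt(1.2² + 1.3²) ≈ 1.8 msec, so the difference is on the order of 50 standard errors.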
If we look in a similar way at mean durations by position for the 20 commonest words in this corpus (which collectively constitute 35% of all lexical tokens), we can see that pre-filled-pause tokens are (like pre-silent-pause tokens) reliably longer, on average, than tokens that are neither pre-silence nor pre-filled-pause:
And in most cases, the pre-filled-pause tokens of each of these 20 words are even longer, on average, than the pre-silent-pause tokens:
If we look at the mean duration of all words, we see strong pre-filled-pause lengthening, but tokens in pre-filled-pause position are not longer on average than tokens in pre-silent-pause position:
| | Before [silence] | Before UM/UH | Neither |
| Mean | 388 msec. | 374 msec. | 224 msec. |
| N | 496,886 | 55,743 | 2,519,684 |
Obviously the mix of words in each category is quite different, so this last set of numbers needs to be taken with an appropriately-sized grain of salt. Still, it's clear that pre-filled-pause lengthening is a fact, just as pre-silent-pause lengthening is.
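To make the "mix of words" caveat concrete, here is a small Python illustration with invented numbers (none of them come from Switchboard). Every word in the toy data is longer before a filled pause than before a silence, yet the pooled mean comes out lower before filled pauses, simply because the short word dominates that category. The function names and the dictionary layout are assumptions for this sketch.

```python
from statistics import mean


def pooled_mean(durations_by_word):
    """Mean over all tokens in a category, ignoring word identity."""
    return mean(d for durs in durations_by_word.values() for d in durs)


def within_word_differences(a, b):
    """Per-word difference in mean duration (a minus b),
    for words that occur in both categories."""
    return {w: mean(a[w]) - mean(b[w]) for w in a.keys() & b.keys()}


# Invented durations in msec: "and" is common before filled pauses,
# "because" is common before silences.
before_filled = {"and": [400, 410, 390, 400], "because": [520]}
before_silence = {"and": [300], "because": [480, 470, 490, 480]}

print(pooled_mean(before_filled), pooled_mean(before_silence))
print(within_word_differences(before_filled, before_silence))
```

This is why the word-by-word comparisons are the safer guide to the relative size of the two effects.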
Update 11/11/2014: In tune with Herb Clark and Jean Fox Tree's guest post ("On thee-yuh fillers uh and um", 11/11/2014), I should break out pre-UH and pre-UM lengthening separately. For all words:
| | Before [silence] | Before UH | Before UM | Neither |
| Mean | 388 msec. | 369 msec. | 400 msec. | 224 msec. |
| N | 496,886 | 46,235 | 9,508 | 2,519,684 |
And for the 20 commonest words taken individually:
D.O. said,
November 10, 2014 @ 4:57 pm
We can probably go a step deeper and distinguish situations [word]/[word], [word]/[pure silence], [word]/[um/uh], and [word]/[filled silence], where [filled silence] is silence interrupted by UM or UH.
The tails are extremely long, so it makes sense to go by quartiles.
Here are the results for [word] = "and":
| type | N | 25% | 50% | 75% |
| and/[word] | 84092 | 110ms | 146ms | 210ms |
| and/[pure silence] | 14078 | 200ms | 265ms | 372ms |
| and/[um/uh] | 9717 | 317ms | 393ms | 473ms |
| and/[filled silence] | 759 | 226ms | 300ms | 441ms |
So it seems that pauses that start with silence but are then filled with UM or UH make the preceding "and" slightly longer than pure pauses do. The distribution also has much larger skewness, which may mean that a speaker sort of hesitates between and/[um/uh] and and/[pure silence] when producing and/[filled silence], with shorter "and"s built on the [pure silence] model and longer ones built on the [um/uh] model.
If I have more time today (not likely), I will try some other words.
Lawrence Clayton said,
November 15, 2014 @ 12:36 pm
I'm reminded of Biblical Hebrew and Classical Arabic reading norms. Also of the announcements I hear on NPR (but I think that's a robot).