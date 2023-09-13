

This is a simple-minded follow-up to "New models of speech timing?" (9/11/2023). Before getting into fancy stochastic-point-process models, neural or otherwise, I thought I'd start with something really basic: just the distribution of inter-syllable intervals, and its relationship to overall speech-segment and silence-segment durations.

For data, I took one-minute samples from 2006 TED talks by Al Gore and Tony Robbins.

I chose those two because they're listed here as exhibiting the slowest and fastest speaking rates in their (TED talks) sample. And I limited the samples to about one minute, because I'm interested in metrics that can apply to fairly short speech recordings, of the kind that are available in clinical applications such as this one.



Listen to the samples and you'll certainly hear differences in phrasing and pausing and timing (and to a lesser extent in within-phrase speaking rate), which do result in a modest overall words-per-minute difference: Tony Robbins produces 254 words in 64.04 seconds, or 238 words per minute, while Al Gore produces 206 words in 64.68 seconds, or 191 wpm.
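For anyone who wants to check the arithmetic, here is a minimal sketch of the words-per-minute calculation, using the word counts and durations given above. The function name `words_per_minute` is just an illustrative choice, not part of any tool used for this post.

```python
def words_per_minute(word_count: int, duration_seconds: float) -> float:
    """Return the overall speaking rate in words per minute."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    return word_count * 60.0 / duration_seconds


# Word counts and durations (in seconds) as reported in the post.
samples = {
    "Tony Robbins": (254, 64.04),
    "Al Gore": (206, 64.68),
}

rates = {
    name: words_per_minute(words, seconds)
    for name, (words, seconds) in samples.items()
}

for name, rate in rates.items():
    print(f"{name}: {rate:.0f} wpm")
```

Note that this overall rate includes silent pauses, so it mixes together within-phrase speaking rate and the amount of pausing.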

As in other examples I've analyzed in the past, the lengths of speech segments and silence segments in these samples are interestingly decoupled from the overall words-per-minute rates:

And as suggested in Monday's post, it's possible to interpret the speech-segment/silence-segment differences as an aspect of the syllable-level timing, as in this plot of the quantiles of time between adjacent syllables:

Or with the durations on a log scale:
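A rough sketch of how quantiles like these could be computed from a list of syllable times, in plain Python. This is not the code behind the plots: the linear-interpolation quantile rule, the function names, and the toy syllable times are all assumptions made for illustration, and the toy numbers are invented rather than taken from either talk.

```python
import math


def inter_syllable_intervals(syllable_times):
    """Differences between adjacent syllable times, in seconds."""
    times = sorted(syllable_times)
    return [later - earlier for earlier, later in zip(times, times[1:])]


def quantile(sorted_values, q):
    """Linearly interpolated quantile of an already-sorted list (0 <= q <= 1)."""
    if not sorted_values:
        raise ValueError("need at least one value")
    position = q * (len(sorted_values) - 1)
    lower = math.floor(position)
    upper = math.ceil(position)
    fraction = position - lower
    return sorted_values[lower] * (1 - fraction) + sorted_values[upper] * fraction


def interval_quantiles(syllable_times, probabilities):
    """Return (probability, interval) pairs for the inter-syllable intervals."""
    intervals = sorted(inter_syllable_intervals(syllable_times))
    return [(p, quantile(intervals, p)) for p in probabilities]


# Hypothetical syllable times in seconds (invented, not from the TED samples).
# The jump from 1.00 to 2.20 stands in for a silent pause between phrases.
toy_times = [0.10, 0.30, 0.50, 0.75, 1.00, 2.20, 2.40, 2.60]
probabilities = [0.0, 0.25, 0.5, 0.75, 1.0]

table = interval_quantiles(toy_times, probabilities)
for p, seconds in table:
    # The log10 column corresponds to viewing the durations on a log scale.
    print(f"q={p:.2f}  interval={seconds:.3f} s  log10={math.log10(seconds):+.3f}")
```

In a toy case like this, the within-phrase intervals fill the lower quantiles and the pause shows up only at the top of the distribution, which is why the log scale helps: it keeps the short intervals from being squashed by the long ones.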

It remains unclear whether (and how) such patterns can be reduced to a few diagnostically-insightful dimensions.

[Note — the "syllables" were identified by this simple automatic method…]

