Earlier this year, I observed that there seem to be some interesting differences among individuals and styles of speech in the distribution of speech segment and silence segment durations — see e.g. "Sound and silence" (2/12/2013), "Political sound and silence" (2/8/2016) and "Poetic sound and silence" (2/12/2016).
So Neville Ryant and I decided to try to look at the question in a more systematic way. In particular, we took the opportunity to compare the many individuals in the LibriSpeech dataset, which consists of 5,832 English-language audiobook chapters read by 2,484 speakers, with a total audio duration of nearly 1,600 hours. This dataset was selected by some researchers at JHU from the larger LibriVox audiobook collection, which as a whole now comprises more than 50,000 hours of read English-language text. Material from the nearly 2,500 LibriSpeech readers gives us a background distribution against which to compare other examples of both read and spontaneous speech, yielding plots like the one below: