The shape of a LibriVox phrase
Here's what you get if you align 11 million words of English-language audiobooks with the associated texts, divide it all into phrases by breaking at silent pauses longer than 150 milliseconds, and average the word durations by position within phrases of one to fifteen words:
The audiobook sample in this case comes from LibriSpeech (see Vassil Panayotov et al., "Librispeech: An ASR corpus based on public domain audio books", IEEE ICASSP 2015). Neville Ryant and I have been collecting and analyzing a variety of large-scale speech datasets (see e.g. "Large-scale analysis of Spanish /s/-lenition using audiobooks", ICA 2016; "Automatic Analysis of Phonetic Speech Style Dimensions", Interspeech 2016). As part of that process, we've refactored and realigned the LibriSpeech sample, resulting in 5,832 English-language audiobook chapters from 2,484 readers, comprising 11,152,378 words of text and about 1,571 hours of audio. (This is a small percentage of the English-language data available from LibriVox, which currently amounts to somewhere north of 50,000 hours of English audiobooks.)
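For readers who want to try the phrase-averaging procedure on their own aligned data, here is a minimal Python sketch. It is not the code behind the plot. It assumes (hypothetically) that each chapter's forced alignment has been reduced to a time-ordered list of (word, start_sec, end_sec) tuples, and it treats the gap between one word's end and the next word's start as the "silent pause". The function names split_into_phrases and mean_duration_by_position are illustrative only.

```python
from typing import Dict, List, Tuple

# Hypothetical input format: one (word, start_sec, end_sec) tuple per aligned
# word, in time order, for a single chapter.
AlignedWord = Tuple[str, float, float]

PAUSE_THRESHOLD = 0.150  # seconds; a gap longer than this starts a new phrase
MAX_LEN = 15             # longest phrase length (in words) to tabulate


def split_into_phrases(words: List[AlignedWord],
                       threshold: float = PAUSE_THRESHOLD) -> List[List[AlignedWord]]:
    """Break a time-ordered word sequence wherever the silent gap exceeds threshold."""
    phrases: List[List[AlignedWord]] = []
    current: List[AlignedWord] = []
    for word in words:
        _, start, _ = word
        # Assumed pause measure: this word's start minus the previous word's end.
        if current and start - current[-1][2] > threshold:
            phrases.append(current)
            current = []
        current.append(word)
    if current:
        phrases.append(current)
    return phrases


def mean_duration_by_position(phrases: List[List[AlignedWord]],
                              max_len: int = MAX_LEN) -> Dict[int, List[float]]:
    """Return {phrase_length: [mean duration of word 1, word 2, ...]} in seconds."""
    sums = {n: [0.0] * n for n in range(1, max_len + 1)}
    counts = {n: 0 for n in range(1, max_len + 1)}
    for phrase in phrases:
        n = len(phrase)
        if n > max_len:
            continue  # phrases longer than max_len words are left out
        counts[n] += 1
        for i, (_, start, end) in enumerate(phrase):
            sums[n][i] += end - start
    # Only report lengths that actually occurred in the data.
    return {n: [total / counts[n] for total in sums[n]]
            for n in sums if counts[n] > 0}


# Toy example: the 0.30 s gap before "dark" starts a new phrase,
# so this yields one 2-word phrase and one 3-word phrase.
chapter = [("it", 0.00, 0.10), ("was", 0.12, 0.30),
           ("dark", 0.60, 0.95), ("and", 1.00, 1.10), ("cold", 1.12, 1.50)]
print(mean_duration_by_position(split_into_phrases(chapter)))
```

To pool over many chapters and readers, one would call split_into_phrases on each chapter separately (so that gaps between chapters are never measured), concatenate the resulting phrase lists, and then average once. An actual pipeline might instead detect pauses acoustically rather than from alignment gaps; that choice is an assumption here.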