Syllable-scale wheelbarrow spectrogram
Following up on Saturday's post "Towards automated babble metrics", I thought I'd try the same technique on some adult speech, specifically William Carlos Williams reading his poem "The Red Wheelbarrow".
Why might an approach like this be useful? It's a way of visualizing syllable-scale frequency patterns (roughly 1 to 8 Hz) without having to do any phonetic segmentation or classification. And for early infant vocalizations, where speech-like sounds gradually mix in with coos and laughs and grunts and growls and fussing, it might be the basis for some summary statistics that would be useful in tracing a child's developmental trajectory.
Is it actually good for anything? I don't know. The basic idea was presented in a 1947 book as a way to visualize the performance of metered verse. Those experiments didn't really work, and the idea seems to have been abandoned afterwards, though the authors' premise was that verse "beats" should be exactly periodic in time, which was (and is) false. In contrast, my idea is that the method might let us characterize variously-inexact periodicities.
There are lots of old-fashioned processing parameters to vary, and many more newer methods to try. This is just the result of a few minutes playing around with a different sort of audio, in the hopes that something interesting might eventually turn up.
Here's WCW reading at the Library of Congress in May of 1945:
Here are the tracks of RMS amplitude, smoothed amplitude, and smoothed amplitude derivative:
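For readers who want to try something similar, here is a minimal sketch of one way to compute tracks like these. It is not the code behind the plots: the function name amplitude_tracks, the 10 ms frame, the 50 ms Hann smoother, and the use of np.gradient for the derivative are all assumptions chosen for illustration.

```python
import numpy as np

def amplitude_tracks(x, sr, frame_s=0.010, smooth_s=0.050):
    """Return (rms, smoothed, derivative, frame_rate) for a mono signal x.

    frame_s and smooth_s are illustrative guesses, not values from the post.
    """
    hop = int(round(frame_s * sr))
    n_frames = len(x) // hop
    # Non-overlapping frames; any trailing partial frame is dropped.
    frames = np.asarray(x[: n_frames * hop], dtype=float).reshape(n_frames, hop)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    frame_rate = sr / hop

    # Smooth with a normalized Hann window of odd length (zero endpoints trimmed).
    win_len = max(3, int(round(smooth_s * frame_rate)) | 1)
    win = np.hanning(win_len + 2)[1:-1]
    win /= win.sum()
    smoothed = np.convolve(rms, win, mode="same")

    # Finite-difference derivative, scaled to amplitude units per second.
    deriv = np.gradient(smoothed) * frame_rate
    return rms, smoothed, deriv, frame_rate
```

With a 10 ms frame the three tracks are sampled at 100 Hz, which is comfortably above the 1 to 8 Hz range of interest.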
Here's a low-frequency spectrogram based on the RMS derivative:
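A low-frequency spectrogram of this kind can be sketched as an ordinary short-time Fourier transform applied to the slowly sampled derivative track rather than to the audio itself. The version below is only an illustration under stated assumptions: the name lowfreq_spectrogram, the 2-second Hann window, the 0.1-second hop, and the 8 Hz cutoff are hypothetical choices, not the settings used for the figure.

```python
import numpy as np

def lowfreq_spectrogram(d, frame_rate, win_s=2.0, hop_s=0.1, fmax=8.0):
    """Short-time magnitude spectrum of a slowly sampled track d, kept up to fmax Hz.

    d is a track such as an amplitude derivative sampled at frame_rate Hz.
    All default parameter values are illustrative assumptions.
    """
    d = np.asarray(d, dtype=float)
    n_win = int(round(win_s * frame_rate))
    n_hop = max(1, int(round(hop_s * frame_rate)))
    if len(d) < n_win:
        raise ValueError("track is shorter than one analysis window")

    window = np.hanning(n_win)
    freqs = np.fft.rfftfreq(n_win, d=1.0 / frame_rate)
    keep = freqs <= fmax  # discard everything above the syllable-scale range

    starts = np.arange(0, len(d) - n_win + 1, n_hop)
    spec = np.empty((int(keep.sum()), len(starts)))
    for j, s in enumerate(starts):
        seg = d[s : s + n_win]
        seg = seg - seg.mean()  # remove the local mean so 0 Hz does not dominate
        spec[:, j] = np.abs(np.fft.rfft(seg * window))[keep]

    times = (starts + n_win / 2) / frame_rate  # window centers, in seconds
    return freqs[keep], times, spec
```

The trade-off is the usual one: a 2-second window gives 0.5 Hz frequency resolution but blurs anything that changes faster than a couple of seconds, so shorter windows may suit short phrases better.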
Here's the first phrase, along with a conventional waveform/spectrogram/f0 track:
And the second phrase:
Verdict? Dunno. Maybe more later…