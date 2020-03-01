What do you hear?
Listen to this sound, and describe it in the comments below:
You can learn what the sound is, and why I care how you hear it, after the fold.
I started with 15 seconds of sound, whose fundamental frequency oscillates up and down with a period of one second, in two versions an octave apart:
Higher version: 200-240 Hz
Lower version: 100-120 Hz
I then divided the 15 seconds into random chunks, and chose either the higher-frequency version or the lower-frequency version for each chunk, with a probability of 0.8 for the higher-frequency version. The switch between versions was done smoothly, to avoid acoustic discontinuities.
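For readers who want to experiment, here is a minimal sketch of that recipe in Python with NumPy. It is not the code used to make the sound in this post. The sample rate, the number of chunks, the number of harmonics, the crossfade length, and the function name `synthesize` are all assumptions chosen for illustration. The 200-240 Hz range, the one-second modulation period, the 0.8 probability, and the 1/F overtone amplitudes (mentioned further down) come from the description here.

```python
import numpy as np

def synthesize(duration=15.0, sr=16000, f_lo=200.0, f_hi=240.0,
               mod_period=1.0, p_high=0.8, n_harmonics=20,
               n_chunks=30, fade=0.02, seed=0):
    """Sketch of the two-version signal; all defaults except the F0 range,
    modulation period, and p_high are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    n = int(duration * sr)
    t = np.arange(n) / sr

    # F0 of the higher version: a sinusoid between f_lo and f_hi.
    mid, dev = (f_lo + f_hi) / 2, (f_hi - f_lo) / 2
    f0 = mid + dev * np.sin(2 * np.pi * t / mod_period)
    phase = 2 * np.pi * np.cumsum(f0) / sr  # integrate frequency to get phase

    def harmonic_sum(ph, count):
        # Harmonic k gets amplitude 1/k (a sawtooth-like spectrum).
        out = np.zeros_like(ph)
        for k in range(1, count + 1):
            out += np.sin(k * ph) / k
        return out

    high = harmonic_sum(phase, n_harmonics)
    # Halving the phase halves F0; doubling the harmonic count keeps
    # roughly the same bandwidth (an assumption, not from the post).
    low = harmonic_sum(phase / 2, 2 * n_harmonics)

    # Random chunk boundaries, and a per-chunk choice of version.
    inner = np.sort(rng.uniform(0, duration, n_chunks - 1))
    edges = np.concatenate(([0.0], inner, [duration]))
    use_high = rng.random(n_chunks) < p_high
    idx = np.clip(np.searchsorted(edges, t, side="right") - 1, 0, n_chunks - 1)
    gate = use_high[idx].astype(float)  # 1 = higher version, 0 = lower

    # Smooth the 0/1 gate so the versions crossfade instead of jumping.
    width = max(1, int(fade * sr))
    kernel = np.hanning(width + 2)[1:-1]
    kernel /= kernel.sum()
    padded = np.pad(gate, width // 2, mode="edge")
    gate = np.convolve(padded, kernel, mode="valid")[:n]

    y = gate * high + (1 - gate) * low  # mixing in the output domain
    return y / np.max(np.abs(y)), gate, f0

y, gate, f0 = synthesize(duration=3.0, seed=1)
```

The returned `gate` array is the ground truth for which version is sounding at each sample, which is the point of using fake data.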
Here's what a sample looks like in the time and frequency domains — you can see the period-doubling in the waveform (the upper panel), and the corresponding frequency-halving in the spectrogram (the lower panel):
What's the point?
The phenomenon of "vocal creak" involves a similar abrupt period-doubling or halving — for fundamental reasons discussed here — which we mostly perceive as a change in voice quality rather than a factor-of-two change in pitch.
There are various reasons to want estimates of a speaker's amount (and type and location) of creak. And we also often want estimates of a speaker's "pitch range", although straightforward distribution-based measures will be strongly influenced by that speaker's amount of period-doubling and halving.
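To make the second point concrete, here is a small sketch of how period-halving distorts a percentile-based range measure. Everything in it is an assumption for illustration: the 10 ms frame step, the 5th-to-95th percentile definition of "range", and the simplification that each frame is halved independently with probability 0.2 (rather than in chunks).

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(0, 15, 0.01)                    # hypothetical 10 ms frames
f0_true = 220 + 20 * np.sin(2 * np.pi * t)    # 200-240 Hz, 1 s period
halved = rng.random(t.size) < 0.2             # assumed: 20% of frames halved
f0_obs = np.where(halved, f0_true / 2, f0_true)

def range_semitones(f0, lo=5, hi=95):
    """Percentile-based pitch range, in semitones."""
    p_lo, p_hi = np.percentile(f0, [lo, hi])
    return 12 * np.log2(p_hi / p_lo)

true_range = range_semitones(f0_true)
observed_range = range_semitones(f0_obs)
print(f"without halving: {true_range:.1f} semitones")
print(f"with halving:    {observed_range:.1f} semitones")
```

The underlying range is about three semitones, but once a fifth of the frames drop an octave, the 5th percentile lands in the halved cluster and the measured range comes out at more than an octave.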
There are ideas Out There about how to calculate a "creak index", and how to estimate pitch range in a creak-resistant way — see "The great creak-off of 1969", 7/28/2015, for a simple example — but we don't really know how to validate such measures.
One approach is to start with fake data, like the signal at the top of this post, where we know exactly what the control parameters were. (And of course we could make such signals in various ways that sound more speech-like and/or more pleasant.)
If an analysis method can't recover the underlying parameters from such signals, it probably won't work on real speech. If a method works with a variety of synthetic signals, then we can be more interested in its performance on real-world data.
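As a toy illustration of that validation loop, the sketch below generates F0 tracks with a known proportion of halved frames and checks whether a deliberately naive estimator recovers it. The estimator (count frames that fall more than half an octave below the median) is a hypothetical stand-in, not a proposed creak index, and the frame-level track skips the hard part, which is pitch tracking from audio.

```python
import numpy as np

def make_track(p_halved, n_frames=1500, seed=0):
    """F0 track (200-240 Hz, 1 s period) with a known fraction of halved frames."""
    rng = np.random.default_rng(seed)
    t = np.arange(n_frames) * 0.01            # hypothetical 10 ms frames
    f0 = 220 + 20 * np.sin(2 * np.pi * t)
    halved = rng.random(n_frames) < p_halved
    return np.where(halved, f0 / 2, f0), halved

def estimate_halved_fraction(f0):
    """Naive estimator: frames more than half an octave below the median."""
    return np.mean(f0 < np.median(f0) / np.sqrt(2))

results = []
for p in (0.1, 0.2, 0.3):
    f0, halved = make_track(p)
    results.append((p, halved.mean(), estimate_halved_fraction(f0)))
    print(f"target {p:.1f}  actual {halved.mean():.3f}  "
          f"estimated {estimate_halved_fraction(f0):.3f}")
```

This estimator succeeds here only because the two frequency bands do not overlap and fewer than half the frames are halved. A wider pitch range or a creakier speaker would break it, which is exactly the kind of failure that synthetic tests are meant to expose.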
The particular fake data at the start of this post is not very speech-like: the overtone amplitudes are uniformly 1/F rather than modulated by vocal-tract resonances; the period-doubling happens at random times rather than preferentially in lower-pitched regions; the period-doubling transition is handled by mixing in the output domain rather than by modeling a voice-source generation process; the sinusoidal modulation of pitch creates a distribution of values that's very different from what we typically see in speech.
But to my ear, the result meets one basic condition: it doesn't sound like switching between two different sources (though that's what it is), but rather like a single source varying in timbre (or what we would call "voice quality" if it were a human voice). I'm curious to learn whether other listeners agree.
There's also an acoustic perception angle, connected to the Shepard-Risset glissando illusion — but that's a topic for another day.
Yerushalmi said,
March 1, 2020 @ 3:06 am
It put me in mind of the recorders we all got in elementary school. When you were just starting to learn it, you blew into it without thinking too much about it, and it would sound awful – but then every so often you'd feel the air "catch" in the recorder and it would sound the way it was supposed to. As you got better at it, you'd learn how to make sure the air got "caught" and stayed there…. but every so often you'd slip and get a momentary screech before you recovered yourself.
However, if I recall correctly, in the recorder the intended sound was lower-pitched than the awful screeching; here it sounds more like the intended sound is the higher-pitched one, and the lower-pitched one is the one you slip into accidentally.
Philip Taylor said,
March 1, 2020 @ 3:09 am
A repetitive sawtooth waveform, varying in pitch by about 3.5 tones, with a period of about 1 second.
John Finkbiner said,
March 1, 2020 @ 3:20 am
I listened through built-in iPad speakers. The first time I listened (before reading the article) it sounded to me like an oscillating pitch with periodic bursts of static centered on the low extremes. I had a mental image of a point following a sine wave that sometimes intersected with an uneven floor. Maybe it was an insect flying just over the surface of a pond; when a low point in its flight hit the top of a ripple the water made it thrash about to take off again.
I listened again after finishing the article and it was obvious that the creak was actually evenly spread over the entire pitch range but for some reason the low-frequency rumbles were much more salient the first time through.
MattF said,
March 1, 2020 @ 3:28 am
Sounded to me like a beginner's attempt to play a brass wind instrument.
Bob Ladd said,
March 1, 2020 @ 3:59 am
I definitely heard periods of what sounded like creaky voice; in fact, I briefly entertained the hypothesis that this was a recording of an actual person, but I realized that the pitch oscillation and the actual high and low pitches were too regular for that.
Relevant to your interests: I heard more creakiness early in the sequence than later – not sure what that proves.
Jamie said,
March 1, 2020 @ 4:18 am
Something like a sawtooth or square wave (something with lots of harmonics) varying in frequency with occasional bursts of a similar sound at lower frequency (an octave lower?)