Language Log

What do you hear?

March 1, 2020 @ 2:49 am · Filed by Mark Liberman under Computational linguistics, Psychology of language

Listen to this sound, and describe it in the comments below:

You can learn what the sound is, and why I care how you hear it, after the fold.

I started with 15 seconds of sound, whose fundamental frequency oscillates up and down with a period of one second, in two versions an octave apart:

200-240 Hz	100-120 Hz

I then divided the 15 seconds into random chunks, and chose either the higher-frequency version or the lower-frequency version for each chunk, with a probability of 0.8 for the higher-frequency version. The switch between versions is done smoothly, to avoid acoustic discontinuities.
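For readers who want to experiment, here is a minimal sketch of how a signal of this kind could be generated. It is not the code used for this post. The sample rate, the chunk-length range (`min_chunk`, `max_chunk`), the 20 ms crossfade (`fade`), and the function name `synthesize` are all assumptions made for illustration, and the "1/F" overtone amplitudes are interpreted here as 1/k for harmonic number k.

```python
import numpy as np


def synthesize(duration=15.0, sr=16000, n_harmonics=30, p_high=0.8,
               min_chunk=0.2, max_chunk=1.0, fade=0.02, seed=0):
    """Toy period-doubling signal; all defaults are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    t = np.arange(int(duration * sr)) / sr

    # F0 of the higher version: 200-240 Hz, oscillating once per second.
    f0 = 220.0 + 20.0 * np.sin(2 * np.pi * t)
    # Integrate frequency to get phase, so the pitch glides without clicks.
    phase = 2 * np.pi * np.cumsum(f0) / sr

    def harmonic_sum(ph):
        # Sum of zero-phase sinusoidal overtones with amplitude 1/k.
        out = np.zeros_like(ph)
        for k in range(1, n_harmonics + 1):
            out += np.sin(k * ph) / k
        return out

    high = harmonic_sum(phase)       # 200-240 Hz version
    low = harmonic_sum(phase / 2)    # same contour, one octave down

    # Random chunks: gate is 1 where the higher version is chosen, else 0.
    gate = np.empty_like(t)
    i = 0
    while i < len(t):
        n = int(rng.uniform(min_chunk, max_chunk) * sr)
        gate[i:i + n] = 1.0 if rng.random() < p_high else 0.0
        i += n

    # Smooth the gate with a normalized Hann window so switches are gradual.
    win = np.hanning(int(fade * sr))
    win /= win.sum()
    gate = np.convolve(gate, win, mode="same")

    # Crossfade the two versions in the output domain.
    signal = gate * high + (1.0 - gate) * low
    return t, f0, gate, signal
```

Calling `synthesize()` returns the time axis, the F0 contour of the higher version, the smoothed gate, and the mixed signal, which would need peak normalization before being written to an audio file.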

Here's what a sample looks like in the time and frequency domains — you can see the period-doubling in the waveform (the upper panel), and the corresponding frequency-halving in the spectrogram (the lower panel):

What's the point?

The phenomenon of "vocal creak" involves a similar abrupt period-doubling or halving — for fundamental reasons discussed here — which we mostly perceive as a change in voice quality rather than a factor-of-two change in pitch.

There are various reasons to want estimates of a speaker's amount (and type and location) of creak. And we also often want estimates of a speaker's "pitch range", although straightforward distribution-based measures will be strongly influenced by that speaker's amount of period-doubling and halving.
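To make the problem concrete, here is a small sketch of how a simple distribution-based range measure reacts to octave drops. The 5th-to-95th percentile span in semitones is just one plausible choice of measure, and the frame rate, the frame-by-frame (rather than chunk-by-chunk) halving, and the function name `range_semitones` are assumptions for illustration.

```python
import numpy as np


def range_semitones(f0, lo=5, hi=95):
    """Span between two percentiles of F0, expressed in semitones."""
    p_lo, p_hi = np.percentile(f0, [lo, hi])
    return 12 * np.log2(p_hi / p_lo)


rng = np.random.default_rng(0)
t = np.arange(0, 15, 0.01)                    # 100 frames per second (assumed)
f0_clean = 220 + 20 * np.sin(2 * np.pi * t)   # 200-240 Hz contour

# Halve F0 in a random 20% of frames to imitate period-doubling.
halved = rng.random(len(t)) < 0.2
f0_creaky = np.where(halved, f0_clean / 2, f0_clean)

clean_range = range_semitones(f0_clean)
creaky_range = range_semitones(f0_creaky)
print(f"clean: {clean_range:.2f} st, with octave drops: {creaky_range:.2f} st")
```

The underlying contour is identical in both cases, but the estimate for the second track is many semitones wider, because the lower percentile lands among the halved frames.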

There are ideas Out There about how to calculate a "creak index", and how to estimate pitch range in a creak-resistant way — see "The great creak-off of 1969", 7/28/2015, for a simple example — but we don't really know how to validate such measures.

One approach is to start with fake data, like the signal at the top of this post, where we know exactly what the control parameters were. (And of course we could make such signals in various ways that sound more speech-like and/or more pleasant.)

If an analysis method can't recover the underlying parameters from such signals, it probably won't work on real speech. If a method works with a variety of synthetic signals, then we can be more interested in its performance on real-world data.
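The validation loop described here might look something like the following sketch, which works on an idealized F0 track rather than on audio. The detector is a deliberately naive stand-in (flag frames more than half an octave below the median), not a proposed creak index, and the names `make_track`, `detect_halving`, and `fold` are hypothetical.

```python
import numpy as np


def make_track(p_low, n_frames=1500, frame_s=0.01, seed=0):
    """Synthetic F0 track with a known proportion of octave-halved frames."""
    rng = np.random.default_rng(seed)
    t = np.arange(n_frames) * frame_s
    base = 220 + 20 * np.sin(2 * np.pi * t)
    halved = rng.random(n_frames) < p_low
    return np.where(halved, base / 2, base), halved


def detect_halving(f0):
    """Naive detector: frames more than half an octave below the median."""
    return np.log2(np.median(f0) / f0) > 0.5


def fold(f0, halved):
    """Undo detected halving by doubling the flagged frames."""
    return np.where(halved, 2 * f0, f0)


for p_low in (0.1, 0.2, 0.4):
    f0, truth = make_track(p_low)
    guess = detect_halving(f0)
    folded = fold(f0, guess)
    span = 12 * np.log2(folded.max() / folded.min())
    print(f"true p={truth.mean():.3f}  estimated p={guess.mean():.3f}  "
          f"folded range={span:.2f} st")
```

On this toy data the detector recovers the control parameters, but only because the two octave bands do not overlap and fewer than half the frames are halved. Real speech violates both assumptions, which is why passing a test like this is a necessary condition rather than a sufficient one.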

The particular fake data at the start of this post is not very speech-like: the overtone amplitudes are uniformly 1/F rather than modulated by vocal-tract resonances; the period-doubling happens at random times rather than preferentially in lower-pitched regions; the period-doubling transition is handled by mixing in the output domain rather than by modeling a voice-source generation process; the sinusoidal modulation of pitch creates a distribution of values that's very different from what we typically see in speech.

But to my ear, the result meets one basic condition: it doesn't sound like switching between two different sources (though that's what it is), but rather like a single source varying in timbre (or what we would call "voice quality" if it were a human voice). I'm curious to learn whether other listeners agree.

There's also an acoustic perception angle, connected to the Shepard-Risset glissando illusion — but that's a topic for another day.

Update — here's a version in which 50% of the lower-pitch signal is mixed in, rather than 100%:

To my ear, this strengthens the perception of a timbre difference rather than a pitch difference.


41 Comments

Yerushalmi said,

March 1, 2020 @ 3:06 am

It put me in mind of the recorders we all got in elementary school. When you were just starting to learn it, you blew into it without thinking too much about it, and it would sound awful – but then every so often you'd feel the air "catch" in the recorder and it would sound the way it was supposed to. As you got better at it, you'd learn how to make sure the air got "caught" and stayed there…. but every so often you'd slip and get a momentary screech before you recovered yourself.

However, if I recall correctly, in the recorder the intended sound was lower-pitched than the awful screeching; here it sounds more like the intended sound is the higher-pitched one, and the lower-pitched one is the one you slip into accidentally.
Philip Taylor said,

March 1, 2020 @ 3:09 am

A repetitive sawtooth waveform, varying in pitch by about 3.5 tones, with a period of about 1 second.

[(myl) The basic waveform, 30 summed sinusoidal overtones with 0 phase and amplitudes of 1/F, is indeed sawtooth-like:

The period of pitch variation is exactly 1 second (a modulation rate of 1 Hz).
The amount of pitch variation is 12*log2(120/100) = 3.1564 semitones.
]
John Finkbiner said,

March 1, 2020 @ 3:20 am

I listened through built-in iPad speakers. The first time I listened (before reading the article) it sounded to me like an oscillating pitch with periodic bursts of static centered on the low extremes. I had a mental image of a point following a sine wave that sometimes intersected with an uneven floor. Maybe it was an insect flying just over the surface of a pond; when a low point in its flight hit the top of a ripple the water made it thrash about to take off again.

I listened again after finishing the article and it was obvious that the creak was actually evenly spread over the entire pitch range but for some reason the low-frequency rumbles were much more salient the first time through.
MattF said,

March 1, 2020 @ 3:28 am

Sounded to me like a beginner's attempt to play a brass wind instrument.
n99 said,

March 1, 2020 @ 3:38 am

A fast siren with intermittent fart like noises?
JimH said,

March 1, 2020 @ 3:39 am

In the first instant, it sounded like someone who doubts someone else, expressing it vocally.
Bob Ladd said,

March 1, 2020 @ 3:59 am

I definitely heard periods of what sounded like creaky voice – in fact, I briefly entertained the hypothesis that this was a recording of an actual person, but I realized that the pitch oscillation and the actual high and low pitches were too regular for that.

Relevant to your interests: I heard more creakiness early in the sequence than later – not sure what that proves.
Jamie said,

March 1, 2020 @ 4:18 am

Something like a sawtooth or square wave (something with lots of harmonics) varying in frequency with occasional bursts of a similar sound at lower frequency (an octave lower?)
Yerushalmi said,

March 1, 2020 @ 4:25 am

@MattF
Great minds :)
Mark Meckes said,

March 1, 2020 @ 4:50 am

Similar to MattF, it sounded to me like someone playing with a cheap music synthesizer's imitation of a trombone or other brass instrument.
Jonathan said,

March 1, 2020 @ 5:15 am

Posting after only listening and having read no further: it's a robot fart that goes on for a surprisingly long time, and the robot decides to have some fun by modulating it to sound like "waah-waah-waah……".
Trogluddite said,

March 1, 2020 @ 5:17 am

A waveform with high harmonic content (likely a sawtooth) periodically modulated in pitch by a sine wave, interrupted occasionally by a similar waveform at a constant pitch (and which does not disturb the period/phase of the modulating sine of the other). I perceive the static pitch as being lower than the lowest pitch of the modulated one, though I suspect this may not actually be the case.

For context – one of my hobbies is coding audio DSP effects and synthesisers, so I am rather accustomed to analysing sounds in such technical terms. More subjectively – an unnaturally large flying insect alternating between buzzing about and landing.
Trogluddite said,

March 1, 2020 @ 5:24 am

Interesting – my pitch perception was certainly not accurate!
Jonathan said,

March 1, 2020 @ 5:26 am

Having read the article now and gone back and re-listened, I think I found the higher pitched sound so irritating that I didn't really notice when it dropped low. Sorry, not much use for answering your question.

(I also found myself waiting for the drop.)

[(myl) It's interesting how annoyingly whiny this mechanical sound manages to be…]
Philip Taylor said,

March 1, 2020 @ 5:28 am

And I over-estimated by one full tone — I mentally mapped it to the range C-F, and completely forgot to start counting tones at D rather than C.
Gabriel Ramos-Fernandez said,

March 1, 2020 @ 5:40 am

This is an oscillating tone with a cutoff at lower frequencies, below which the tone lowers in frequency, becoming noisier abruptly and then returning to the higher frequency range as it goes up again. Reminds me of the spider monkey's whinny! ;)
David L said,

March 1, 2020 @ 5:58 am

Over-driven humming with intermittent farting
Brian said,

March 1, 2020 @ 6:11 am

"An Atari 400 that has short-circuited and possibly melted."
Roy Sablosky said,

March 1, 2020 @ 6:38 am

Before reading the article: I hear a strident, very annoying tone in the vocal range that wobbles a little in pitch (perhaps by a semitone). A few times it briefly (for maybe half a second) drops down an octave.
A1987dM said,

March 1, 2020 @ 6:52 am

Particularly bad electronic dance music.
Jay Sekora said,

March 1, 2020 @ 7:35 am

I was pretty incorrect! Here’s what I typed before reading “below the fold”:

It’s a constant buzz varying regularly in pitch in something like a sine wave, with an intermittent lower-pitched buzz overlaid on it. The intermittent buzz appears to vary in pitch, but I think it’s actually constant pitch and its apparent variance in pitch is an illusion due to contrast with the constant buzz and its pitch variance.
Tim Leonard said,

March 1, 2020 @ 8:04 am

A single-frequency buzz (maybe a square wave?), varying frequency sinusoidally over about a musical third, but replaced for short periods at unpredictable times by a similar buzz an octave lower. (After reading the explanation, I see that the lower-pitch tone was also varying frequency. I thought it was constant frequency.)
Cirk R. Bejnar said,

March 1, 2020 @ 8:20 am

It sounds like a sound that would be hard to manufacture or sustain, especially for a full 15 sec, with the human vocal tract. My first impressions were flatulence or a whiny electrical motor but I finally settled on a steady tone maintained on a kazoo or similar instrument.
Michael D Sullivan said,

March 1, 2020 @ 10:17 am

It sounds like a kazoo, or maybe one of those "throw your voice" things you put under your tongue back in the '50s.
Julian said,

March 1, 2020 @ 10:27 am

Before reading below the fold: I hear a siren-like noise varying from about C4 to A3 (but not sure of the octave), irregularly interrupted by a vibration like that of running a nail over the teeth of a comb. Whether the siren is continuous and the nail-on-comb is added to it, or whether it's a single source that suddenly drops below the threshold of hearing a musical note (I believe that's about 20 hertz?) I couldn't say.
A. said,

March 1, 2020 @ 10:48 am

A synthesized tone with a harsh timbre (sounds like there's a significant sawtooth wave component), oscillating between two pitches about a minor third apart, and occasionally dropping down to a lower octave for a moment.
K. Gordon said,

March 1, 2020 @ 1:15 pm

To me it's reminiscent of the sound of a winged insect flying around one's head, but as if it was made by a synthesizer.
Jerry Friedman said,

March 1, 2020 @ 3:28 pm

Something with a motor being turned faster and slower, or a kazoo? Or the mating flight of a giant wasp.
David D ROBERTSON said,

March 1, 2020 @ 5:01 pm

The Otamatone.
Twill said,

March 1, 2020 @ 6:03 pm

My first guess was that it represented the stereotypical voice breaking of an adolescent, which is generally perceived as a fluctuation between different frequencies rather than a question of timbre.
Andreas Johansson said,

March 1, 2020 @ 7:33 pm

My first impression was of an ailing electrical motor.
RolyH said,

March 1, 2020 @ 9:55 pm

Before looking below the fold it sounds like someone playing with an old Moog Synthesizer
Martin Barry said,

March 2, 2020 @ 12:47 am

'Synthetic creaky phonation' immediately sprang to mind. I hear the effect more strongly when the lower-pitched segments are at 100% than at 50% amplitude – in the latter condition I hear the higher pitched signal as continuous rather than interrupted by the 'creaky' bits, which to my ear is less like real speech.
RachelP said,

March 2, 2020 @ 3:41 am

@Yerushalmi
Interesting comment about learning the recorder, I had forgotten that experience but you describe it well. Thing is, I am a voice student (as in singing, only amateur and not very advanced) and I have the same experience with my voice. I wonder now if that relates to vocal 'creak'.

When trying to sing a phrase, you will try and hold the inside of the mouth in one position and use lips to form the words and air flow to vary the pitch and volume. But get it wrong and the result can be a 'creak', sometimes a creaky version of the pitch you were aiming for, and sometimes just a vague lower pitched noise. Can't really do it on purpose, though, so hard to check.
Luke said,

March 2, 2020 @ 6:21 am

A buzzy tone alternating somewhere between a G and Bb, occasionally interrupted by something an octave lower doing the same.
tsts said,

March 2, 2020 @ 8:17 am

In the first 2 seconds, it sounded to me like a human voice making the noise. Then later, it sounded like an ailing motor, as Andreas Johansson pointed out. Maybe a drill or electric screw driver or hand mixer that is stuck.
Barbara Phillips Long said,

March 2, 2020 @ 2:59 pm

The sound fluctuated like a siren but sounded artificial. I enjoy early music, and I was reminded of the crumhorn. I don’t hear the resemblance to brass instruments— it sounded very reedy to me.

https://www.youtube.com/watch?v=q9wPZGKvtLY

Comparing it to a fart also sounded plausible. I don’t know if the professional farter Joseph Pujol, Le Petomane, ever recorded an equivalent sound, but some come close. Compare “the dressmaker” and “the monk”:

https://www.youtube.com/watch?v=evwLzR57wsc
Chas Belov said,

March 2, 2020 @ 8:14 pm

A repeated rising and falling high sound occasionally modulated to sound like someone blowing through a trumpet mouthpiece.
Derwin McGeary said,

March 3, 2020 @ 3:13 am

Almost like a rotary saw occasionally catching on some harder material as it goes through, but otherwise being pushed methodically through some hard wood. But it's not quite "rough" enough.
Sarah said,

March 3, 2020 @ 7:51 pm

Another thought: there is an order component, or at least a context component. In my lab we're trying to compare pitch consistency on repeated tokens of the same word, and some of those utterances are beset by creak. And they don't sound like a pitch change but like a timbre change–unless you start playback at/after the creak begins, when it clearly sounds like a Sith lord.

It reminds me of a phenomenon related to Shepard tones (tones with ambiguous pitch height but clear pitch chroma). If you compose a melody out of them, the pitch height of a particular note seems to be contingent on the pitches around it, in a good-continuation sort of way.
Jeffrey Morris said,

March 12, 2020 @ 5:08 pm

What I hear initially sounds like a single waveform that distorts periodically. It reminds me of when I would play guitar and use what is called an octave(r) pedal. The pedal is supposed to recreate the original input signal and replicate it but at the desired pitch, whether it be an exact octave (8th), fifth (5th), etc. Some pedals are better at tracking than others and, in large part, the integrity of the sound was dependent on the quality of the input into the pedal. If the input was too 'hot' this would cause the pedal to (insert your interpretive word to describe the phenomenon), something that was most often undesirable. I would describe the phenomenon as a 'blurp', 'glitch' or synthy type of distortion.

Where this begins to make sense to me is that, if you listen to a singer, when they sing at a lower volume, their voice will not distort as easily or at all because the input energy does not generate nearly enough to distort the output. If a singer sings more loudly or higher in pitch, more energy is being created which results in a hotter output causing vocal distortion.

I realize that 'voice creak' can happen at normal volume, which makes me think that it has less to do with the energy and more to do with the laryngeal articulation.

I could keep going and give examples about the blending of clean and dirty signals (music production) or vocal exemplars of this, like David Lee Roth's unique singing technique. I don't know if any of this pertains to the above content or if I am making some significant leaps to assign some correlation. Thoughts?
