Language Log

Pitch contour perception

August 28, 2017 @ 4:39 am · Filed by Mark Liberman under Phonetics and phonology, Prosody

Listen to this brief four-syllable phrase, and answer a simple question:

once the eggs hatch

Is the end of the last syllable ("hatch") higher or lower in pitch than the start of the first syllable ("once")?

If you're like most people, you hear "hatch" as ending somewhat higher than the syllables that precede it, though it's also a little rough in voice quality.

But in purely physical terms, the syllable "once" begins at about 200 Hz (= cycles per second) and ends at about 240 Hz, while the syllable "hatch" starts at about 100 Hz and ends at about 121 Hz, a full octave lower:

Thus an f0 track of the whole thing looks like this:

This is a good example of one of the reasons that (the psychological dimension of) pitch is not the same as (the physical dimension of) fundamental frequency.
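For a rough sense of scale, the readings above can be converted to equal-tempered semitones with 12 × log2(f2/f1). This is a minimal Python sketch. The Hz values are the approximate figures quoted in this post, and the semitone scale is only a convenient yardstick, not a model of perceived pitch.

```python
import math


def semitones(f_from: float, f_to: float) -> float:
    """Signed interval from f_from to f_to, in equal-tempered semitones."""
    return 12.0 * math.log2(f_to / f_from)


# Approximate f0 readings (Hz) quoted in the post.
ONCE_START, ONCE_END = 200.0, 240.0
HATCH_START, HATCH_END = 100.0, 121.0

rise_once = semitones(ONCE_START, ONCE_END)      # about +3.2 semitones
rise_hatch = semitones(HATCH_START, HATCH_END)   # about +3.3 semitones
overall = semitones(ONCE_START, HATCH_END)       # about -8.7 semitones

print(f"rise within 'once':                {rise_once:+.2f} st")
print(f"rise within 'hatch':               {rise_hatch:+.2f} st")
print(f"start of 'once' to end of 'hatch': {overall:+.2f} st")
```

Physically, then, the end of "hatch" is almost nine semitones below the start of "once", even though most listeners hear it as higher; each syllable on its own rises by roughly three semitones.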

So what's going on here? Why does it sound (to most people) like "hatch" is a little higher than "once"?

In acoustic terms, we're hearing the second, fourth, sixth, … harmonics of "hatch" as a continuation of the first, second, third, … harmonics of "once". In the spectrogram below, I've outlined the third harmonic of "once" and the sixth harmonic of "hatch":
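The alignment in that spectrogram is plain arithmetic on harmonic series. A minimal Python sketch, using the approximate starting f0 values quoted above (200 Hz for "once", 100 Hz for "hatch"); the function name is just an illustrative choice:

```python
def harmonics(f0: float, n: int) -> list[float]:
    """First n harmonic frequencies (Hz) of a periodic sound with fundamental f0."""
    return [k * f0 for k in range(1, n + 1)]


ONCE_F0 = 200.0   # approximate f0 at the start of "once"
HATCH_F0 = 100.0  # approximate f0 at the start of "hatch" (period-doubled)

once = harmonics(ONCE_F0, 5)
hatch = harmonics(HATCH_F0, 10)

# Harmonic 2k of a 100 Hz source sits exactly on harmonic k of a 200 Hz source.
# (Lists are 0-indexed, so harmonic m is stored at index m - 1.)
for k, f in enumerate(once, start=1):
    assert hatch[2 * k - 1] == f
    print(f"'once' H{k} = {f:.0f} Hz  <->  'hatch' H{2 * k} = {hatch[2 * k - 1]:.0f} Hz")

# The odd-numbered harmonics of "hatch" are the only genuinely new components.
new = [f for m, f in enumerate(hatch, start=1) if m % 2 == 1]
print("new components (Hz):", new)
```

The third harmonic of "once" and the sixth harmonic of "hatch" both come out at 600 Hz, which is the pair outlined in the figure.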

In articulatory terms, the octave shift in the last syllable (and also in part of the syllable "eggs") is an example of behavior found in many oscillatory systems, which can undergo period-doubling bifurcations as a natural consequence of the process that causes them to oscillate in the first place. This happens not only to speakers but also to wind, reed, and brass players, and in many processes that don't involve humans at all.

On the perceptual side, this is related to the illusions known as Shepard Tones or Shepard-Risset Glissandos, which are sort of auditory barber-pole illusions that also depend on octave-related ambiguities.
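The octave ambiguity behind Shepard tones can be sketched in a few lines. In the usual construction, each tone is a stack of components spaced exactly an octave apart, with amplitudes set by a fixed bell-shaped envelope over log frequency, so that moving every component up a semitone twelve times returns exactly the starting spectrum. The Python below is a minimal sketch of that idea; the Gaussian envelope, the 20 Hz floor, the nine-octave span, and the function name are illustrative assumptions, not details from this post.

```python
import math


def shepard_components(step: int, f_min: float = 20.0, n_octaves: int = 9,
                       center_octave: float = 4.5, width_octaves: float = 1.5):
    """(frequency, amplitude) pairs for one Shepard tone.

    Components are an octave apart. A fixed Gaussian envelope over
    log2(freq / f_min) sets their amplitudes. `step` moves every component up
    by that many semitones, wrapping after 12, so the set repeats each octave.
    """
    offset = (step % 12) / 12.0          # position within the octave, in octaves
    comps = []
    for k in range(n_octaves):
        octave_pos = k + offset          # log2(freq / f_min)
        freq = f_min * 2.0 ** octave_pos
        amp = math.exp(-0.5 * ((octave_pos - center_octave) / width_octaves) ** 2)
        comps.append((freq, amp))
    return comps


for step in (0, 1, 11, 12):
    loudest = max(shepard_components(step), key=lambda c: c[1])
    print(f"step {step:2d}: loudest component {loudest[0]:.1f} Hz")
```

Going from step 11 to step 12 moves every component up a semitone, yet the resulting spectrum is identical to step 0, which is what lets the sequence seem to rise forever.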

In the example phrase "once the eggs hatch", everything works out as it should, because what we hear is pretty much what the speaker intended.

But of course there are perceptual octave ambiguities in cases where there's no speaker intent to decode. Here's a tone glide that starts at 200 Hz and ends at 140 Hz, but sounds to most people as if it's rising throughout:

It blends a higher rise from 200 to 280 Hz with a lower rise from 100 to 140 Hz:

(Audio: the 200-280 Hz glide alone; the 100-140 Hz glide alone.)

They're combined so that the first third is entirely the higher glide, and the last third is entirely the lower glide, while the middle third is a gradually shifting blend.
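A stimulus of this kind can be synthesized along the following lines. This is a minimal Python sketch, not the script used to make the clips above: it assumes linear-in-Hz glides, a linear amplitude crossfade across the middle third, and a simple four-harmonic waveform for each glide (the post doesn't say what waveform was used). The function names and the 16 kHz sample rate are hypothetical choices.

```python
import math


def high_weight(t: float) -> float:
    """Weight of the upper glide at normalized time t (0 = start, 1 = end)."""
    if t <= 1 / 3:
        return 1.0                      # first third: upper glide only
    if t >= 2 / 3:
        return 0.0                      # last third: lower glide only
    return 1.0 - 3.0 * (t - 1 / 3)      # middle third: linear crossfade


def tone(phase: float) -> float:
    """A simple harmonic-rich waveform: first four harmonics at 1/k amplitude."""
    return sum(math.sin(k * phase) / k for k in range(1, 5))


def blended_glide(duration: float = 1.0, sr: int = 16000) -> list[float]:
    """Crossfade a 200->280 Hz glide into a 100->140 Hz glide."""
    n = int(duration * sr)
    out = []
    phase_hi = phase_lo = 0.0
    for i in range(n):
        t = i / n
        f_hi = 200.0 + 80.0 * t         # upper glide, 200 -> 280 Hz
        f_lo = 100.0 + 40.0 * t         # lower glide, one octave below throughout
        # Accumulate phase so each tone's instantaneous frequency tracks its glide.
        phase_hi += 2.0 * math.pi * f_hi / sr
        phase_lo += 2.0 * math.pi * f_lo / sr
        w = high_weight(t)
        out.append(w * tone(phase_hi) + (1.0 - w) * tone(phase_lo))
    return out


samples = blended_glide()
print(len(samples), "samples; peak", round(max(abs(s) for s in samples), 3))
```

To listen, the samples could be scaled to 16-bit integers and written out with the standard-library `wave` module.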

In the spectral domain, the blend looks like this:

The audio clip that we started this post with — "once the eggs hatch" — comes from an interview with Mary Gardiner, author of Good Garden Bugs, broadcast recently on You Bet Your Garden.

Here's the passage in context:

With wasp parasitoids, they have what looks like a stinger, but is actually an ovipositor, or egg-laying organ, and so that ovipositor allows them to sting their prey and deposit an egg within it.

Some wasps or flies will lay their eggs on their prey, and then the larvae will hatch from those eggs and enter the insect, and some parasitoids even lay their eggs directly on plant material, hoping that their host will consume the eggs during feeding, and some of the undamaged eggs then hatch once they’re inside the pest.

Once the eggs hatch, one or more larvae will emerge per pest, and those larvae will consume the pest from the inside out, in an alien-like way, and then they will pupate either inside the host, or outside, sometimes on it, and then emerge as adults.


13 Comments

stephen said,

August 28, 2017 @ 12:43 pm

What if somebody testifies under oath about the pitch and the frequency is shown to be different from what the person said and believed?

[(myl) This suggests that you know at least one example of psychophysical testimony under oath — please tell us about it.]
Terry Hunt said,

August 28, 2017 @ 7:47 pm

Might it be the case that subjective (and correct) perceptions of actual, intended pitches in human speech have at some point been seemingly contradicted by objective mechanical measurements of fundamental frequencies?

(As a non-linguist, I have no idea whether or not this is the case, but I can imagine some observers being tripped up by the phenomenon, and I presume Prof. Liberman may have some linguistically practical reason for raising the topic which is not obvious to me as a layman.)
Brett said,

August 28, 2017 @ 7:55 pm

In nonlinear oscillating systems, period doubling tends to occur easily when the system is forced hard enough. This superficially appears to lower the fundamental frequency. However, it is quite possible to have relatively little energy in the new lowest frequency; as in this case, most of the energy remains in the old f0.

The right way to think of this situation is not to treat the doubled period as giving a new fundamental, atop which the old fundamental is an overtone. Rather, the fundamental remains unchanged, but now there is an additional "undertone" beneath it.
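Brett's point can be illustrated with a toy signal. The Python sketch below is a simplification that assumes period doubling shows up only as alternating pulse amplitudes (real voices can also alternate pulse timing): a 200 Hz impulse train in which every second pulse is scaled by (1 - depth). It compares the spectral magnitude at the new subharmonic (100 Hz) with that at the old fundamental (200 Hz). The function names and parameter values are hypothetical.

```python
import cmath


def alternating_pulse_train(n_pulses: int, period: int, depth: float) -> list[float]:
    """Impulses every `period` samples; every second pulse is scaled by (1 - depth).

    depth = 0 is an ordinary periodic train; any depth > 0 doubles the period.
    """
    x = [0.0] * (n_pulses * period)
    for m in range(n_pulses):
        x[m * period] = 1.0 if m % 2 == 0 else 1.0 - depth
    return x


def magnitude_at(x: list[float], freq: float, sr: int) -> float:
    """|DTFT| of x evaluated at a single frequency (Hz); zero samples are skipped."""
    return abs(sum(v * cmath.exp(-2j * cmath.pi * freq * n / sr)
                   for n, v in enumerate(x) if v != 0.0))


def subharmonic_ratio(depth: float, sr: int = 8000, f0: float = 200.0,
                      n_pulses: int = 100) -> float:
    """Magnitude at f0/2 (the new subharmonic) divided by magnitude at f0."""
    x = alternating_pulse_train(n_pulses, int(sr / f0), depth)  # n_pulses should be even
    return magnitude_at(x, f0 / 2, sr) / magnitude_at(x, f0, sr)


for depth in (0.0, 0.05, 0.2, 0.5):
    print(f"depth {depth:.2f}: |X(f0/2)| / |X(f0)| = {subharmonic_ratio(depth):.4f}")
```

With an even number of pulses the ratio works out to depth / (2 - depth): it is zero without doubling and grows continuously as the alternation deepens, so mild period doubling leaves most of the energy at the old f0.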
D.O. said,

August 28, 2017 @ 9:17 pm

Brett, maybe (and physicists and engineers surely spent a lot of mental energy on trying to figure it out), but the change of fundamental means that now we have overtones of the type nf_0/2, with odd n representing new harmonics. Do you contend that all of them are suppressed?
Michael Proctor said,

August 28, 2017 @ 9:59 pm

Fascinating. I thought the perception of relatively higher pitch might be influenced by the extreme phrase-final lengthening on "hatch", but the effect is robust even if you truncate the audio a fraction of the way through the [æː].
Andrew Usher said,

August 29, 2017 @ 1:28 am

I agree with Brett; that is clearly seen in the spectrogram. Regardless of how you choose to describe it, the increase in pitch we hear is no illusion. In fact, all phenomena of this type are probably due to harmonics; I doubt pure tones could confuse the ear no matter how presented.

As a side comment, the vowel she has in 'hatch' isn't [æː] but [ɛə], an obvious indicator of regional accent. (Before a _voiced_ consonant it would be less so, and before a nasal not at all – in the US, that is.)
JPL said,

August 29, 2017 @ 2:30 am

I answered the question by reproducing the sound of the clip (I'm pretty good at reproducing exactly the intonation of a given speech sample, having been taught rigorously to do so by Kenneth Pike) and when I did so it was clear to me that the pitch on "hatch" was lower. The pattern reminded me of an example of question intonation I gave in Pike's class as a proposed counterexample to his account of question intonation, where the pitch on the final syllable was lower (and level in the same way) than the pitch of the syllable of the word given primary information focus. (The sentence, in African-American dialect, was "Have you lost your mind?" We made a spectrogram of it.)
unekdoud said,

August 29, 2017 @ 7:18 am

I too tried to truncate the audio to the period-doubled portion, and it took quite some effort to hear the lower pitch, which certainly would be unnatural to find alone in any speech, and even then it still sounds like a combination (chord) of two pitches, as in the middle of the artificial glide-blend example.

Other than the low energy at 100Hz (lowest line), the spectrogram doesn't show any other kind of spectral anomaly, which does make me wonder if that portion of speech could occur naturally and stably without period-doubling effects.
Daniel Deutsch said,

August 29, 2017 @ 7:38 am

Not sure if it is true, but I have heard that orchestral musicians sometimes test the ear of new conductors by playing sections an octave higher than notated.
Brett said,

August 29, 2017 @ 8:06 am

@D.O.: There is generally a threshold point at which the forcing of a nonlinear system becomes strong enough for period doubling to occur. Above the threshold, the amount of energy in the new odd harmonics increases continuously, starting from zero right at threshold. So not too far above threshold, all these new harmonics are indeed suppressed.

Well above the bifurcation threshold, the energy in the new harmonics rises and can become comparable to the energy in the old harmonics. However, there is no general pattern of behavior in this regime, since higher-order nonlinearities also come into play. There are additional doublings (or triplings, or…) of the period, introducing new harmonics. Typically, there is a final very high forcing threshold above which the motion becomes chaotic and entirely aperiodic.
stephen said,

August 29, 2017 @ 9:10 pm

I'm sorry, but I don't know of any examples of psychophysical testimonies.
Eric P Smith said,

August 30, 2017 @ 7:03 pm

Surely this is the phenomenon generally known as "vocal fry" or "creaky voice", discussed many times on Language Log. Are these terms now deprecated in favour of "period doubling bifurcations" as some form of political correctness?

[(myl) Nothing to do with politics, correct or otherwise.

"Creaky voice" is speech with a fundamental period long enough (and perhaps other characteristics appropriately set) so that the flicker-fusion threshold is approached, and individual pitch pulses can to some extent be perceptually resolved. "Vocal fry" is speech where the pitch pulses are spaced quasi-randomly, like the pops of water vaporizing in hot oil — vocal fry typically but not always happens in low-pitched regions of speech.

"Period doubling bifurcations" are a general property of certain kinds of periodic dynamical systems (or iterative maps in the discrete case), where gradual changes in some parameter lead to abrupt doubling (or halving) of the fundamental period, and eventually to the replacement of periodicity by chaotic oscillation.

The human vocal folds are among the many dynamical systems that exhibit behavior of this type, and both creak and fry are often associated with period doubling. But creak and fry are basically descriptive terms for types of kinematic observations, whereas period-doubling bifurcation is a term from the mathematics of dynamical systems.]
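The textbook discrete example of a period-doubling cascade is the logistic map, x → r·x·(1 - x). It is offered here only as an illustration of the general mathematics, not as a model of the vocal folds. The Python sketch below iterates the map past its transient, reports the period of the orbit it settles into, and measures how far apart the two alternating values are just above the first bifurcation at r = 3, where the new alternation grows continuously from zero. Function names, the starting value, and the iteration counts are illustrative choices.

```python
import math


def logistic_orbit(r: float, x0: float = 0.5, transient: int = 2000, keep: int = 64) -> list[float]:
    """Iterate x -> r*x*(1-x), discard a transient, return the next `keep` values."""
    x = x0
    for _ in range(transient):
        x = r * x * (1.0 - x)
    orbit = []
    for _ in range(keep):
        x = r * x * (1.0 - x)
        orbit.append(x)
    return orbit


def orbit_period(orbit: list[float], tol: float = 1e-6, max_period: int = 32):
    """Smallest p with orbit[i] == orbit[i + p] (within tol), or None if none found."""
    for p in range(1, max_period + 1):
        if all(abs(orbit[i] - orbit[i + p]) < tol for i in range(len(orbit) - p)):
            return p
    return None  # no short period: consistent with (but not proof of) chaos


def two_cycle_split(r: float) -> float:
    """Gap between the two alternating values of the period-2 orbit (3 < r < ~3.449)."""
    orbit = logistic_orbit(r)
    return max(orbit) - min(orbit)


for r in (2.8, 3.2, 3.5, 3.9):
    print(f"r = {r}: period {orbit_period(logistic_orbit(r))}")

# Just above r = 3 the split grows like sqrt(r - 3), i.e. it starts from zero.
for r in (3.01, 3.05, 3.2):
    exact = math.sqrt((r + 1.0) * (r - 3.0)) / r   # closed-form 2-cycle gap
    print(f"r = {r}: split {two_cycle_split(r):.4f} (closed form {exact:.4f})")
```

Raising r takes the orbit from period 1 to 2 to 4, and at r = 3.9 the search should find no short period at all, the discrete analogue of the slide into chaotic oscillation.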
Antoin said,

September 5, 2017 @ 3:50 pm

Talking of creak… It's interesting that the irregular creak in "eggs" (with the period doubling) sounds low while the very regular creak in "hatched" doesn't. I wonder if irregularly-pulsed creak might be associated with low pitch targets in a way that the more regularly-pulsed creak (vocal fry) is not.
