Real fry


You'll search Google News in vain for stories about most technical terms in phonetics — no recent coverage of lenition, for example — but "vocal fry" has been prominent in the popular press for several years. Despite all the coverage, many people seem to be unclear about what it is and where it comes from — so today I thought I'd spend a few minutes on the phenomenon from a phonetician's perspective.

The media focus began back in 2011, in response to Lesley Wolk, Nassima B. Abdelli-Beruh, and Dianne Slavin, "Habitual Use of Vocal Fry in Young Adult Female Speakers", Journal of Voice. Wolk et al. used both perceptual and acoustic measures of "vocal fry", and found that

(1) vocal fry was used in sentence reading by more than two-thirds of this population of [34] female college students, (2) vocal fry rarely occurred in sustained vowels, (3) vocal fry occurred most often at the end of utterances, and (4) statistically significant differences were found for several acoustic measures between vocal fry and normal register.

This was widely interpreted as signaling an "epidemic" of "pathological" voice quality among young American women, though in fact the study claimed no such thing, and offered no evidence that the situation would have been different 30 or 50 or 100 years earlier. (Nor, for that matter, any evidence that the phenomenon is more common among young women than among older women, or among women than among men.) So in "Vocal fry: 'creeping in' or 'still here'?", 12/12/2011, I took a look at the "TIMIT Acoustic-Phonetic Continuous Speech Corpus", recorded in 1986, and found vocal fry in the very first female-speaker sentence that I looked at (from a woman born in 1957) and in many others, though I didn't try to quantify the prevalence in that collection.

But Wolk et al. unfortunately did not publish the recordings on which their paper was based, so it's hard to know whether my criterion for "vocal fry" was the same as theirs.

Vocal fry made it into the news again, due to Rindy C. Anderson et al., "Vocal Fry May Undermine the Success of Young Women in the Labor Market", PLOS ONE 5/28/2014. Anderson et al. asked male and female speakers to produce sentences both with and without vocal fry, and then asked listeners which version they would be more likely to hire. The result was that the non-fry sentences were overwhelmingly preferred, with the anti-fry preference being slightly stronger in the case of female speakers.

Anderson et al. did publish the audio from their experiment, for which they (and the journal editors) are to be commended. This allows us to confirm what Christian DiCanio suggested we ought to expect ("Vocal fry probably doesn't harm your career prospects", 6/7/2014), namely that the vocal-fry versions of the sentences are rather exaggerated and fake-sounding. Here are the first and second female-speaker examples:

But maybe real-life contemporary-woman vocal fry is like that? Let's take a look at some examples from Kim Kardashian, widely regarded as the beau ideal of the modern female fryer. Her interview on The Early Show certainly provides plenty of examples, e.g.


On the other hand, the show's host exhibits pretty similar patterns:

And for that matter, so does Hillary Clinton in her recent Fresh Air interview:


And just to forestall any suspicions that HRC is pandering to the young female vote, or is a secret member of the Kardashian family, or whatever, here's George W. Bush in a 2010 interview with Bill O'Reilly:


In all of the real examples (from TIMIT, from Kim Kardashian and her interviewer, from Hillary Clinton and George W. Bush), the region of period-doubling and/or erratic glottal pulses occupies about 200-500 milliseconds at the end of an intonational phrase that falls to the bottom of the speaker's range. In the (fake and fake-sounding) examples from the Anderson et al. study, the comparable period is often more than a second long, e.g.:

So why does this happen?

The real question is: how is it ever possible for it not to happen?

The mathematical framework for such phenomena was discovered by Mitchell J. Feigenbaum in the late 1970s. As he explains in "Universal behavior in nonlinear systems", Physica D 1983,

[S]ome very simple schemes to produce erratic numbers behave identically to some of the erratic aspects of natural phenomena. More specifically, there is now cogent evidence that the problem of how a fluid changes over from smooth to turbulent flow can be solved through its relation to the simple scheme described in this article. Other natural problems that can be treated in the same way are the behavior of a population from generation to generation and the noisiness of a large variety of mechanical, electrical, and chemical oscillators. Also, there is now evidence that various Hamiltonian systems – those subscribing to classical mechanics, such as the solar system – can come under this discipline.

The feature common to these phenomena is that, as some external parameter (temperature, for example) is varied, the behavior of the system changes from simple to erratic. More precisely, for some range of parameter values, the system exhibits an orderly periodic behavior; that is, the system's behavior reproduces itself every period of time T. Beyond this range, the behavior fails to reproduce itself after T seconds; it almost does so, but in fact it requires two intervals of T to repeat itself. That is, the period has doubled to 2T. This new periodicity remains over some range of parameter values until another critical parameter value is reached after which the behavior almost reproduces itself after 2T, but in fact, it now requires 4T for reproduction. This process of successive period doubling recurs continually (with the range of parameter values for which the period is 2ⁿT becoming successively smaller as n increases) until, at a certain value of the parameter, it has doubled ad infinitum, so that the behavior is no longer periodic.
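For readers who want to watch the period-doubling cascade happen, here is a minimal sketch using the logistic map x → r·x·(1 − x), a standard textbook instance of the kind of "simple scheme" Feigenbaum describes. To be clear, this is an illustration of the mathematics and not a model of the larynx; the function names, the sample values of r, the starting point, and the tolerance are all my own assumptions.

```python
def logistic_orbit(r, x0=0.2, transient=5000, keep=256):
    """Iterate x -> r*x*(1-x), discard the transient, return the next `keep` values."""
    x = x0
    for _ in range(transient):
        x = r * x * (1.0 - x)
    orbit = []
    for _ in range(keep):
        x = r * x * (1.0 - x)
        orbit.append(x)
    return orbit


def detect_period(orbit, max_period=64, tol=1e-6):
    """Return the smallest p such that the orbit repeats every p steps, or None.

    None means no period up to max_period was found, which is what we
    expect once the map has entered its erratic regime.
    """
    for p in range(1, max_period + 1):
        if all(abs(orbit[i + p] - orbit[i]) < tol
               for i in range(len(orbit) - p)):
            return p
    return None


if __name__ == "__main__":
    # Illustrative parameter values chosen to sit inside successive windows.
    for r in (2.8, 3.2, 3.5, 3.55, 3.9):
        print(r, detect_period(logistic_orbit(r)))
```

With these settings the detected periods should come out as 1, 2, 4 and 8 for the first four values of r, with no short period found at r = 3.9. Note how much narrower each successive window is: the whole period-8 range fits between roughly 3.54 and 3.56.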

In fact, some oscillatory behavior starts out in the chaotic regime. In particular, if you release pressurized gas through an elastic passage which is almost but not quite able to seal off the flow, you will get a cycle in which

(1) the pressure forces the passage open,
(2) gas begins to flow through the passage,
(3) Bernoulli forces pull the passage closed again, returning us to step (1).

But there's a wrinkle, well known to anyone who has tried to learn the oboe or the trumpet — unless the gas pressure and the elastic passage are very carefully regulated, the result will be something like a Bronx cheer, with irregular oscillation from the start:

The mammalian larynx probably evolved to close off the airway, partly to protect the lungs from food, drink, and vomit, and partly to allow pressurizing of the lungs to make the trunk a more rigid platform for the arms and legs. The role of the larynx in vocalization is presumably a secondary adaptation — but in any case, it takes a delicate balance in genetic design, in phenotypic development, and in learned behavioral control to produce a nearly-periodic pitched sound rather than a croak or a bark or a burp or a squeak. Not all of our mammalian relatives can manage this feat. And we ourselves don't manage it all the time.

So it's not at all surprising that as the system relaxes, it tends to exhibit period-doubling and even transition to chaos. The surprising thing is that this doesn't happen more often.

For an entrancing example of a simple system that exhibits deterministic chaos, consider the double pendulum:
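If you'd rather poke at the numbers than watch a video, here is a rough sketch of an idealized double pendulum (point masses on rigid massless rods, no friction), integrated with a fixed-step fourth-order Runge-Kutta scheme. The masses, lengths, starting angles, and step size are illustrative assumptions, not measurements of any real apparatus. The point is only to show the hallmark of deterministic chaos: two runs that start a billionth of a radian apart do not stay close.

```python
import math

G = 9.81        # gravitational acceleration, m/s^2
M1 = M2 = 1.0   # bob masses in kg (illustrative values)
L1 = L2 = 1.0   # rod lengths in m (illustrative values)


def derivatives(state):
    """Time derivatives of (theta1, omega1, theta2, omega2) for the ideal double pendulum."""
    t1, w1, t2, w2 = state
    d = t1 - t2
    den = 2 * M1 + M2 - M2 * math.cos(2 * d)
    a1 = (-G * (2 * M1 + M2) * math.sin(t1)
          - M2 * G * math.sin(t1 - 2 * t2)
          - 2 * math.sin(d) * M2 * (w2 * w2 * L2 + w1 * w1 * L1 * math.cos(d))
          ) / (L1 * den)
    a2 = (2 * math.sin(d) * (w1 * w1 * L1 * (M1 + M2)
                             + G * (M1 + M2) * math.cos(t1)
                             + w2 * w2 * L2 * M2 * math.cos(d))
          ) / (L2 * den)
    return (w1, a1, w2, a2)


def rk4_step(state, dt):
    """Advance the state by one classical Runge-Kutta step of size dt."""
    def shifted(k, h):
        return tuple(s + h * ki for s, ki in zip(state, k))
    k1 = derivatives(state)
    k2 = derivatives(shifted(k1, dt / 2))
    k3 = derivatives(shifted(k2, dt / 2))
    k4 = derivatives(shifted(k3, dt))
    return tuple(s + dt / 6 * (a + 2 * b + 2 * c + e)
                 for s, a, b, c, e in zip(state, k1, k2, k3, k4))


def simulate(theta1, theta2, seconds=20.0, dt=0.001):
    """Release the pendulum from rest at the given angles (radians); return the final state."""
    state = (theta1, 0.0, theta2, 0.0)
    for _ in range(int(round(seconds / dt))):
        state = rk4_step(state, dt)
    return state


def separation(a, b):
    """Euclidean distance between two states in (angle, angular velocity) space."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


if __name__ == "__main__":
    delta0 = 1e-9  # initial difference in theta1, radians
    a = simulate(2.0, 2.0)
    b = simulate(2.0 + delta0, 2.0)
    print("initial separation:", delta0)
    print("separation after 20 s:", separation(a, b))
```

Nothing here is random: the same inputs always give the same outputs. The unpredictability comes entirely from how fast tiny differences in the starting conditions are amplified, and for small starting angles (try 0.1 radians for both) the amplification should be far milder.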


  1. M.N. said,

    June 19, 2014 @ 3:07 pm

    I didn't listen to all the stimuli, but I think I listened to all the female ones…and was there a single one, even among the "normal voice" examples, *without* vocal fry? Or do I not know what vocal fry is?

    (If "vocal fry" refers to what I think it does, then the sound file labelled as "vocal fry" within each pair has *more* vocal fry — it happens on more syllables or is otherwise more notable — but the "normal" member of the pair had at least some in, I think, every case.)

    Also, the narrative in the article seems to presuppose that vocal fry is an affectation; I'm wondering what their evidence is for that. Some of the stimuli did sound "affected", but this was also true of the "normal voice" condition and seems to be the result of other factors: the first example in the post, for instance, doesn't reduce the vowel in "for". To me, most of the speakers sound like they're reading lines, rather than speaking naturally. (Even in the "normal" examples.)

    These are just the impressions I'm getting by ear, though; I'm not a phonetician.

  2. Mark Liberman said,

    June 19, 2014 @ 4:07 pm

    @M.N.: A wonderful point! You're completely correct: I didn't even listen carefully to the so-called "non-fry" examples. Now that I do, I see that most of them have got the normal dose of period-doubling and erratic glottal pulses at the end, just like Kim Kardashian and Hillary Clinton and George W. Bush. So the experiment didn't test "no vocal fry" against "exaggerated fake vocal fry", it tested "normal vocal fry" against "exaggerated fake vocal fry"!

    Here are the "normal" versions of the phrase from Speakers #1 and #2, whose fake fry versions are presented in the body of the post —

    Speaker #1:

    Speaker #2:


  3. AntC said,

    June 19, 2014 @ 6:37 pm

    @M.N. vocal fry is an affectation

    [Br.E. speaker here, I use the term 'creaky voice'.] I do associate creaky voice with an affectation of the upper classes being patronising. Or more generally of those in authority being evasive. (The British Foreign Secretary William Hague seems to be particularly prone to it, to my ears, not all of which can be attributed to his rich Yorkshire accent.) Creaky voice is lampooned endlessly in British comedy: Monty Python Upper-class Twit of the Year, or the two Johns in TW3.

    Since M.N. correctly points out that all of MYL's examples include some vocal fry, I guess I mean excessive creaky voice.

    MYL's explanation of the physics involved makes sense. (And I loved the pendulums!) Some creakiness is to be expected. So the research needed is: what is the expected/normal extent of creaky voice? (For various speaker communities and registers.) So then is it on the increase in certain registers?

    [(myl) Can you provide some more specific examples of "patronizing upper-class-twit creaky voice"? It sounds like an interesting topic worthy of further investigation, but I didn't notice an unusual amount of creak in the "Two Johns" skit you linked to.

    Many people treat "creaky voice" and "vocal fry" as synonyms, but I think it's worthwhile to retain the distinction between creak="really low pitch, typically set off by sudden period doubling from a region where the fundamental is twice as high", and fry="region of pitch pulses irregular in timing and amplitude, typically at the end of a low-pitched region, often following period doubling". Still, it's true that the two phenomena have related causes, as described by Feigenbaum.

    It should also be clear that "vocal fry" is sometimes genuinely a pathology, for instance caused by lesions, growths, scarring, or inflammation of the vocal cords.]

  4. David P said,

    June 19, 2014 @ 8:57 pm

    Almost any William Buckley video?

    [(myl) He certainly radiates languid condescension. But there's little or no period doubling and essentially no chaotic oscillation; rather, it's just really low F0, with peaks around 105 Hz and lows around 55 Hz.]


  5. Lane said,

    June 20, 2014 @ 5:06 am

    Isn't every attempt to use language a kind of affectation? It's not an unconscious process like breathing. It's a conscious process of conveying a message, through style, word choice, volume, pitch, speed and a lot more. When I go on the radio and force myself to slow down and enunciate more carefully, that's an "affectation" compared to my usual overfast slight-mumble.

    It seems more likely that a lot of people's natural, least-monitored voices involve fry, and it would be an affectation to try to avoid it. (Maybe some Spanx would help?)

    [(myl) Voice quality changes are certainly a large part of the way that attitude and emotion are conveyed in speech; and there certainly are voice quality "affectations", in the sense that (for example) someone might use an especially breathy voice as their normal setting, or as the normal setting for one of their modes of self-presentation. What still isn't clear to me is whether there's really some sort of fad or fashion among young women for greater use of period-doubling and/or vocal fry at the ends of falling-pitch phrases — or perhaps some other kind of voice-quality change that sounds similar — or whether this is mostly or entirely driven by confirmation bias.]

  6. MattF said,

    June 21, 2014 @ 1:39 pm

    So, do these examples actually show non-linearity– i.e., a non-linear relation between a (periodic) driving force and an amplitude-dependent response to the driving force that's not at the driving frequency? It's possible, e.g., that the driving force contains energy at lots of frequencies and the vocal system just picks out certain resonances. That would not be a non-linear response.

    [(myl) That kind of coupling is what makes woodwinds and brass instruments work, at least in the control of skilled players. But to a first (zeroth?) approximation, the vocal source is considered to be independent of the acoustic resonances of the supra-laryngeal tract. And period-doubling transition to chaos is (1) hard to avoid in oscillatory physical systems; (2) present in simple simulations of vocal-cord dynamics; (3) a good description of what actually happens in the cases under discussion. In any case, there's excellent evidence from other sources of non-linearities in vocal-cord dynamics, for example the hysteresis in onset vs. offset of voicing as a function of degree of adduction.]

  7. Jonathan Gress-Wright said,

    June 23, 2014 @ 3:35 am

    In an older post you suggested, Mark, that there is some reason to think women have greater or more exaggerated vocal fry than men. Do you think this, if true, would necessarily be socially conditioned, or are there perhaps physiological reasons to expect it?

  8. AntC said,

    June 24, 2014 @ 4:33 am

    So are we concluding that (as a productive feature of AmE vocalisation), vocal fry is not even a thing? At best it's a physiological/articulatory consequence of preceding vocalisations?

    Respecting myl's distinction that creaky voice is a different phenomenon, and not claiming to be an expert on Kardashianism, all I'd say is that in BrE, creaky voice appertains to a different social milieu.

    I'm struggling to find suitable examples as myl requests. (And I appreciate that comedy sketches hardly count as reliable corpora.) I hear several creaky examples within the first minute of this, from John Bird (playing the banker).

  9. Bloix said,

    June 24, 2014 @ 2:12 pm

    I wish I'd had a recorder on my commute on the DC metro this morning. Two twenty-something white women, apparently summer interns or students, were talking. One spoke like any educated young person in informal conversation, with a bit of creaky voice at the end of sentences and for introductory words (yah and so). But the other used creaky voice for EVERY SINGLE WORD!

  10. Joanna Cazden said,

    June 26, 2014 @ 5:26 pm

    As a speech pathologist working in the area of voice, I'm glad to see this level of thoughtfulness about "fry." A trace bit of vocal fry (however defined/measured) at the ends of utterances is not necessarily a problem. But when it's the only sound someone can make, or comprises most of their phonation stream, it can be extremely inefficient: hard to make loud, hard to be heard over noise, etc. It also has a limited pitch range, which comes across as emotionally flat. So typically if I'm seeing someone with a lot of fry, their "presenting complaint" is that their throat feels tired or tight all the time. Improvements in breath support and pitch variety help a lot, but only if the person is willing to sound more expressive (less "cool"). I suspect that this component or signal of emotional disengagement is the true "turn-off" in job interviews, especially when combined with the mechanically related body-language corollaries of very squeezed midsection & shallow breathing. Presentation of voice is not just in the larynx: the whole body contributes and is "heard," albeit subconsciously.

  11. Tracy said,

    June 30, 2014 @ 6:27 pm

    Thank you for posting actual examples — I wasn't sure what was meant by vocal fry, and didn't trust other sources to get it right.

    There is a problem, though.

    Now that I've learned what it is, I actually notice it. And I can't unlearn how to hear it. :P
