Sarah Koenig


Following up on our recent Vocal Fry discussion ("Freedom Fries"; "You want fries with that?"), Brett Reynolds wrote to suggest that "Sarah Koenig's vocal fry seems to be something new". As evidence, he suggested a contrast between a piece she did in 2000 ("Deal Of A Lifetime", This American Life #162, 6/23/2000) and one from 2014 ("The Alibi: Prologue", This American Life #537, 10/3/2014). Here are the opening passages from those two segments, along with another one from 2000 ("The Mask Behind The Mask", This American Life #151, 1/28/2000), her first for This American Life:

TAL #151
TAL #162
TAL #537

A quick listen suggests that Brett is right, at least about these particular passages.

We'd need a lot more evidence before concluding that this is really a difference between Sarah Koenig 2014 and Sarah Koenig 2000, rather than a difference between her presentation style in a couple of particular episodes. And if there really is a systematic difference across time, we'd need to look for evidence to help us decide whether it's due to the normal processes of aging, or to her participation in a general stylistic trend, or to the specific influence of Ira Glass.

Still, this is the first faint glimmer of evidence that I've ever seen to suggest that there might be some sort of stylistic change in progress. And it suggests that we could learn more, by tracking recordings of radio personalities over time.

But in this post, I want to do something different, namely to explore a way to quantify the vocal characteristics that we hear as creak or fry. And it happens that just yesterday, David Talkin connected me to the alpha release of a great new open-source pitch and epoch tracker. David was the author of the previous gold-standard method, but the new program is really much more reliable than anything I've ever seen before. (And also much faster!) As soon as David is ready to go public, I'll post a more complete review, along with a pointer to the source code.

So I applied this new software (evocatively named "reaper") to the three clips linked above — 41.79 seconds, 4776 f0 estimates from 2000, and 60.6 seconds, 7248 f0 estimates from 2014 — and made a histogram comparing the results.

In the plot below, the distribution of f0 values from 2000 is in blue, while the distribution from 2014 is in red. The overlap of the two distributions is purple.

As you can see, the overall range of values is similar in the two samples, but in 2014, Ms. Koenig is using the lower part of her range (100-150 Hz) more extensively; and is also dropping into a period-doubling register (40-80 Hz) to a much greater extent.
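
To make the comparison concrete, here is a minimal sketch of how the share of f0 estimates falling in a given band could be computed, along with a normalized histogram. It assumes the tracker's output has already been read into a plain list of f0 values in Hz, with unvoiced frames marked by non-positive values (an assumption about the input, not a documented property of the alpha release). The function names and the toy values are hypothetical and are not the measurements behind the plot.

```python
from collections import Counter

def band_share(f0_values, low_hz, high_hz):
    """Fraction of voiced f0 estimates with low_hz <= f0 < high_hz."""
    voiced = [f for f in f0_values if f > 0]  # assumption: unvoiced frames are <= 0
    if not voiced:
        return 0.0
    return sum(1 for f in voiced if low_hz <= f < high_hz) / len(voiced)

def histogram(f0_values, bin_width_hz=10):
    """Relative frequency per bin, keyed by the bin's lower edge in Hz."""
    voiced = [f for f in f0_values if f > 0]
    counts = Counter(int(f // bin_width_hz) * bin_width_hz for f in voiced)
    return {edge: n / len(voiced) for edge, n in sorted(counts.items())}

# Toy values for illustration only; not the data from the recordings.
toy_2000 = [210.0, 195.0, 180.0, 160.0, 140.0, 60.0, -1.0, 220.0]
toy_2014 = [190.0, 145.0, 130.0, 120.0, 110.0, 70.0, 55.0, -1.0]

for label, values in (("2000", toy_2000), ("2014", toy_2014)):
    print(label,
          "lower range (100-150 Hz):", round(band_share(values, 100, 150), 2),
          "period-doubling range (40-80 Hz):", round(band_share(values, 40, 80), 2))
```

Dividing by the number of voiced estimates matters here, because the two samples differ in size (4776 versus 7248 estimates), so raw counts would not be comparable.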

The same data plotted in terms of pitch-period intervals (what the tracker is really estimating) — here longer periods correspond to lower frequencies:
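
The conversion between the two views is just a reciprocal. As a small sketch (the function name is hypothetical):

```python
def f0_to_period_ms(f0_hz):
    """Pitch period in milliseconds for a fundamental frequency in Hz."""
    if f0_hz <= 0:
        raise ValueError("f0 must be positive (unvoiced frames have no period)")
    return 1000.0 / f0_hz

# Band edges mentioned in the text, expressed as periods.
for f0 in (150, 100, 80, 40):
    print(f0, "Hz ->", round(f0_to_period_ms(f0), 2), "ms")
```

So the 100-150 Hz band corresponds to periods of roughly 6.7 to 10 ms, and the 40-80 Hz band to periods of 12.5 to 25 ms.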


It seems clear that the differences in question can be quantified — so now all we need is more data!

[The concentration of values around 60 Hz in the 2000 recordings is suspicious, and might reflect some AC hum — though I didn't hear any. When I get a chance, I'll check it out further…]
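
One possible way to check for hum, offered only as a sketch and not as the check actually planned here, is to compare the signal energy at 60 Hz with the energy at nearby frequencies in a stretch where nobody is speaking. The example below uses the Goertzel recurrence on a list of audio samples; the function names, the 15 Hz offset, and the idea of using a silent stretch are all assumptions.

```python
import math

def goertzel_power(samples, sample_rate_hz, target_hz):
    """Squared magnitude of the signal's component at target_hz (Goertzel)."""
    coeff = 2.0 * math.cos(2.0 * math.pi * target_hz / sample_rate_hz)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev * s_prev + s_prev2 * s_prev2 - coeff * s_prev * s_prev2

def hum_ratio(samples, sample_rate_hz, hum_hz=60.0, offset_hz=15.0):
    """Power at hum_hz relative to the mean power at two nearby frequencies."""
    at_hum = goertzel_power(samples, sample_rate_hz, hum_hz)
    nearby = (goertzel_power(samples, sample_rate_hz, hum_hz - offset_hz)
              + goertzel_power(samples, sample_rate_hz, hum_hz + offset_hz)) / 2.0
    return at_hum / nearby if nearby > 0 else float("inf")
```

A ratio far above 1 in a non-speech stretch would point toward hum. A ratio near 1 would suggest that the 60 Hz estimates come from the voice itself.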




  1. Alicia said,

    February 5, 2015 @ 9:20 pm

    If vocal fry IS genuinely a new trend among Young People Today, might it be an overcorrection for uptalk? Are young people who want to be taken seriously phobically avoiding the upper part of their register, especially at the end of sentences?

    [(myl) If these few examples are typical, then the instances of creak and fry are a natural consequence of greater use of the lower part of the vocal register. This might be connected with an attempt to sound more authoritative on a male model, independent of any concern about uptalk. But there might be other things going on as well, such as the effects of using a more intimate register. Or it might just be a style. Or (in this case) just the effects of getting older.]

    I noticed several sentences worth of uptalk in the This American Life episode, but only in one brief section. There wasn't any in the two 2000 Koenig clips we are given, but then they're short. I leave it to my elders and betters to decide if it would be worth comparing a larger sample set to see if vocal fry and uptalk are in any kind of competitive relationship.
    I just made the extreme sacrifice of watching a five-minute video of famously-frying Kim Kardashian and heard no uptalk, and she's the kind of personality I would stereotypically expect it from, saying the kinds of vapid things I expect to be uptalked. To the very limited extent that that constitutes data.

  2. Mike Sullivan said,

    February 6, 2015 @ 2:54 am

    I found it instructive to listen to (and view, in Adobe Audition) all three recordings at greatly reduced speeds. In each case, there was a basic vocal cord vibration at the rate of about 3 per 1/100 second for the 2000 recordings, slowing to about 2.5 per 1/100 second in the later recording, or 33-25 Hz. The vocal pattern at these low frequencies is very easily discernable, and is very similar at the different frequencies, so I wonder how it is that we don't sense the vocal pattern in higher frequency speakers but do sense it in lower frequency speakers.

    [(myl) It's not clear to me what you're hearing (and looking at), but it's not the fundamental frequency of Sarah Koenig's voice, since "3 per 1/100 second" corresponds to a frequency of 300 Hz, or a period of .0033 seconds (3.3 milliseconds), and you can see from the histograms that this is far away from the modal region of her f0 (roughly double the frequency, or half the period). Could you show us a picture of what you see, or quantify what you hear in some other way?]

  3. BlueLoom said,

    February 6, 2015 @ 8:29 am

    I have zero qualifications in this field, but I would expect the voice of a 45-year-old woman to be deeper than that of a 31-year-old woman.

    [(myl) This might be true. But such evidence as there is doesn't support this view overall:


  4. chh said,

    February 6, 2015 @ 1:07 pm

    Mark, can you give any tips on getting audio analysis software to identify regions with uneven glottal pulsing?

    It looks to me like pitch-tracking algorithms won't give you an F0 estimate for a lot of regions where vocal fry really is happening, but I'm not sure how to automatically distinguish those erratic-period vocal fry parts from parts with no phonation.

    [(myl) An excellent question, for which I have an answer that won't fit in the response to a comment — but I'll try to post a discussion and some code at some point within the next few days.]

  5. chh said,

    February 6, 2015 @ 1:15 pm

    I'm wondering whether you did this, or thought it was necessary to.

  6. Ernie in Berkeley said,

    February 6, 2015 @ 1:43 pm

    I wonder whether the clips were recorded with the same equipment, especially the microphone and digital vs analog tape. I first attributed vocal fry, female especially, to newer microphones that captured more of the lower registers.

    [(myl) No, the differences shown in the histograms would not be affected by any plausible differences in recording conditions. This is not a difference in spectral balance or other quasi-linear transfer-function differences — it's a difference in the range of periods in the oscillation of the vocal cords.]

  7. Alicia said,

    February 6, 2015 @ 8:07 pm

    Thank you for responding to my comment.
    I thought the Language Log position on vocal fry was that (contra stereotype) both young men and young women are frying, just as both young men and young women up talk(ed).
    Are you arguing that young men are… imitating women imitating men? Independently decided to exaggerate the maleness of their voice in the same way and at the same time that young women did?

    [(myl) I don't think there's any evidence that the incidence of (the behavior that people perhaps carelessly call) "vocal fry" has changed. Nor is there any evidence that it hasn't changed — there's nothing on either side, as far as I know, except intuitions and anecdotes.

    In the particular case under discussion, there is clearly a difference between the two clips from Sarah Koenig in 2000 and the clip from Sarah Koenig in 2014. Part of the difference involves the very low (and sometimes irregular) pitches, around 40-80 Hz, that are usually called "fry". That's this part of the picture:

    But I was interested to see that the 2014 clip also shows much more extensive use of the normal lower end of her register, in the 110-150 Hz range:

    Thus in this particular case, it's plausible that the increase in period-doubling "fry" or "creak" is a natural consequence of greater use of the lower end of her non-fry register.

    For people without vocal pathology, the period-doubling "fry" or "creak" register is something that happens naturally in lower-amplitude and lower-pitched regions, especially at the ends of breath groups.

    If Sarah Koenig were trying to make more use of lower fundamental frequencies (while still using higher frequencies in other parts of her phrases), this is what you'd expect it to look like. A male speaker might make the same choice. Increased use of the lower part of the vocal range might be associated with projecting authority, but it might also be associated with projecting intimacy. Or conceivably it might just be a stylistic choice, without any signification other than itself.

    If there's a stylistic trend, it would mean that more people of all genders are making this kind of choice, for whatever reason.

    The problem with all of these discussions is that we're weighing alternative explanations in an almost complete absence of relevant facts. This post was a very small effort in the direction of showing what some relevant facts might look like.]

  8. Mimi said,

    February 8, 2015 @ 12:44 am

    I can't get the audio recordings above in the post to play. They say "Error".

    [(myl) What kind of device? What operating system? What browser?]

  9. Mimi said,

    February 8, 2015 @ 12:45 pm

    Hi, they do not work on Internet Explorer 11 for Windows 7 Home Premium for an ACER Aspire desktop with all the updates installed. I tried the url on Mozilla Firefox, and the sound files worked perfectly! Must be something with Internet Explorer.

    [(myl) I thought that all browsers would play .wav files in html5 audio elements, but apparently IE only accepts mp3 and aac. As a result, there is amazingly enough no audio format that all browsers will accept — though .wav is closest to being universal, since all browsers except for IE will play it. I can't imagine why Microsoft made this exceptionally stupid decision…]
