More Pinker peace creak


Yesterday ("Pinker peace creak") I followed up on Breffni's reference to vocal fry/creak in the speech of the young woman who introduces Steven Pinker's talk at the 2015 Nobel Peace Prize Forum. And indeed, in her first 40 words (16 seconds of audio, 8.3 seconds of voiced speech, 1,653 f0 estimates) I found three clear examples of phrase-final period-doubling.

But then, for a bit of balance, I took a look at the start of Pinker's talk, and found three clear examples of phrase-final period-doubling in his first 21 words (12 seconds of audio, 5.2 seconds of voiced speech, 1,048 f0 estimates).

Since the introducer does seem to exhibit the period-doubling phenomenon in a more striking way, I ended by wondering what the source of this perceptual difference is. But instead, I should have looked at a little more data, which would have clarified the situation, and suggested a way forward.

So for this morning's Breakfast Experiment™, I analyzed the whole introduction (110.3 seconds of audio, 69 seconds of voiced speech, 13485 f0 estimates) and the first few minutes of Pinker's speech (221.3 seconds of audio, 147 seconds of voiced speech, 29399 f0 estimates).

And now if we compare the quantiles of estimated f0, we see a clear indication of the introducer's bimodal distribution, caused by a large number of regions where period-doubling sets the pitch an octave (or even two octaves) below her modal range:
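For readers who want to try the quantile comparison on their own f0 tracks, here is a minimal sketch. It assumes the pitch-tracker output has already been read into a plain sequence of f0 values in Hz, with unvoiced frames coded as 0; the function name and the choice of quantile levels are mine, not part of the original analysis.

```python
import numpy as np

def f0_quantiles(f0_hz, probs=(0.05, 0.25, 0.5, 0.75, 0.95)):
    """Quantiles of the voiced f0 estimates in a track.

    Assumption: unvoiced frames are coded as 0 (or a negative value),
    so only strictly positive estimates are kept.
    """
    f0 = np.asarray(f0_hz, dtype=float)
    voiced = f0[f0 > 0]
    if voiced.size == 0:
        raise ValueError("no voiced frames in this track")
    return np.quantile(voiced, probs)
```

Calling this on two speakers with the same `probs` and plotting one set of quantiles against the other is one way to make a bimodal distribution stand out, since the lower quantiles of the bimodal speaker drop away from the rest.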

A few months ago ("Sarah Koenig", 2/5/2015), I suggested using the extent of such bimodality as a way of quantifying the amount of period-doubling. We can see how this would work by looking at the empirical distribution of f0 estimates in the whole Nobel Forum introduction:

Or on a semitone scale (relative to A at 27.5 Hz):

To turn this picture into a single number, we could estimate how much of the distribution is found in the modal part (roughly above 27 in the semitone plot) versus the lower-pitched regions. In this case, about 23.4% of the total is in the lower region of the distribution.
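The single-number summary described above can be sketched as follows. This is an illustration under stated assumptions rather than the script actually used for the post: f0 values are in Hz, unvoiced frames are coded as 0, semitones are counted from A at 27.5 Hz as in the plot, and the function names are hypothetical. On that scale a boundary of 27 semitones corresponds to about 131 Hz.

```python
import math

REF_HZ = 27.5  # reference pitch for the semitone scale (A at 27.5 Hz)

def hz_to_semitones(f0_hz, ref_hz=REF_HZ):
    """Convert a frequency in Hz to semitones above the reference."""
    return 12.0 * math.log2(f0_hz / ref_hz)

def low_region_fraction(f0_values_hz, boundary_semitones):
    """Fraction of voiced f0 estimates that fall below the boundary.

    Assumption: unvoiced frames are coded as 0 and are ignored.
    """
    voiced = [f for f in f0_values_hz if f > 0]
    if not voiced:
        raise ValueError("no voiced frames in this track")
    low = sum(1 for f in voiced if hz_to_semitones(f) < boundary_semitones)
    return low / len(voiced)
```

With the introduction's f0 track loaded as a list, `low_region_fraction(track, 27)` would give the kind of proportion quoted above, subject to the pitch-tracker caveats in the aside that follows.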

[Wonkish aside — There's a serious potential problem due to variation in the tolerance of pitch trackers for octave jumps — we'd want to be careful about the source of period estimates, and check the results using other methods; or maybe use a softer correlogram output rather than a hard period estimate…]

Here's the same plot for Pinker:

And his semitone version:

It's easy to see that he has plenty of lower-pitched regions, with a hint of modes corresponding to one- and two-octave shifts, but the overall proportion of f0 estimates from those lower-pitched regions is much less than we saw in the case of the introduction. If we take the boundary to be about 15 on the semitone plot, then about 6.6% of Pinker's f0 estimates are in the lower region, compared to 23.4% for the introduction.
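Since the boundaries here (about 27 for the introducer, about 15 for Pinker) were picked by eye from the plots, one simple check is to see how much the proportion moves as the boundary is shifted by a few semitones. The sketch below is my own suggestion, not something done for this post; it assumes f0 values in Hz with unvoiced frames coded as 0, and the function name is hypothetical.

```python
import numpy as np

def low_fraction_by_boundary(f0_hz, boundaries_st, ref_hz=27.5):
    """Map each candidate boundary (in semitones above ref_hz) to the
    fraction of voiced f0 estimates that fall below it.

    Assumption: unvoiced frames are coded as 0 and are ignored.
    """
    f0 = np.asarray(f0_hz, dtype=float)
    voiced = f0[f0 > 0]
    if voiced.size == 0:
        raise ValueError("no voiced frames in this track")
    semitones = 12.0 * np.log2(voiced / ref_hz)
    return {float(b): float(np.mean(semitones < b)) for b in boundaries_st}
```

For example, `low_fraction_by_boundary(track, range(12, 19))` would show whether a figure like 6.6% is stable around a boundary of 15 or changes sharply with a one-semitone shift. If the boundary sits in a well-defined valley between modes, the values should change only slowly there.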

Here are the histograms superimposed:

Given an objective and reliable way to quantify the proportion of non-modal f0 estimates in a sample of speech, applied to appropriate published speech datasets that vary over space and time and culture, we could finally bring a significant amount of non-anecdotal evidence to bear on the issue.

Of course, it's fair to ask why and whether we should bother. Who cares, or should care, in the end? In my opinion, the main motivation comes from the intensity of media attention — apparently millions of people care about this issue, whether they should or not, and it's a bad idea to leave the field entirely to stereotype and anecdote in cases like this. And who knows, we might even learn something interesting…

[For those who want to check what I did in this post, or redo it in another way, here are links to the audio samples I used and to the pitch-tracker outputs: Introduction audio, f0; Pinker audio, f0. I got f0 estimates from a version of ESPS get_f0, with a minimum f0 of 20 Hz, a maximum of 400 Hz, and a frame advance of 0.005 seconds.]

[As an illustration of the concern about pitch tracker effects expressed earlier, if I run exactly the same scripts with the same parameters using REAPER as the source of f0 estimates, I get 21.2% for the introduction and 14.4% for the Pinker sample — compared to 23.4% and 6.6% using get_f0. So we need some more work to get stable estimates.]

 



8 Comments

  1. Bob Ladd said,

    July 25, 2015 @ 11:55 am

    Thanks, Mark – this really does make it look like there might be something to the idea that young American women do this more than many other people. I first noticed this at least 15-20 years ago, before it was a hot topic, in talking to American junior year abroad students here in Edinburgh. Much as I appreciate your skepticism about the hot topic that this has become, I remain convinced that there is something behind the stereotype – so I also appreciate your attempt to investigate it in a serious and reproducible way.

  2. Guy said,

    July 25, 2015 @ 1:04 pm

    So were the two samples of data at the beginning of the recording unrepresentative? Or does the bimodality of the introducer's plot reflect an earlier onset of creak before the end of the phrase? Or is there some other explanation for the disagreement between the rough preliminary of counting instances of creak and this more detailed measure?

    [(myl) I think there are several factors — the simplest one is just the relative duration of period-doubling regions. But another reason is that once Pinker gets underway, he uses a large number of final-rising or final-mid-level contours, e.g.

    but I hope to persuade you that it is a persistent historical development/
    visible on scales from millennia to years/
    from wars and genocides —
    to the treatment of children and animals\
    I'm going to walk you through six major historical declines of violence/
    try to explain their immediate causes, that is/
    particular events of the era —
    and tie them together in terms of their ultimate causes —
    namely general historical forces —
    interacting with human nature\

    These 77 words, occupying more than 28 seconds and divided into 9 or 10 phrases, have only two phrase-final falls to the bottom of his range, both of which exhibit period-doubling in their final syllable ("animals" and "nature").]

  3. Guy said,

    July 25, 2015 @ 3:57 pm

    If the rising intonations are a significant cause of the difference, that would be somewhat ironic, as it would then appear that women must navigate between the Scylla and Charybdis of uptalk and creak.

  4. Language Log » More Pinker peace creak - PeaceWords.Us said,

    July 25, 2015 @ 4:28 pm

    […] Language Log » More Pinker peace creak […]

  5. Martin Ball said,

    July 26, 2015 @ 1:43 am

    It's made the Guardian, with a feminist critique of glottal fry: http://www.theguardian.com/commentisfree/2015/jul/24/vocal-fry-strong-female-voice

  6. Breffni said,

    July 26, 2015 @ 4:20 am

    Mark, this looks a very persuasive approach to measuring the prevalence of creak over stretches of speech. But my strong impression is that the effect isn't just a cumulative one: each individual instance of creak sounds distinctive to me. Otherwise it remains to be explained why I (we?) found the creak striking on the basis of just the first few instances, like in the 16-second clip you posted on the 24th, or why Pinker's creak isn't as noticeable over a stretch of similar length. Of the three tokens in the clip, I would have picked out at least "author" and "psychology" as marked even in isolation.

    Or does the distributional measure somehow tell us indirectly about the quality of individual tokens? I don't think it's early onset or duration, which have been mentioned above, because there are plenty of (to me) striking-sounding creaky monosyllables (like in "mind" and "declined" a bit later on).

    On the question of "should we bother", I don't see why not. If young women, or any group of any kind, were using a new word or inflectional morpheme, of course linguists would look at it. It just happens that in this case it isn't universally self-evident that there is any innovation, and if so, what exactly it is.

    The question of why people complain about it is interesting in itself, but if it turns out the whole thing is a delusion, then it becomes a question with a very different complexion.

  7. Bloix said,

    July 27, 2015 @ 5:38 pm

    People are telling you, Mark, that voices that sound like Kim Kardashian, Britney Spears, Paris Hilton, etc. drive. them. fucking. crazy. We think (and I am one of them) that it is because of the vocal fry and uptalk. You tell us that they don't use those styles any more frequently than George W. Bush or Hillary Clinton, and therefore we are making shit up because we are sexist assholes.

    Maybe it is because, not being scientists with fancy analyzing machines, we are not explaining what we hear very well. But we do hear it.

    I was on the metro the other day, sitting behind two young women dressed for the office who were having an ordinary conversation about their jobs. One woman used some fry at the end of sentences or phrases when the volume of air flow reduced and pitch dropped, and a bit of uptalk-and-a-pause to elicit a brief expression of agreement or understanding. Her voice was perfectly unexceptionable.

    The other young woman – the only reason I noticed the conversation in the first place – fryed out e-e-v-er-y f-f-u-uck-k-ii-ing w-o-or-d and uptalked? in the middle? and the end? and everywhere? in every sentence? She was fingernails on a chalkboard to me.

  8. Tim Martin said,

    July 30, 2015 @ 8:59 am

    I've been reading a lot of the recent (and older) LL posts on vocal creak/fry, trying to understand the physics of it. Question: is it possible for a person to produce these low-range pitches without a vocal creak? Or is vocal creak entirely the result of producing an f0 that is near the flicker fusion threshold? (And would that mean that someone with a really high voice wouldn't be able to creak?)
