Language Log

Political pitch ranges

April 22, 2015 @ 4:54 am · Filed by Mark Liberman under Language and politics, Prosody

I don't have time for much this morning, but here's a plot of the f0 quantiles of the first minute or so of each of six speeches from the 2015 NRA-ILA Leadership Forum:

["F0", pronounced "eff zero", is a conventional designation for the fundamental frequency of the voice, which represents the rate of oscillation of the vocal folds in voiced speech, and is a physical proxy for the psychological dimension of "pitch". "Hz" is the standard abbreviation for "Hertz", the international unit of frequency (cycles per second) named after Heinrich Hertz.]

Here's the same thing on a semitone (i.e. log) scale:

[A frequency F in Hz is converted to semitones relative to a base frequency B by 12*log2(F/B).]
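As a minimal sketch of that conversion, here is the formula in Python. The function name `hz_to_semitones` and the choice of base frequency are assumptions for illustration; the post does not say which base B was used for the plots.

```python
import math


def hz_to_semitones(f_hz, base_hz):
    """Convert a frequency in Hz to semitones relative to base_hz.

    Implements 12 * log2(F / B); both arguments must be positive.
    """
    return 12.0 * math.log2(f_hz / base_hz)


# Doubling the frequency is one octave, i.e. 12 semitones.
octave = hz_to_semitones(220.0, 110.0)
print(octave)  # 12.0

# Interval between two of the median (50th percentile) values reported
# in the post: Cruz (174.9 Hz) relative to Bush (140.1 Hz).
print(round(hz_to_semitones(174.9, 140.1), 2))
```

Because the scale is logarithmic, changing the base B only shifts every value by a constant, so the shape of a semitone plot does not depend on the base chosen.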

Do these large differences reflect differences in intrinsic vocal range, or in (projected) physiological state? In other words, does Jeb Bush have a lower voice than Ted Cruz, or was he just (presenting himself as) less excited?

I don't think that we can tell from this sample. Here's a comparison of Ted Cruz addressing the NRA vs. being interviewed by Sean Hannity:

And again on a semitone scale:

The numbers for the NRA speeches (in Hz):

          10    20    30    40    50    60    70    80    90
Trump  129.9 143.7 151.7 159.1 168.4 180.3 193.6 207.4 233.9
Rubio  121.8 130.9 137.3 144.1 150.7 158.3 169.5 184.4 205.0
Walker 127.5 135.5 141.8 146.6 152.6 157.6 164.4 172.0 185.9
Cruz   125.9 141.4 155.8 165.0 174.9 187.7 203.9 218.9 233.7
Perry  120.0 134.2 148.9 158.9 168.4 176.5 184.9 195.1 207.8
Bush   110.1 119.2 127.7 134.8 140.1 144.6 151.2 158.9 170.9

And for the speech/interview comparison:

                    10    20    30    40    50    60    70    80    90
NRA_Speech       125.9 141.4 155.8 165.0 174.9 187.7 203.9 218.9 233.7
HannityInterview 101.0 107.8 113.1 117.8 122.6 128.5 135.6 145.9 162.3

In the Hannity Interview, Senator Cruz's pitch range is on the high side for adult male speakers in a conversational setting — but the studio situation is such that he was probably projecting his voice more strongly than he would in conversation.

This is by way of background to a future attempt to explain the many layers of phonetic and psycholinguistic confusion in Forrest Wickman and Anne Marie Lindemann, "Furious 7 Is the Super Bowl of Deep Voices. But Whose Voice Is Deepest?", Slate 4/21/2015.

Update — Elle Reeve suggested that I add data from Rand Paul's speech at CPAC 2015 (he wasn't at the NRA event):

And Senator Paul doesn't have a higher voice in general — in an interview setting, his pitch range is very similar to Ted Cruz's:

Apparently in pitch range as in other respects, Rand Paul is sui generis. Unless for some reason the atmosphere of CPAC creates strikingly shriller voices…

6 Comments

David L said,

April 22, 2015 @ 8:03 am

Could you explain a little more how you generate these curves? I'm guessing that a sample of speech is broken down into discrete units with some short duration, and that for each snippet you determine F0. So each curve is a cumulative frequency plot of the F0. Is that right?

[(myl) I use a pitch tracker, which outputs values like these:
```
0.000000 0 198.572723 0.227984
0.000000 0 187.645493 0.232534
0.000000 0 252.433167 0.189427
0.000000 0 542.779907 0.463097
224.285797 1 1099.927246 0.700276
220.628967 1 1747.341309 0.911507
223.504013 1 2237.301514 0.950568
225.028244 1 2472.060059 0.983549
225.896774 1 2492.384277 0.971443
```
where the first column is an f0 estimate (in voiced speech) or 0, the second column is a binary variable giving the program's judgment about whether the speech is voiced or not, the third column is RMS amplitude, and the fourth column is the serial cross-correlation value at the hypothesized pitch lag. I've set the program up to produce one row of estimates every 5 milliseconds, or 200 times a second.
I then pull out the first-column values for the putatively voiced frames, which for 60-70 seconds of speech is about 9000-10000 numbers, and (ask R to) calculate the percentiles, which I then (ask R to) plot.]
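The steps described in the reply above can be sketched roughly as follows. This is an illustration in Python rather than the R the author used, and the function names `voiced_f0` and `f0_deciles` are hypothetical. It assumes the four-column, whitespace-separated row format shown above, and it uses the "inclusive" interpolation method, which I believe matches the default of R's `quantile()`; whether the original analysis used that default is not stated.

```python
import statistics


def voiced_f0(rows):
    """Return the f0 estimates (Hz) from rows the tracker marked as voiced."""
    values = []
    for row in rows:
        fields = row.split()
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        f0, voiced = float(fields[0]), int(fields[1])
        if voiced == 1:
            values.append(f0)
    return values


def f0_deciles(rows):
    """Return the 10th through 90th percentiles of voiced f0.

    Needs at least two voiced frames.
    """
    return statistics.quantiles(voiced_f0(rows), n=10, method="inclusive")


# The sample rows quoted in the reply above (one row per 5 ms frame).
sample = """\
0.000000 0 198.572723 0.227984
0.000000 0 187.645493 0.232534
0.000000 0 252.433167 0.189427
0.000000 0 542.779907 0.463097
224.285797 1 1099.927246 0.700276
220.628967 1 1747.341309 0.911507
223.504013 1 2237.301514 0.950568
225.028244 1 2472.060059 0.983549
225.896774 1 2492.384277 0.971443
""".splitlines()

deciles = f0_deciles(sample)
for pct, value in zip(range(10, 100, 10), deciles):
    print(f"{pct}th percentile: {value:.1f} Hz")
```

With a real recording, `rows` would be the lines of the tracker's output file, and the nine resulting values per speaker correspond to one row of the tables in the post.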
FM said,

April 22, 2015 @ 9:33 am

Could there be an accent effect? Is Walker's Wisconsin speech, all other things equal, more monotone than other US dialects? I certainly perceive the Midwestern accent as monotone, though I could of course be wrong, since most claims of this sort seem to be.

[(myl) I genuinely don't know. Considering how widespread stereotypes of this sort are, and how easy it would be to test them, it's surprising that no one has done so.]
Carmen said,

April 22, 2015 @ 2:37 pm

Can you please do Hillary, Obama and some other Dems? Got totally excited about the presentations of the candidates recently and would like to maybe do a paper…

[(myl) Hi Carmen! Do you have some suggestions for particular sources?]
Clay Beckner said,

April 22, 2015 @ 8:49 pm

I'm fascinated by the difference between Cruz's NRA appearance and the Hannity interview. I'm thinking of the research on F0 accommodation in Larry King interviews (Gregory & Webster 1996). Perhaps Hannity has a lower F0 than Cruz, and Cruz is shifting toward Hannity during the interview?

[(myl) While f0 accommodation is genuinely a thing, I don't think that it has much to do with what's going on here. This is just an instance of the expected difference in pitch between projecting to speak to a large crowd, even with amplification, and the much lower vocal effort (including subglottal pressure, amplitude, etc.) involved in a conversational setting. For a longer discussion of these issues, see "Martin Luther King's rhetorical phonetics", 1/15/2007, which includes a relevant plot.

(See also "Biology, sex, culture, and pitch", 8/16/2013.)

Or record yourself talking as if to someone across the street, and then talking as if to someone sitting next to you in a quiet room, and you'll see a similarly large difference in f0.]
Scott McClure said,

April 23, 2015 @ 7:52 am

"Do these large differences reflect differences in intrinsic vocal range, or in (projected) physiological state? In other words, does Jeb Bush have a lower voice than Ted Cruz, or was he just (presenting himself as) less excited?"

My hunch is that Jeb Bush does have a lower voice than Ted Cruz. But listening to the NRA speech vs. the Hannity interview also makes me think that Ted Cruz 'turns on' a special public speaking voice in front of a large crowd to a greater extent than Jeb Bush. That might be consistent with Ted Cruz's history as a college debater and a litigator:

http://www.nytimes.com/2015/04/23/us/politics/ted-cruz-honed-political-skills-in-princeton-debate-club.html

and it connects with what MYL has posted on Dr. King:

http://itre.cis.upenn.edu/~myl/languagelog/archives/004045.html

but I certainly can't exclude F0 accommodation to Sean Hannity.
D.O. said,

April 23, 2015 @ 4:32 pm

It would certainly be instructive to look at loudness-pitch correlation, but that will assume that the sound operator and recorder didn't change settings for the relevant time intervals…

[(myl) What's easy to do is to look at the consequences of various ways of increasing vocal effort. One reliable way is to have people talk to other people at varying distances. Another way is to use the Lombard reflex, which depends on background noise levels and therefore can easily be induced by playing noise (white or colored) through headphones. These two interventions apparently produce slightly different results, but both will produce greater vocal effort, which generally entails higher amplitude, a shift in spectral balance towards higher frequencies, and higher pitch.]
