Language Log

Political voices

October 12, 2011 @ 2:45 pm · Filed by Mark Liberman under Computational linguistics, Prosody

Like other regular readers of Andrew Sullivan's web log, I was not surprised that he was happy about Sarah Palin's decision not to run for U.S. president in 2012. However, one aspect of his commentary ("Rejoice!", 10/5/2011) did surprise me. The puzzle is in the second sentence:

Our Three Year National Nightmare Is Over!

Palin talks to Mark Levin here (her voice is the deeper one).

Mark Levin is a radio talk show host, and Sullivan's link goes to a page on Levin's web site that includes not only the text of Palin's statement but also provides access to an mp3 file of a 15-minute segment of his show. My interest here, of course, is not in the politics but in the phonetics. Is it really true that Sarah Palin's voice is deeper (i.e. lower in pitch) than Mark Levin's?

Not in this interview, certainly. I've pitch tracked the initial segment, where Levin reads the text of Palin's statement, a 2:30-long segment that starts this way:

And also Sarah Palin's answer to one of Levin's questions, a 1:15-long segment that starts like this:

Here's the result, displaying the quantiles of fundamental frequency as in some previous posts:

And in tabular form:

Percentile	10%	20%	30%	40%	50%	60%	70%	80%	90%
Palin	171	181	189	197	207	215	225	240	265
Levin	110	118	125	131	138	145	153	162	174
Palin/Levin Ratio	1.57	1.53	1.51	1.50	1.50	1.49	1.47	1.48	1.52

In general, Sarah Palin's voice in this interview is about 50% higher than Mark Levin's — corresponding to a musical interval of a fifth.
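
The interval can be checked directly from the table. The following is a minimal Python sketch, not the code used for the post: it converts each decile's Palin/Levin ratio into equal-tempered semitones using 12·log2(ratio). Because it starts from the rounded Hz values in the table, the recomputed ratios may differ from the printed ratio row in the second decimal place.

```python
import math

# F0 deciles in Hz, copied from the table above.
palin = [171, 181, 189, 197, 207, 215, 225, 240, 265]
levin = [110, 118, 125, 131, 138, 145, 153, 162, 174]


def semitones(ratio):
    """Equal-tempered semitones spanned by a frequency ratio."""
    return 12 * math.log2(ratio)


# One line per decile: the ratio and the interval it corresponds to.
for pct, p, l in zip(range(10, 100, 10), palin, levin):
    print(f"{pct}%: ratio {p / l:.2f}, {semitones(p / l):.1f} semitones")

# The median values give a ratio of exactly 1.5 (207 / 138).
median_interval = semitones(palin[4] / levin[4])
print(f"median interval: {median_interval:.2f} semitones")
```

At the median the ratio is exactly 1.5, which comes out at about 7.02 semitones, a hair wider than an equal-tempered fifth (7 semitones).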

So what was Andrew Sullivan thinking? I don't think that he was reacting to a perception of Sarah Palin's voice as unusually low for an American woman. He has certainly never accused her of an excessively masculine persona; and as a matter of fact, her pitch range in this interview is somewhat higher than the values for a sample of American women in conversational settings that I reported a few years ago ("Nationality, gender and pitch", 11/12/2007):

(The data is not entirely comparable, since she is doing an important interview from a remote location, and so her level of physiological arousal and vocal projection is probably greater than it would be in an informal telephone conversation; but let's say at least that her pitch range is by no means unexpectedly "deep".)

On the other hand, Mr. Levin's voice is somewhat higher than that of most professional male speakers, including male politicians. Here's a comparison to which I've added the F0 quantiles from Barack Obama's weekly address of 8/6/2011 and Ronald Reagan's weekly address of 5/8/1982:

So that little parenthetical dig "her voice is the deeper one" is probably aimed at Levin rather than at Palin. But what still puzzles me is why Andrew Sullivan chose to add that parenthetical comment in the first place. Is he trying to tell us that Mark Levin has a girlie voice, or at least an insufficiently masculine one? That seems inconsistent with Sullivan's general take on sexual politics.

18 Comments

vanderleun said,

October 12, 2011 @ 3:11 pm

"That seems inconsistent with Sullivan's general take on sexual politics."

Not if you know his earlier "Milky Loads" web page.
jetRink said,

October 12, 2011 @ 4:35 pm

A higher voice doesn't necessarily sound feminine. For example, Gilbert Gottfried, who affects a high, grating voice in order to make himself and his characters sound comically deranged. Levin's ranting radio persona does frequently echo Gottfried's various histrionic birds. Maybe Sullivan was just saying Mark Levin sounds crazy.
Eric P Smith said,

October 12, 2011 @ 5:03 pm

Mark Liberman correctly points out that Mark Levin’s F0 is about a perfect fifth (7 semitones) lower than Sarah Palin’s. However, a typical man’s F0 is about an octave (12 semitones) lower than a typical woman’s. That is borne out by the comparison of Sarah Palin’s F0 (on the one hand) and the F0s of Ronald Reagan and Barack Obama (on the other hand) in the third of the three graphs that Mark Liberman shows above. If a woman sings a note and a man is asked to “sing the same note” then (unless he is musically trained) he will typically sing an octave below the woman. Thus, all things told, we expect a man’s voice to be an octave below a woman’s, and we subconsciously allow for that expectation when we compare voices.

I surmise that Andrew Sullivan perceives Sarah Palin’s voice as being deeper than Mark Levin’s because of that subconscious allowance. But, like Mark Liberman, I am quite in the dark as to why he makes a point of it.

[(myl) Do you have a reference for the "about an octave" idea? Because the data I've seen — as in this plot from Kent 1994, or the measurements that I cite here — suggests that in fact, 5 to 8 semitones is more like the average difference for American adult men and women.]
Janice Byer said,

October 12, 2011 @ 6:49 pm

My guess is Sully couldn't resist writing what his better angels (he's a devout Catholic) wouldn't ordinarily allow. He and Levin both claim the mantle of political conservative but are hugely divided on LGBT rights, including Sully's right to have married his husband. In my experience when working in fundraising to fight AIDS, such digs are a common form of playful teasing among gay men, who're sensitive, however, to straight guys not being so amused.
Andy Averill said,

October 12, 2011 @ 9:23 pm

Both by the sound of him and by your chart, I wouldn't even describe Mark Levin as a tenor. I suspect he's a baritone, although it's not always possible to guess somebody's vocal range from their speaking voice.
Dean Eckles said,

October 12, 2011 @ 11:43 pm

I've recently noticed these percentile plots here on Language Log. Why do you choose these over ECDF plots? My sense is that the latter are much more common and what I generally use in my work. Is there a reason to prefer the opposite when looking at this kind of data? Thanks.

[(myl) There are two reasons that I've taken to using percentiles for some plots and tables on LLOG. The most important reason is that statistically-less-sophisticated readers seem to understand them better. I find that a surprisingly large number of people have a surprisingly large amount of trouble grasping talk about distributions, especially measures of spread like variance or standard deviation, and plots of empirical density functions, cumulative or otherwise. Percentiles seem to be easier for more people to assimilate, and they do the job well enough for comparison of ranges. In the particular case of F0 data, there's also a problem with estimation errors — generally a few percent of the estimates are doubled or halved (or multiplied or divided by some other small integer). This can create funny-looking tails in the empirical distribution, unless you do something to prune those values, which is hard to do well. But with a good pitch tracker and half-way decent audio, these tracking errors don't affect any of the quantiles between 0.10 and 0.90.]
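
The robustness argument in the reply above can be illustrated with a toy simulation. The sketch below uses synthetic data, not real pitch-tracker output: it builds an evenly spread F0 track between 150 and 250 Hz (an arbitrary range), halves 2% of the frames and doubles another 2% to mimic octave errors, and compares deciles and extremes before and after. The 4% error rate and the regular spacing of the errors are assumptions made so that the result is reproducible.

```python
import statistics

# Synthetic "true" F0 track: 1000 frames spread evenly over 150-250 Hz.
clean = [150 + 100 * i / 999 for i in range(1000)]

# Mimic tracking errors: halve every 50th frame, double every 50th frame
# at an offset of 25, so 2% + 2% of the frames are off by an octave.
noisy = list(clean)
for i in range(0, 1000, 50):
    noisy[i] = clean[i] / 2
for i in range(25, 1000, 50):
    noisy[i] = clean[i] * 2


def deciles(values):
    """Return the nine cut points at 10%, 20%, ..., 90%."""
    return statistics.quantiles(values, n=10, method="inclusive")


clean_q = deciles(clean)
noisy_q = deciles(noisy)
for pct, a, b in zip(range(10, 100, 10), clean_q, noisy_q):
    print(f"{pct}%: clean {a:.1f} Hz, with errors {b:.1f} Hz")

# The extremes, by contrast, are dominated by the bad frames.
print(f"min: clean {min(clean):.1f} Hz, with errors {min(noisy):.1f} Hz")
print(f"max: clean {max(clean):.1f} Hz, with errors {max(noisy):.1f} Hz")
```

In this setup the minimum and maximum are dragged far outside the true range, while each decile from 10% to 90% moves by less than 2 Hz.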
SC said,

October 13, 2011 @ 12:52 am

Perhaps spectral slope? He sounds like he's got a lot of high-amplitude upper partials to me.

[(myl) He sounds perpetually aggrieved to me; and that tone also matches his content. An "angry tone" is likely to result in more (relative) high-frequency energy in the glottal source spectrum, as well as the somewhat higher f0 described here for a different speaker. I've never heard Mr. Levin speak in any other way, however, so perhaps this is just his natural voice quality, and no inference of emotional loading should be drawn.]
David Donnell said,

October 13, 2011 @ 1:06 am

Anyone who took Sullivan's comment as anything more than just a dig at Levin gets a Literal Guy Award!

[(myl) Does that come with a plaque?

At first, I took the remark to be a dig at Palin. But since her pitch range is on the high side of the distribution for American females, that didn't make any sense. So I concluded that it was a dig at Levin, who does (in my impression) have a somewhat high-pitched voice for a male radio personality. In the end, I decided to turn it into an excuse to do a few minutes of easy research.]
other one spoon said,

October 13, 2011 @ 1:26 am

"Deeper" as in "more philosophically profound"?
David Y said,

October 13, 2011 @ 2:41 am

What pitch tracking software do you recommend, Prof. Liberman? Is there any freeware or shareware available that can generate similar graphs? Thanks.

[(myl) I use get_f0 from the ESPS software package, available under a free software license as e.g. http://ldc.upenn.edu/myl/esps60.6.linmac.src.tgz

This is unix command-line code written 20 years ago or so by David Talkin (now at Google) based on the algorithm devised by George Doddington and described in Doddington & Secrest 1983. See here for a description. It's still the best generally-available program, in my opinion, though it would be good to be able to retrain its background-noise expectations without major surgery.

Given that the ESPS tools are installed on a unix platform (the package that I pointed you to should compile and install on reasonably current linux and OSX machines), you can say things like

get_f0 x.wav x.f0; fea_print MyLayout x.f0 x.af0

and end up with a text file with one line per analysis frame (say 100 per second), which you can read into R or some other statistical analysis program to make the plots. Specifically, I did something like

X <- read.table("x.af0")
# column 1 is F0; keep only frames with F0 > 0
Xp <- X[,1]; Xp <- Xp[Xp>0]
# deciles from 10% to 90%
Q <- (1:9)/10
Xpq <- quantile(Xp,probs=Q)
plot(Q,Xpq,type="b")

If you prefer an interactive program, essentially the same pitch-tracking algorithm is available in Wavesurfer, also free software.

Many people use Praat, also free, which offers many integrated capabilities for speech analysis and synthesis, and is scriptable. In my experience, its pitch-tracker is not so reliable, but David Weenink will tell you that I'm completely wrong on this point, so you should try it for yourself.]
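
For readers more comfortable in Python than R, the quantile step above can be sketched with the standard library alone. This is an illustrative equivalent, not part of ESPS: it assumes, as the R snippet does, that the first column of each line is F0 in Hz and that non-positive values mark frames to discard. The sample lines are made up so the block runs without a real x.af0 file, and the function name f0_deciles is hypothetical.

```python
import statistics


def f0_deciles(lines):
    """Return the 10%..90% quantiles of the positive F0 values in column 1."""
    f0 = [float(line.split()[0]) for line in lines if line.strip()]
    voiced = [v for v in f0 if v > 0]  # assumption: 0 marks an unvoiced frame
    # method="inclusive" interpolates linearly between order statistics,
    # which is intended to mirror R's default quantile() behavior.
    return statistics.quantiles(voiced, n=10, method="inclusive")


# Made-up frames standing in for a real file: F0 first, then other columns.
sample = ["0.0 0 10.5 0.20", "0.0 0 11.0 0.25"]
sample += [f"{hz:.1f} 1 900.0 0.95" for hz in range(100, 210, 10)]

result = f0_deciles(sample)
print(result)

# With a real file, the same function can be applied to the open handle:
#     with open("x.af0") as fh:
#         print(f0_deciles(fh))
```

The 11 made-up voiced frames run from 100 to 200 Hz in 10 Hz steps, so the deciles land on 110, 120, and so on up to 190 Hz, and the two zero-valued frames are ignored.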
maidhc said,

October 13, 2011 @ 4:50 am

When Sarah Palin went into politics, she adopted a voice that was different from her natural voice (which can be heard on the videos from when she was competing in beauty contests).

Her home is in the Mat-Su valley. In the 1930s the government of Alaska encouraged people from Minnesota to emigrate to Alaska, on the assumption that they were already familiar with cold-weather agriculture.

Thus there are quite a few old-timers around the Mat-Su valley with Minnesota accents. It has a lot of similarities to Canadian accents from a little bit further north.

This was the accent that Sarah Palin adopted, but it's not that common among younger people in that area.
Theodore said,

October 13, 2011 @ 7:35 am

What SC said and maidhc's first paragraph both occurred to me; Palin's voice in the clip is different from the usual public voice we hear from her in speeches; it seems more natural and does seem lower. I don't know Levin's voice to compare to his usual. Someone who knows both voices might be perceiving contrary motion of the two voices from their usual.

There's also a very different manipulation of spectral distribution, with Levin's voice recorded on a studio microphone and equalized for everything from his taste and that of his producers to broadcast needs. Palin's voice is on the phone, distorted and equalized for telephony, i.e. intelligibility and ease of data compression, etc. This can make it harder to judge by ear.

[(myl) One of the (evolutionarily and technically) useful things about the pitch of the voice (in the sense of fundamental frequency) is that it's unaffected by transmission channel characteristics. In particular, pitch estimates will not be changed in any material way by any of the kinds of equalization, dynamic range compression, filtering, encoding, etc., that are applied to audio in any studio or any telephone channel.]
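
The insensitivity of F0 to channel filtering can be illustrated with a toy example. The sketch below is not the get_f0 algorithm: it uses a bare autocorrelation peak picker on a synthetic harmonic signal, and the 8 kHz sample rate, 200 Hz fundamental, and 1/k harmonic amplitudes are all illustrative assumptions. It compares the estimate for the full signal with the estimate after the 200 Hz component has been removed entirely, roughly what a channel with a low-frequency cutoff above 200 Hz would do.

```python
import math

RATE = 8000   # samples per second (assumed)
F0 = 200.0    # fundamental of the synthetic "voice", in Hz (assumed)
N = 400       # 50 ms of signal


def harmonic_signal(harmonics):
    """Sum of sinusoids at the given multiples of F0, each with amplitude 1/k."""
    return [
        sum(math.sin(2 * math.pi * k * F0 * n / RATE) / k for k in harmonics)
        for n in range(N)
    ]


def estimate_f0(x, lo=60.0, hi=400.0):
    """Return RATE / lag for the lag with the largest autocorrelation."""
    min_lag, max_lag = int(RATE / hi), int(RATE / lo)

    def autocorr(lag):
        return sum(x[n] * x[n + lag] for n in range(len(x) - lag))

    best_lag = max(range(min_lag, max_lag + 1), key=autocorr)
    return RATE / best_lag


full = harmonic_signal(range(1, 11))   # components at 200, 400, ..., 2000 Hz
band = harmonic_signal(range(2, 11))   # same, with the 200 Hz component removed

f0_full = estimate_f0(full)
f0_band = estimate_f0(band)
print(f0_full, f0_band)
```

Both estimates come out at 200 Hz, because the remaining harmonics still repeat with the same period even when no energy is left at the fundamental itself.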
Acilius said,

October 13, 2011 @ 9:00 am

Like maidhc and Theodore, I found the sample from Palin to be noticeably different from the voice I'm used to. I think the difference is that she usually talks faster than that. So when I heard her linger over the final syllables of "supporters" and "members" she sounded like someone else for a second.
Mr Punch said,

October 13, 2011 @ 11:04 am

"Is he trying to tell us that Mark Levin has a girlie voice, or at least an insufficiently masculine one? That seems inconsistent with Sullivan's general take on sexual politics." My take is "yes," and "not necessarily." It's a gibe along the lines of "You're not half the man Sarah Palin is, Mister Macho."
D.O. said,

October 13, 2011 @ 11:40 am

@Dean Eckles: What exactly is the problem? Just rest your head on your shoulder and enjoy!
Rosina Lippi said,

October 13, 2011 @ 1:05 pm

The answer may have nothing to do with anything you can measure. A number of studies have established that a negative listener can hallucinate a foreign accent when listening to a native English speaker (this happens a lot with Asian instructors at the university level). If people hallucinate accents to suit their preconceptions about a given speaker, then it's reasonable to think that pitch is also susceptible.
Eric P Smith said,

October 13, 2011 @ 5:29 pm

Thank you, Mark, for the interest you have taken in my comment.

First let me say that I am not a linguist. I have a keen amateur interest in linguistics, and some University education in it, though not to degree level.

I can't defend my claim that the average difference between the pitch of the speaking voices of men and women in the western world is today as high as “about an octave” (12 semitones). It’s what I had believed from childhood (I am 62), and it may have been true at one time, but having done some Internet research today I realise it is not true now. The difference is commonly described on the Internet as “about an octave” or “almost an octave”, but that seems to be a bit of an urban myth unless “almost” is interpreted very widely. An octave has long been the received wisdom for singing voices, but I realise that that does not necessarily transfer to speaking voices.

On the other hand, I wonder if your figure of “5 to 8 semitones” may be a bit on the low side. I no longer have access to scholarly papers, but there seems to be a consensus among those who write on the Internet and seem to know what they are talking about (now that’s a vague criterion if ever there was one!) that the difference for speaking voices is about 9 or 10 semitones:

1. http://shkrobius.livejournal.com/186205.html says, “The fundamental frequency of a male voice is, on average, 120-125Hz vs 210-225Hz for a female voice”. I make that a difference of 9.9 semitones.

[(myl) You could get a range of different numbers depending on the population you sample and the context in which you record them — note that the median values differ by about 9.3 semitones for the Japanese speakers in the sample discussed here — but I haven't seen any reasonably systematic data from American speakers where the difference is quite that large.

One easy collection to check is the TIMIT corpus, recorded in the late 1980s, which includes 10 read sentences from each of 630 speakers, of which 438 are male and 192 are female. I just did it, and found that in TIMIT, the overall average pitch for the male speakers is 121.4 Hz, while the overall average for the female speakers is 201.3 Hz. That's a difference of about 8.75 semitones.

I'm not sure where the author of that LiveJournal post got his or her numbers, but in fact they're not way out of line with the rest of these values. So maybe I should amend what I wrote to read "5 to 10 semitones".]
Eric P Smith said,

October 13, 2011 @ 5:30 pm

2. http://www.screamingbee.com/support/ScriptVOX/ScriptVOXStudioDocumentation.pdf says, “ScriptVOX Studio incorporates a voice changing engine that can apply advanced digital processing to text-to-speech. … A woman's voice can be changed to man's voice by sliding the pitch to -0.8.” That is, by lowering the pitch by 0.8 octaves, or 9.6 semitones.

3. I’ve analysed the plot from Kent 1994 that you linked to. I calculate the average pitch difference that it shows between male and female voices from the ages of 30 to 70 as 9.1 semitones.

4. Finally you yourself say (http://itre.cis.upenn.edu/~myl/languagelog/archives/004974.html), “Though the pitch of anyone's speech depends very much on circumstances, under comparable conditions, (adult) human females voices are likely to show pitches roughly 75% higher [than] those of male voices.” That is 9.7 semitones.
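
The semitone figures quoted in this thread can be reproduced with one line of arithmetic, 12·log2(ratio). In the sketch below, the range midpoints used for the LiveJournal figures are an assumption about how the 9.9 was obtained; the other inputs are copied from the comments.

```python
import math


def semitones(ratio):
    """Equal-tempered semitones spanned by a frequency ratio."""
    return 12 * math.log2(ratio)


checks = {
    # Midpoints of 210-225 Hz and 120-125 Hz (assumed reading of item 1).
    "LiveJournal midpoints": 217.5 / 122.5,
    # Overall female and male averages reported for TIMIT.
    "TIMIT averages": 201.3 / 121.4,
    # A shift of 0.8 octaves (item 2).
    "ScriptVOX slider": 2 ** 0.8,
    # Pitches "roughly 75% higher" (item 4).
    "75% higher": 1.75,
}

results = {label: semitones(ratio) for label, ratio in checks.items()}
for label, value in results.items():
    print(f"{label}: {value:.2f} semitones")
```

These come out at roughly 9.9, 8.75, 9.6, and 9.7 semitones respectively, in line with the values quoted above.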
