Language Log

Raising his voice

October 8, 2011 @ 10:46 am · Filed by Mark Liberman under Computational linguistics, Language and politics, Prosody

FDR had his weekly "Fireside chats", and in 1982 Ronald Reagan began the modern tradition of weekly presidential addresses, which U.S. presidents since then have maintained. I don't think that very many people actually listen to these things — no one that I've asked has ever admitted to regular consumption. But I've been collecting them since 2004, and listening to most of them, and a few days ago I noticed something.

What I noticed is that President Obama seemed a mite testy in his weekly address for 10/1/2011 ("Fighting for the American Jobs Act"). This led me to ponder the phonetics of testiness, and of emotional expression in general. For this morning's Breakfast Experiment™ I thought I'd take up one small aspect of one dimension of this large topic, namely what happens to F0 ("fundamental frequency", commonly called "pitch") when you "raise your voice".

Now, different people have different characteristic pitch ranges; and for any given person, there are lots of reasons that pitch range might go up or down. You raise your voice when the ambient noise level is higher — you can't help it, this is known as the Lombard Reflex, and it's why people who are listening to iPods while they talk tend to be perceived as shouting by people who inhabit a quieter space. You raise your voice — or at least you should — when you're talking to people who are farther away from you; if you don't, they won't be able to hear you. "Raising your voice" in this sense involves more subglottal pressure, more vocal effort, higher amplitude, and also higher pitch.

But there are also effects of what we might call "arousal" — when you're animated and engaged, you also tend to raise your voice. And you can be animated and engaged because you're happy and excited, or because you're annoyed or angry.

I decided to compare President Obama's weekly address for 10/1/2011, where he seemed to me to be animated in a somewhat negative way, to another recent address, the one from 8/6/2011 ("Getting the Economy Growing Faster"), where he seemed much calmer. Both of these addresses are about his relationship with Congress. And both were recorded in similar settings, with the president in what seems to be essentially the same relationship to the camera, so there should be no difference in background noise level to cause the Lombard Reflex to kick in, and no difference in audience distance or size to cause a change in vocal projection.

A clip from the August 6 address is first, and one from the October 1 address follows:


	This week, Congress reached an agreement that’s going to allow us to make some progress in reducing our nation’s budget deficit. And through this compromise, both parties are going to have to work together on a larger plan to get our nation’s finances in order. That’s important. We’ve got to make sure that Washington lives within its means, just like families do. In the long term, the health of our economy depends on it.


	Hello, everyone. It’s been almost three weeks since I sent the American Jobs Act to Congress – three weeks since I sent them a bill that would put people back to work and put money in people’s pockets. This jobs bill is fully paid for. This jobs bill contains the kinds of proposals that Democrats and Republicans have supported in the past. And now I want it back. It is time for Congress to get its act together and pass this jobs bill so I can sign it into law.

I pitch-tracked both speeches (well, all of the hundreds of speeches I've collected for the past few presidents, but that's another story), and plotted a comparison of the percentiles:

Here's the same data plotted in semitones relative to A 55:

The difference is a fairly large one: an average pitch of 137.3 ±0.38 Hz on October 1, vs. an average of 106.6 ±0.31 Hz on August 6, for a proportional difference of about 29%.
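For readers who want to check the arithmetic, here is a minimal Python sketch of the two calculations involved: converting Hz to semitones relative to A 55, and summarizing a pitch track by its percentiles. The function names are made up for illustration, and the rule that frames with F0 at or below zero count as unvoiced is an assumption about the pitch tracker, not a description of the one actually used for these plots.

```python
import math
from statistics import quantiles

REF_HZ = 55.0  # the A at 55 Hz used as the semitone reference


def hz_to_semitones(f_hz, ref_hz=REF_HZ):
    """Convert a frequency in Hz to semitones above ref_hz."""
    return 12.0 * math.log2(f_hz / ref_hz)


def f0_percentiles(f0_track, points=(5, 25, 50, 75, 95)):
    """Return selected percentiles of the voiced frames in an F0 track.

    Frames with F0 <= 0 are treated as unvoiced and dropped
    (an assumption about how the tracker marks them).
    """
    voiced = [f for f in f0_track if f > 0]
    # quantiles(..., n=100) returns 99 cut points; cuts[k - 1] is the k-th percentile
    cuts = quantiles(voiced, n=100)
    return {p: cuts[p - 1] for p in points}


# The two average pitches quoted in the post
oct_mean, aug_mean = 137.3, 106.6
ratio = oct_mean / aug_mean
print(f"proportional difference: {100 * (ratio - 1):.1f}%")
print(f"interval: {hz_to_semitones(oct_mean) - hz_to_semitones(aug_mean):.2f} semitones")
```

On the semitone scale the gap between the two averages comes out to a bit over four semitones, which is the same ratio as the roughly 29% difference in Hz.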

To avoid misunderstanding, let me repeat that there are lots of reasons for someone's pitch range to vary, so that this is not a reliable metric for physiological arousal. But when other things are held constant, it can be interpreted that way. And it's easier to quantify and compare than things like voice quality are.

[Note: the confidence intervals on the average pitch estimates are just the usual ±1.96*s.e. For all the usual reasons, and some unusual ones as well, these are not very meaningful or even trustworthy bounds.]
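As a rough illustration of what "±1.96*s.e." means here, the sketch below computes a mean and the half-width of the interval from a list of F0 values. It assumes the standard error is the sample standard deviation divided by the square root of the number of frames, which treats every frame as an independent observation. Successive pitch frames are strongly correlated, so that assumption is probably one of the reasons the bounds come out too tight. The sample values are invented and do not come from either address.

```python
import math
from statistics import mean, stdev


def mean_with_ci(values, z=1.96):
    """Return (mean, half_width), where half_width = z * standard error."""
    n = len(values)
    se = stdev(values) / math.sqrt(n)  # assumes independent observations
    return mean(values), z * se


# Made-up F0 values in Hz, for illustration only
sample_f0 = [128.0, 141.5, 133.2, 150.1, 137.8, 129.4, 144.0, 135.6]
m, hw = mean_with_ci(sample_f0)
print(f"{m:.1f} ±{hw:.2f} Hz")
```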

Update — Arguably, both of these addresses are atypical, in opposite directions. This is suggested by a comparison of the F0 percentiles in these two addresses with the overall F0 percentiles for the 125 weekly addresses that I've collected so far for President Obama. In Hz:

And in semitones:

12 Comments

Janice Byer said,

October 8, 2011 @ 1:07 pm

The difference is profound. Reportedly, public speakers are advised by coaches to aim for their natural low in pitch. It's said to sound "warmer" to audiences. (That many of us instinctively raise our pitch when greeting a small child or a domestic pet is not to sound cold or testy, obviously, but rather to sound like a smaller being to put a smaller being at its ease.)

[(myl) There are two opposite "natural" effects here: larger animals generally produce lower-pitched vocalizations (because the primary sound-producing structures in the larynx are larger and more massive, and also because resonant cavities are larger); on the other hand, more highly aroused animals generally produce higher-pitched vocalizations, because of higher subglottal pressure, greater muscle tension in the larynx, etc.]
D.O. said,

October 8, 2011 @ 1:42 pm

Prof. Liberman, for us lazy and uneducated, can you give the numbers or post the link explaining how great is the effect of ambient noise/distance. For example, how much the pitch goes up if noise increases by 10dB etc.

[(myl) From here, this:

]
Jimbino said,

October 8, 2011 @ 3:05 pm

In reaction to Janice Byer's post, I must say my experience is that children mostly perceive adults' baby-talk and pitch change when addressing them as patronizing, and they soon warm to, and learn to prefer, adults who do not perform such contortions. I'd like to see definitive experimental results on this. See http://www.psychologytoday.com/blog/child-myths/200909/more-talking-about-baby-talk
stevelaudig said,

October 8, 2011 @ 8:24 pm

This is as good a place as any to describe this phenomenon. I am a native English speaker, Midwest, retired lawyer, nearly 60, lived in the U.S. my entire life until 2008 when I took a second career as an English language teacher in China. Prior to getting here I had no memorable contact with spoken Putonghua/Mandarin.

In the first weeks in Changsha, while riding the bus and being surrounded [cheek-to-cheek] by the locals on crowded buses, I would find myself suddenly feeling anxious, quite anxious at times, and worried, for no reason I could identify. Crowds of people talking on U.S. buses didn't have the effect.

Upon reflection it seemed to me that my brain was having something in the nature of the fight/flight reaction that it had when people would argue, really argue, in English, and was telling my body to get ready for trouble.

I concluded that what I would call the staccato/jumpy/tonal nature of Mandarin was triggering these brain/body responses [I had traveled in Italy a bit on trains and buses and Italians never made me "jumpy"].

I think my brain/body thought these people, who were having normal/regular, though loud [we were on the bus after all], conversations about daily life, were arguing! Once I realized my physical reaction I tested my theory with a student assistant by occasionally asking her what particular people on the bus were talking about. It was shopping, what to have for dinner, a daughter's new baby. Interesting effect I thought I'd pass along. Don't know what it means, if anything. Now I don't have the physical response even when there actually is an argument, say a fender-bender. Now it seems like the folks aren't really angry but that they are feigning anger because they feel the situation calls for it.
maidhc said,

October 9, 2011 @ 12:30 am

I've noticed this when wearing noise-cancelling headphones in a plane. For me, the engine noise is partially cancelled out, but for other people talking it isn't. It's strange to hear them talking in a way that's much different than the way they would talk if they were hearing the same level of noise as I am. But it doesn't sound to me as though they are annoyed.
Andy Averill said,

October 9, 2011 @ 7:43 am

In Obama's speeches, I've noticed that whenever he wants to sound more visionary, such as in the peroration of his 2008 stump speech, his voice takes on a quality I associate with Martin Luther King. He tends to raise the pitch, and then keep it at the same level for longer than people normally do, only dropping at the very end of a sentence. Here's an example from 2008:

http://www.youtube.com/watch?v=uvsqv3unCEw

I wonder if this pattern is typical of speeches made by all presidents on exalted occasions, such as inaugural speeches.
BF said,

October 9, 2011 @ 9:00 am

Apropos of stevelaudig's comment re his "limbic" interpretation of Mandarin speech elements as communicating argumentative content, I noticed something similar when I (Caucasian) lived in a predominantly African American area of a city for some years. When I first got there, I was struck by how angry people sounded, even when speaking about neutral (or sometimes even positive) topics. I surmised early on that I was misinterpreting prosodic (and other) elements of BEV, and, in fact, after a few years, the disconcerting contrast between tone and content diminished considerably. Not being familiar with the scholarly literature on this phenomenon, I'm wondering if someone could identify some of the speech components that contribute to it.
Jarek Weckwerth said,

October 9, 2011 @ 11:09 am

@stevelaudig (and @maidhc, in a sense): I had a similar experience recently on my first trip to China. On the plane, the two rows in front of and behind mine were occupied by what seemed like one party of Chinese people (most probably from Hong Kong, thus maybe speaking Cantonese?). The departure was delayed by 30 mins or so, and they spent the whole time having a rather animated conversation essentially over my head. There was far less background noise than on a flying plane, but enough for the Lombard effect to kick in. After just a couple of minutes, I got very tired of listening to this, and, frankly, annoyed. (And, then, quite embarrassed by my annoyance.)

I think the Lombard effect combined with what you call the "staccato" nature of the language to produce my negative reaction. And I would say the "staccato-ness", in turn, could be described as resulting from the vastly higher incidence of major pitch movements when compared to e.g. English or my native Slavic language; or maybe from the evident impossibility of interpreting them in the "usual" way, i.e. as conveying coarse-grained grammatical, attitudinal and discourse functions…

(This was my first experience of being essentially immersed in a tonal language, as opposed to hearing small snippets from e.g. passers-by. The impression went away pretty quickly, too, so maybe e.g. being anxious about a long and uncomfortable flight on my part was at least part of the explanation.)
Jessica said,

October 9, 2011 @ 1:02 pm

I've seen a number of friends experience the same reaction when first living in China. In addition to the reasons mentioned, I had always assumed that part of it was the frequent use of the Mandarin 4th tone, which to the ear of a native English speaker sounds like an "angry tone," or at least an emphatic one. I don't know enough about the Cantonese tones to know if there would be a similar effect. (I have also wondered how much the common stereotype in the US of Chinese as "quiet, scholarly types" contributes to the shock when encountering animated, loud, every-day Chinese conversation!)
Ray Dillinger said,

October 10, 2011 @ 12:39 pm

English is a non-tonal language and English-speakers use tone to communicate non-linguistic information – general attitude, aggression, emotion, etc. It's not surprising that English-speakers (and speakers of other non-tonal languages) interpret tone in this way, even when it's encountered in tonal languages.

Has anyone asked the inverse of the question, though? What do speakers of tonal languages use to communicate attitude, aggression, emotion, etc, and do they ever encounter uses of it that put them on edge among other cultures?
John Cowan said,

October 10, 2011 @ 10:26 pm

Ray Dillinger: They use intonation, which is layered on top of the tone melody. In a sentence with rising intonation, a 1st tone (high level) will be higher at the end of the sentence than at the beginning; by the same token, a 3rd tone (which dips quite low) will be too.

In "Essentialist Explanations", I find the line "Cantonese is essentially what everyone else in China calls swearing" attributed to one Kiri Aradia Morgan. I have also seen somewhere (I don't remember where) that speaking Cantonese is frequently perceived as shouting.
Jon Haslam said,

October 12, 2011 @ 4:55 am

If nothing else, surely you've identified a great starting point for the plot of the next Jack Reacher novel. "The president looked nervous. Damn nervous. Reacher looked over the pitch-track of his latest address with a brow that furrowed as deeply as the lines on the graph."
