Gender polarization or accommodation in conversational pitch


It's been a while since my last Breakfast Experiment™, but a conversation yesterday spurred me to run a simple data-analysis script with interesting results, presented below. The script and the results are simple, but the issues are complicated — consider yourself warned.

The background:

Everyone knows that voice pitch is a significant secondary sexual characteristic in humans, and also that this hormonally-determined effect is socially and culturally modulated.  For a survey of some of the issues, see "Biology, sex, culture, and pitch", 8/16/2013. It's also well established that the degree of gender polarization in pitch, established by hormonal changes at puberty, declines on average as people get older — see "Age, sex, and F0", 3/25/2017.  What causes this age-related decline in pitch polarization? Is it because older people are less interested in advertising their gender, or because there are age-related changes in the larynx and the brain? Or are both psychological and physiological changes involved?

Of course, many other things affect the pitch of your voice — who you are, how you feel, what you're talking about, who you're talking with, how you're phrasing your message, the style of the discourse, how much background noise there is, and on and on. And it's clear that people can adjust their pitch to align with social and cultural expectations, including gender differences, as discussed in Cartei et al., "Children can control the expression of masculinity and femininity through the voice", Royal Society Open Science 2019. There's also a large literature on "accommodation" or "alignment" effects in pitch, i.e. the tendency to shift your voice pitch towards the pitch of the voices you're hearing — e.g. Gijssels et al., "Speech accommodation without priming: The case of pitch", Discourse Processes 2016.

All this suggests two different and opposite sexual-identity effects on pitch in conversation. We might see polarization, so that sex differences are exaggerated in female-male conversations compared to female-female or male-male conversations; or we might see accommodation, so that the differences are smaller in cross-sex versus same-sex conversations. Or we might sometimes see one, and sometimes the other, depending on the speakers' relationship, the goals of the conversation, their actual gender identity, and so on. So it's complicated, as usual.
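To make the two possibilities concrete, here is a minimal sketch of how one might label the direction of the effect from four median F0 values. The semitone scale, the function names, and the numbers in the example call are all my assumptions for illustration; none of them come from the analysis described in this post.

```python
import math


def semitone_gap(higher_hz, lower_hz):
    """Distance between two frequencies in semitones (12 per octave)."""
    return 12.0 * math.log2(higher_hz / lower_hz)


def interlocutor_effect(f_same, m_same, f_cross, m_cross):
    """Compare the female-male gap in cross-sex vs. same-sex conversations.

    Arguments are median F0 values in Hz (hypothetical inputs):
    f_same = females talking with females, m_same = males talking with males,
    f_cross = females talking with males, m_cross = males talking with females.
    """
    same_gap = semitone_gap(f_same, m_same)
    cross_gap = semitone_gap(f_cross, m_cross)
    if cross_gap > same_gap:
        return "polarization"   # difference exaggerated in cross-sex talk
    if cross_gap < same_gap:
        return "accommodation"  # difference reduced in cross-sex talk
    return "neither"


# Placeholder medians, invented purely to exercise the function.
example = interlocutor_effect(f_same=200.0, m_same=100.0,
                              f_cross=190.0, m_cross=105.0)
print(example)  # prints "accommodation"
```

A ratio scale such as semitones seems a reasonable choice here because a given shift in Hz means something different at 100 Hz than at 200 Hz, but a comparison in Hz would follow the same logic.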

What I did:

The Switchboard Corpus, collected at Texas Instruments in Dallas in 1990-91, involves 2438 two-sided telephone conversations of about 5 minutes each among 543 speakers. 302 speakers are identified as male, 241 as female. (Some speakers participate more than once.)

There are 2438*2 = 4876 call sides:

Female talking with female: 1312 sides
Female talking with male: 1081 sides
Male talking with male: 1402 sides
Male talking with female: 1081 sides
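
As a quick sanity check on those counts, the following sketch confirms that the four categories add up to two sides per conversation and that the two cross-sex categories match. The dictionary layout is just my own bookkeeping.

```python
# Call-side counts as listed above, keyed by (speaker sex, interlocutor sex).
sides = {
    ("F", "F"): 1312,
    ("F", "M"): 1081,
    ("M", "M"): 1402,
    ("M", "F"): 1081,
}
conversations = 2438

total_sides = sum(sides.values())

# Every conversation contributes exactly two sides.
assert total_sides == 2 * conversations

# Each cross-sex conversation has one female side and one male side.
assert sides[("F", "M")] == sides[("M", "F")]

# Same-sex conversations contribute two sides to a single category.
same_sex_conversations = (sides[("F", "F")] + sides[("M", "M")]) // 2
cross_sex_conversations = sides[("F", "M")]
assert same_sex_conversations + cross_sex_conversations == conversations
```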

I ran a pitch-tracker on all the audio, at 200 frames/second, and concatenated all the F0 estimates for voiced regions in all of the call sides in each of the four categories. This yielded between 36 million and 51 million F0 estimates in each category.

I then calculated overall F0 quantiles for each category.
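
The pooling and quantile steps can be sketched roughly as follows. This is not the script I actually ran. The input format (one tuple per call side, with unvoiced frames marked as 0) and the function names are assumptions made for the sake of a self-contained example.

```python
from collections import defaultdict


def quantile(sorted_values, p):
    """Linearly interpolated quantile of an already-sorted list, 0 <= p <= 1."""
    position = p * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] * (1 - fraction) + sorted_values[upper] * fraction


def pooled_quantiles(call_sides, probs=(0.1, 0.25, 0.5, 0.75, 0.9)):
    """Pool voiced F0 frames per category, then compute quantiles.

    call_sides: iterable of (speaker_sex, interlocutor_sex, f0_track), where
    f0_track holds one F0 estimate in Hz per frame and 0 marks an unvoiced
    frame (an assumed convention, not necessarily the tracker's).
    """
    pooled = defaultdict(list)
    for speaker_sex, interlocutor_sex, f0_track in call_sides:
        voiced = [f0 for f0 in f0_track if f0 > 0]
        pooled[(speaker_sex, interlocutor_sex)].extend(voiced)

    result = {}
    for category, values in pooled.items():
        if values:  # skip categories with no voiced frames at all
            values.sort()
            result[category] = {p: quantile(values, p) for p in probs}
    return result
```

With tens of millions of values per category, an array library or a histogram would be more practical than Python lists, but the logic is the same. Note that pooling frames this way weights each speaker by how much voiced speech they produced.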

The results show clearly that interlocutor gender mattered; that the effects (in this dataset) are in the direction of accommodation rather than polarization; and that the effects can be quite large:


This all ignores many issues. Gender is not binary, even if chromosomal sex mostly is. These were telephone conversations between strangers on assigned topics, and things might be very different in other sorts of conversations. I've ignored age, though that information is available for the speakers in this dataset. And "fundamental frequency" (F0) is a good topic for "spherical cow" jokes, because the concept falsely assumes that a single well-defined "fundamental frequency" exists at each point in a time-varying quasi-periodic sound such as human speech. The fact is that human vocal-fold oscillation involves multiple modes, with period-doubling (= pitch halving) fading in and out (among other non-spherical complexities), and pitch-tracking programs doing various complicated things to pretend that this isn't true. As a result, even quantile estimates can be problematic, and means and variances more so.
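
To see why pitch halving matters for summary statistics, here is a toy illustration on synthetic numbers. Nothing in it is real speech data: the "true" F0 values are just the integers from 150 to 249 Hz, and I halve every tenth frame to mimic a tracker locking onto a doubled period. Real tracker errors are certainly not this tidy.

```python
from statistics import mean


def quantile(sorted_values, p):
    """Linearly interpolated quantile of an already-sorted list, 0 <= p <= 1."""
    position = p * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] * (1 - fraction) + sorted_values[upper] * fraction


def summarize(values):
    """5th percentile, median, and mean of a list of F0 values."""
    ordered = sorted(values)
    return {
        "p05": quantile(ordered, 0.05),
        "median": quantile(ordered, 0.5),
        "mean": mean(ordered),
    }


# Synthetic "true" F0 track: 100 frames, 150..249 Hz (invented values).
true_f0 = [float(hz) for hz in range(150, 250)]

# Simulated tracker output: every tenth frame reported an octave too low.
tracked_f0 = [hz / 2 if i % 10 == 0 else hz for i, hz in enumerate(true_f0)]

clean = summarize(true_f0)
halved = summarize(tracked_f0)

# How far each statistic drops (in Hz) because of the halved frames.
shift = {name: clean[name] - halved[name] for name in clean}
print(shift)
```

In this toy case the median moves by 5 Hz, the mean by about 10 Hz, and the 5th percentile by about 55 Hz, so the lower tail of the distribution is where such errors do the most damage.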

(And I'm assuming that my script is free of relevant bugs…)

Still, this result suggests that F0 accommodation in non-scripted cross-sex conversations exists, and can be quite large. A next step would be to introduce age into the picture — I'll try this at some point with the Fisher dataset, which has an order of magnitude more speakers and recordings than the Switchboard dataset.

1 Comment

  1. mg said,

    July 20, 2022 @ 10:26 am

    You are making an assumption on which is the "natural" pitch. It could be that the accommodation is happening in same-gender conversations (maybe signalling belonging, for example).

    [(myl) Good point — though that would be a form of gender signaling rather than a form of accommodation, I think.]
