Paris Hilton's vocal registers


Hilary Hanson, "Paris Hilton's Split-Second Voice Change Leaves People Absolutely Stunned", Huffpost 6/29/2024:

Paris Hilton floored social media users this week by seamlessly shifting her vocal register midsentence as she spoke before Congress. […]

When Rep. Claudia Tenney (R-N.Y.) asked Hilton for her thoughts on incorporating mental health care into new legislation, Hilton responded first by complimenting the lawmaker’s outfit.

“I love your jacket. The sparkles are amazing,” Hilton said.

Tenney joked, “I had a little bling here for today,” to which Hilton replied, “Yes, I wanted to find out who made it later.”

Hilton delivered her fashion comments in a relatively high voice with lots of vocal fry. However, as she continued speaking and began to discuss mental health care, her voice shifted to a noticeably deeper register.

“But I think the most important thing is, they need access to therapy counseling, mentorship and other community-based programs,” she said, with her voice dropping on the word “but.”

A video of the testimony can be found on C-SPAN (or C-SPAN's X account).

There's a long history of interest in Paris Hilton's vocal registers, as a quick YouTube scan demonstrates.

And in the cited congressional testimony, there's no question that her way of talking changes (somewhat) between the three phrases where she compliments Rep. Tenney's jacket, and the following phrases where she reads her message on mental health care.

But the description of the intonational aspects of the change is actually upside down. Here are the first ten phrases, three offering fashion comments (with Rep. Tenney's response after the first two) and then the first seven of her phrases about mental health:

  1. Thank you, I enjoyed our Zoom call,
  2. and I love your jacket, the sparkles are amazing.
    Um… [I had a little bling here for today you know] Yeah,
  3. I wanted to find out who made it later. Um…
  4. …but I think the most important thing is
  5. we need access to therapy, counseling, mentorship,
  6. and other community based programs.
  7. And I think it's also important to not label these kids as troubled or bad
  8. I think it- it makes these children feel like they aren't believed, and
  9. that's something that's important for them to not feel that way, and
  10. yeah I think it's just about showing kindness and love and compassion and support

If we look at the pitch tracks for the phrases labelled (3) and (4) above, we can see that Ms. Hilton actually uses somewhat higher pitches in the phrase starting with "but". And for those who can interpret perceived intonation in terms of pitch height, listening confirms it:

Another way to quantify her intonational range is to look at the quantiles of estimated F0, phrase by phrase:

Her mental-health phrases (5) to (9) are indeed a bit lower in pitch (except at their endings), compared to her fashion phrases (1)-(3). But her first mental-health phrase — "(4) …but I think the most important thing is" — is strikingly higher in overall pitch, and has an expanded pitch range as well, as quantified by the per-phrase MAD (median absolute deviation from the median):

Phrase: (1)  (2)  (3)  (4)   (5)  (6)  (7)  (8)  (9)
MAD:    7.8  3.7  5.4  16.1  4.2  9.4  7.9  5.7  6.2
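The per-phrase summaries described above (F0 quantiles plus the MAD) can be sketched in a few lines of Python. The post does not say which pitch tracker produced the F0 estimates, which quantiles were plotted, or whether the MAD values are in Hz or scaled. This sketch assumes one F0 value per analysis frame in Hz, with 0 marking unvoiced frames, and reports the unscaled MAD. The input tracks are invented for illustration and are not measurements from the testimony.

```python
from statistics import median, quantiles


def mad(values):
    """Median absolute deviation from the median (unscaled)."""
    m = median(values)
    return median(abs(v - m) for v in values)


def phrase_summary(f0_by_phrase):
    """Per phrase: (10th percentile, median, 90th percentile, MAD) of voiced F0."""
    summary = {}
    for label, f0_track in f0_by_phrase.items():
        # Assumption: the pitch tracker writes 0 for unvoiced frames.
        voiced = [f for f in f0_track if f > 0]
        # method="inclusive" keeps the cut points within the observed range.
        deciles = quantiles(voiced, n=10, method="inclusive")
        summary[label] = (deciles[0], median(voiced), deciles[-1], mad(voiced))
    return summary


if __name__ == "__main__":
    # Invented F0 tracks (Hz) for two hypothetical phrases, for illustration only.
    tracks = {
        "calm": [180, 175, 0, 190, 185, 170],
        "animated": [200, 230, 190, 0, 0, 260, 210],
    }
    for label, (p10, p50, p90, spread) in phrase_summary(tracks).items():
        print(f"{label}: p10={p10:.0f} median={p50:.0f} p90={p90:.0f} MAD={spread:.1f}")
```

In this toy input the "animated" phrase has both a higher median and a MAD four times that of the "calm" one. The median-based spread is a reasonable choice for pitch tracks because it is less sensitive than the standard deviation to occasional tracking errors such as octave jumps.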

This change, probably expressing the topic shift, is exactly the opposite of the description in the Huffpost article:

Hilton delivered her fashion comments in a relatively high voice with lots of vocal fry. However, as she continued speaking and began to discuss mental health care, her voice shifted to a noticeably deeper register.

“But I think the most important thing is, they need access to therapy counseling, mentorship and other community-based programs,” she said, with her voice dropping on the word “but.”

The article's claim about vocal fry is also empirically unfounded, as far as I can tell.

But let's not beat up on the Huffpost writer — people in general are very good at detecting shifts in speech style, but surprisingly bad at characterizing the acoustic correlates of their perceptions. And linguists are not in a much better position — it's easy to demonstrate that some particular ideas about speech sounds are wrong, but it often remains largely a mystery what the real articulatory and acoustic underpinnings of our perceptions are.

 



20 Comments

  1. Philip Taylor said,

    June 30, 2024 @ 1:42 pm

    Would you not agree though, Mark, that the "But" itself is more-or-less at the same pitch as the preceding utterances, and it is only with the word(s) following the "but" that the pitch rises ?

  2. Mark Liberman said,

    June 30, 2024 @ 2:00 pm

    @Philip Taylor: "Would you not agree though, Mark, that the "But" itself is more-or-less at the same pitch as the preceding utterances"

    No. It's false that "her voice [dropped] on the word 'but'" — the mean F0 of the previous syllable is 179 Hz, while the mean F0 of "but" is 186 Hz.

    And the article asserts that the drop on "but" signals that "as she continued speaking and began to discuss mental health care, her voice shifted to a noticeably deeper register", which is false: "as she … began to discuss mental health care", the phrasal mean F0 goes from 180 Hz to 201 Hz.

  3. Philip Taylor said,

    June 30, 2024 @ 2:05 pm

    OK, but what is the mean F0 of all of her words preceding the "but" ? I was not comparing it (aurally / mentally) with the preceding syllable but with [the perceived mean of the pitch of] all of her preceding words. And I certainly wasn't seeking to support the hypothesis that "as she continued speaking […] her voice shifted to a noticeably deeper register".

  4. Mark Liberman said,

    June 30, 2024 @ 2:48 pm

    @Philip Taylor:

    Still wrong. As I wrote in the previous comment, the mean F0 of the whole previous phrase is 180 Hz, while the mean F0 of the word "but" is 186 Hz.

    "And I certainly wasn't seeking to support the hypothesis that 'as she continued speaking […] her voice shifted to a noticeably deeper register'."

    So what's the point, then?

  5. Y said,

    June 30, 2024 @ 3:24 pm

    The question is then, what gives the perception misdescribed as lowered pitch? I too am hearing something, but I haven't figured out what it is.

    I took some stabs at it, but I haven't convinced myself of anything. First, I don't think average anything is a useful measure: a few particular landmarks can create an overall impression. Second, maybe some formants are lower? F1 of "it" ("who made it") is a little higher than in "it makes" (600 vs. 570). F1 and F2 of "sparkles" are a little higher than in "aren't believed". As I said, not convincing.

  6. Philip Taylor said,

    June 30, 2024 @ 3:46 pm

    There was no "point", Mark, I was asking a simple question. In your original prose you had written «Ms. Hilton actually uses somewhat higher pitches in the phrase starting with "but"», and I queried whether the higher pitch started with the word "but" or whether it started with the words immediately following.

  7. Philip Taylor said,

    June 30, 2024 @ 4:00 pm

    And to note that 180Hz is F#3 minus 47 cents, while 186Hz is F#3 plus 9 cents — not a great deal different to the average human ear, I would respectfully suggest, although a musician would detect the difference immediately.

  8. CCH said,

    June 30, 2024 @ 7:30 pm

    I wonder if they didn't actually hear it get higher, rather they heard less vocal fry and as vocal fry is considered "feminine" or "ditsy" to a layperson, it is perceived as a higher pitch as that is typically where "feminine" voices lie. So they didn't hear it higher and then go lower, they heard less vocal fry but associated the vocal fry to a higher register than it actually is because of the associations it has in pop culture. I feel like the assumption from that point of view would be that Paris Hilton is putting on a "higher pitched", more "feminine" voice (with all the implications that come with vocal fry – shallowness, vanity, ignorance, etc.), but her real voice is deeper and lacks vocal fry (and thus is to be taken more seriously because deeper and no vocal fry means "intelligent"!). Obviously, that's basically the exact opposite of what happened, but I do think it's possible that their assumptions and the connotations of vocal fry/no vocal fry influenced their interpretations and overrode any further research into it.

  9. John Swindle said,

    June 30, 2024 @ 7:46 pm

    The overall impression I had was that she was trying at first to sound feminine but then shifted into her normal speech. As with others' observations about pitch, I could well have that upside down. I wasn't familiar with her and didn't understand the shoulder language.

  10. Mark Liberman said,

    June 30, 2024 @ 8:02 pm

    @CCH: "I wonder if they didn't actually hear it get higher, rather they heard less vocal fry and as vocal fry is considered "feminine" or "ditsy" to a layperson, it is perceived as a higher pitch as that is typically where "feminine" voices lie. So they didn't hear it higher and then go lower, they heard less vocal fry but associated the vocal fry to a higher register than it actually is because of the associations it has in pop culture."

    You're right that gender stereotypes are doubtless at the root of it — but it can't have been "vocal fry" that triggered it either, because there's very little in those first three phrases, and at least as much in the following mental-health material.

  11. Jonathan Smith said,

    June 30, 2024 @ 11:20 pm

    Rhythm / accent; sorry for being so specific :D

    young(ish) women in the U.S. can (usually?) switch into / out of "lilting girl (also gay man?) banter" at will. My daughter never did it except for fun, but now reports using it self-consciously in engaging with / seeking cooperation from a female public in a professional context — night and day in terms of results, I hear. She has a funny story about this register leaking out of the job context into interactions with female staff at a restaurant… same improved response. Sociological in-group thing I suppose…

    incidentally "thanks!!" (actually rather restrained in the clip here) is IMO the single most dramatically transformed lexical item; it has practically /i/ or something at least as the first element of a diphthong. Ask an acquaintance for a demo — it's a whole meme.

  12. Jarek Weckwerth said,

    July 1, 2024 @ 8:55 am

    I think Jonathan Smith is on to something. Of course I shouldn't risk making such claims without numerical support in a comment on a Breakfast Experiment on LL, but impressionistically I hear more phrase-final lengthening in the first part, and a higher density of rising termini as a function of shorter phrases. Also, the first phrase after "but" makes me think of a trend which I have never verbalized before: The first phrase of a longer public utterance tends to be higher in pitch than the rest.

    In terms of "thanks": The raising of the TRAP vowel before nasals is de rigueur in American English these days but this word does indeed seem to be at the bleeding edge. Frequency? Pre-fortis clipping making the raising more salient when compared with e.g. "man" where there isn't any?

  13. Benjamin E. Orsatti said,

    July 1, 2024 @ 10:44 am

    Jarek Weckwerth said,

    The first phrase of a longer public utterance tends to be higher in pitch than the rest.

    Wonder if that's a rhetorical "attention-getting" device (I think there may even be a word for that in Greek). In other words, imagine, /Friends, Romans, Countrymen/ orated at a lower pitch than /Lend me your ears/. That would be, to use a linguistic term, "weird".

    Or, /Four score and seven years ago/ "jumping up" to /our fathers brought forth on this continent/. It would sound shrill to the point of mental imbalance, wunnit? If I were a phonetician, I'd be running off to gather recordings of FDR, Churchill, and Kennedy to put against those of Hitler, Kim Jong Il, Benito Mussolini, where "shrillness" and "frenzy" might actually be desirable in a demagogue.

  14. David Morris said,

    July 1, 2024 @ 3:30 pm

    Which is her real voice? Or is that the wrong question? Does anyone have one single 'real voice' or do we all adopt different voices like we all adopt registers?

  15. Graeme said,

    July 2, 2024 @ 2:36 am

    Does anyone have a single 'real voice'? I know I vary between eg 'answering the phone', 'lecturing', 'reading' and 'chit chat'.

    I appear before (Australian) legislative committees quite a bit, as well as on radio (about law stuff, not linguistics). When I hear recordings of that, I'm always amused how the shift from banter to formal point making manifests in a shift from multi-cadence to more monotone. And even how a shift from stop-start to staged-flow affects the perception of that. What is obvious to a listener is not obvious to the speakers for whom the different voices are second nature.

  16. GH said,

    July 2, 2024 @ 6:55 am

    I think it's fairly obvious that her voice changes markedly as she shifts the topic to mental health care (though to my ears it then starts to gradually creep back towards the way she speaks initially). What that change actually consists of in acoustic terms would be interesting to get a good description of.

    Informally, I would say that her voice becomes more "full-bodied" and rich, which I think is what HuffPo characterizes as a "deeper register." My subjective impression is that it's a more natural sound, with her habitual style sounding pinched and strained, but that might just be bias.

  17. Philip Resnik said,

    July 2, 2024 @ 7:24 am

    As someone with vocal training, for me the most noticeable aspect of the shift involves not pitch per se but a transition to a much more “open” sound, corresponding to a literally more open air pathway. This is correlated, I think, with a perception of something like “cuteness” (think Betty Boop); not sure if vocal aspects of “sounding cute” have been studied. In any case the more open sound contributes to resonance and perhaps that is the sense of “deep” here rather than pitch.

    The other thing to note is that our perceptions are not purely bottom up. We constantly engage in predictive processing, comparing top down predictions to bottom up evidence and making corrections to our model based on error signal (cf Andy Clark 2015, “Whatever Next”). Those error signals correspond to cognitive effort (eg as measured by EEG N400 or reading times) and I believe they can also have an effect of reinterpretation. It would not surprise me if despite the F0 people literally hear/perceive a pitch shift whether it was actually there or not.

  18. James said,

    July 2, 2024 @ 8:15 am

    I think Philip Resnik is absolutely right. It does sound to me as if her voice is lower after "but", but I can see that would be because of a change in the mix of frequencies present, rather than the fundamental pitch.

    You can see her expression and the way she articulates change at that point. It would be interesting to test this with someone watching the video with no audio to see if they can identify where her speech changes. Just to be sure that perception isn't biased by hearing it.

    She also often has her head lower, and nearer the mic, for the "serious" part as she looks down at her notes. This could also affect the perceived sound.

  19. JPL said,

    July 2, 2024 @ 3:50 pm

    My impression on hearing the excerpt, using the button in the OP, not the video, and before reading the whole post, was that the phrases starting with (4) were definitely higher in pitch, "strikingly" so, as Mark described it, and also had greater pitch range. The difference struck me as similar to a classroom situation, where I speak to a student in the first row at the beginning of class, about some friendly everyday topic, and then address the whole class about the business at hand. There's a topic shift, but there's also a shift in addressee and conversational purpose, and it seems a shift in intonational features goes along with that.

  20. Benjamin E. Orsatti said,

    July 3, 2024 @ 7:23 am

    JPL,

    Could "loudness" also have something to do with it? It seems that if you're having a personal aside with someone in close proximity, your pitch range might be all over the place, but teachers / lawyers / actors also have their "classroom / courtroom / stage voices" that have to hit the back of the room, and so wouldn't they necessarily have a more limited pitch range?
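The F0 figures exchanged in comments 2, 4 and 7 can be checked with two standard log-frequency conversions: the interval in semitones, 12·log2(f2/f1), and the offset in cents from the nearest equal-tempered note. This sketch assumes A4 = 440 Hz and twelve-tone equal temperament (neither is stated in the thread) and uses only the Hz values quoted there; the function names are illustrative.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def semitones(f_from_hz, f_to_hz):
    """Interval from f_from_hz to f_to_hz in equal-tempered semitones."""
    return 12 * math.log2(f_to_hz / f_from_hz)


def hz_to_note(freq_hz, a4_hz=440.0):
    """Nearest 12-TET note name (with octave) and the offset from it in cents."""
    # MIDI-style note number: A4 = 69, one unit per semitone.
    midi = 69 + semitones(a4_hz, freq_hz)
    nearest = round(midi)  # exact half-semitone ties round to the even number
    cents = 100 * (midi - nearest)
    name = NOTE_NAMES[nearest % 12] + str(nearest // 12 - 1)
    return name, cents


if __name__ == "__main__":
    # Mean F0 values quoted in the comments.
    print(f"{semitones(179, 186):+.2f} st")  # previous syllable -> "but"
    print(f"{semitones(180, 201):+.2f} st")  # previous phrase -> phrase (4)
    for f in (180, 186):
        name, cents = hz_to_note(f)
        print(f"{f} Hz = {name} {cents:+.0f} cents")
```

Run as a script, this gives about +0.66 semitones for the syllable-to-"but" step and about +1.91 semitones for the phrase-to-phrase step, and places 180 Hz and 186 Hz at F#3 -47 cents and F#3 +9 cents, in line with comment 7. The semitone scale is used here because it expresses pitch differences as frequency ratios rather than raw Hz.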
