Vocal fry probably doesn't harm your career prospects


. . . but not being yourself just might.

There's been a lot of media interest recently in a new study of "vocal fry", sparked in part by an unusually detailed magazine article — Olga Khazan, "Vocal Fry May Hurt Women's Job Prospects", The Atlantic 5/29/2014. Other coverage: Gail Sullivan, "Study: Women with creaky voices — also known as ‘vocal fry’ — deemed less hireable", Washington Post 6/2/2014; "Is vocal fry hurting women's job prospects?", NPR Marketplace 6/5/2014; Maya Rhodan, "3 Speech Habits That Are Worse Than Vocal Fry in Job Interviews", Time Magazine 6/4/2014; and so on.

The original study is Rindy C. Anderson et al., "Vocal Fry May Undermine the Success of Young Women in the Labor Market", PLOS ONE 5/28/2014. Below is a guest post by Christian DiCanio, offering a more skeptical take.


In a recent article in PLOS One, authors Anderson et al. find that vocal fry is harmful to people's career prospects, with women being slightly more at risk than men. At face value, it may seem surprising that such subtle cues can be shown to have a hidden influence on our attitudes towards people; and anything to do with the influence of one's voice on employment prospects is notable in a market where competition for jobs remains fierce.

As a result, this article has gotten considerable attention in the media, with the conclusion that women need to police how they speak for fear of being perceived as untrustworthy by a potential employer. But on closer inspection, it turns out that self-policing might not be needed at all, at least with respect to this feature. The original study contained quite serious flaws in its design which, when considered carefully, prevent us from drawing any conclusions about which specific acoustic characteristics sounded "untrustworthy" to the listeners who participated.

The design of the study was relatively straightforward. A group of 800 people, via an online system (Qualtrics), listened to speakers produce the sentence "Thank you for considering me for this opportunity." Some of these sentences were produced with vocal fry, which, in contrast to normal voice, involves temporal irregularity in the vibration of the vocal cords (folds) and lower overall pitch (see the figure below). To a listener, the "vocal fry" regions sound something like a stick being dragged along a fence, where one can hear individual vibrations or pulses of the vocal folds.

The listeners in this experiment were asked to listen to each speaker's pair of utterances — normal and with imitated vocal fry — and to indicate which of the pair "was perceived to be more educated, competent, trustworthy, attractive, and which speaker they would hire". The expectation was that listeners might have different attitudes towards those sentences with vocal fry than they would towards sentences without vocal fry — and this is what they found.

In passing, we should note that imitated vocal fry gave a relatively bad impression of both male and female speakers, with the female speakers coming out a little bit worse:

The big issue here is just where the authors got the voices with vocal fry.

When linguists, phoneticians, or speech scientists want to study whether an acoustic characteristic in someone's voice influences how listeners perceive them, they often will record a person and then modify those aspects of the person's voice which they wish to test. This process, called analysis/resynthesis, allows one to carefully control the acoustic dimensions in the signal and requires some knowledge of speech acoustics and digital signal processing. Certain aspects of one's voice are harder to modify than others. As it happens, vocal fry is one of these hard-to-modify characteristics. (I'll leave the more detailed question of why it is hard to resynthesize vocal fry, and voice quality more generally, out of the discussion for now.)

Fortunately, there is a solution. Just as one might buy two types of apples to compare their flavors, we can look for speakers who just happen to produce more vocal fry in their speech and compare them to those who do not produce it. If one were to play the speech of these two groups to listeners (and potential employers), listeners might have different attitudes about one of the groups. This is, in fact, what Yuasa (2010) did in her study of creaky phonation. Yet, importantly, the authors of the study here did no such thing. Rather, they recorded speakers producing normal utterances and then trained them to produce an utterance with greater vocal fry. As a consequence, the speech contained in all of the vocal fry stimuli is actually speech where speakers are attempting to imitate a voice with vocal fry.

There are several reasons why this is problematic, but the first is perhaps the most obvious: most people are not particularly accurate at imitating someone else's speech. If you ask the average person to "talk like a Texan", they might (or might could) try to imitate something that they believe to be an important characteristic of Texas speech. Yet, to most listeners, especially those from Texas, they would sound like a caricature of an actual Texan. The same thing would happen with people imitating an upper-class British accent, or Arnold Schwarzenegger, or Sarah Palin.

As it turns out, this is the rub. While the speakers in the study here insert creak at various places in their speech, its real use in natural speech is controlled in a way that they don't imitate accurately. Previous studies of vocal fry, particularly Redi and Shattuck-Hufnagel (2001), find that its distribution is rather restricted: it tends to occur in locations in phrases and utterances where we might expect low pitch. In the imitated speech here, vocal fry is disconnected from these locations of low pitch. Rather, the speakers seem to produce a very flat, robotic voice when imitating vocal fry. The typical intonation for the stimulus sentence is something like "THANK you for conSIdering me FOR this OPorTUnity", where the syllables in caps reflect higher pitch levels than the surrounding ones.

This is not the only way in which the imitated speech sounds unnatural, however. With one exception (speaker 5), each of the imitated sentences produced by female speakers is also longer than the corresponding non-imitated sentence for that speaker, as shown in the table below:

             Duration of vocal fry    Duration of plain        Ratio
             sentence (in seconds)    sentence (in seconds)    (fry/plain)
Speaker 1    2.91                     2.25                     1.29
Speaker 2    2.90                     2.84                     1.02
Speaker 3    2.69                     2.19                     1.23
Speaker 4    2.33                     2.07                     1.13
Speaker 5    2.15                     2.37                     0.91
Speaker 6    2.57                     2.43                     1.06
Speaker 7    3.24                     2.57                     1.26
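
For readers who want to check the arithmetic, the ratios in the table can be recomputed in a few lines. This is only a sketch: the durations are copied from the table, while the names `durations` and `duration_ratios` are made up for illustration and do not come from the original study.

```python
# Durations in seconds, copied from the table: (vocal fry sentence, plain sentence).
durations = {
    "Speaker 1": (2.91, 2.25),
    "Speaker 2": (2.90, 2.84),
    "Speaker 3": (2.69, 2.19),
    "Speaker 4": (2.33, 2.07),
    "Speaker 5": (2.15, 2.37),
    "Speaker 6": (2.57, 2.43),
    "Speaker 7": (3.24, 2.57),
}


def duration_ratios(table):
    """Return {speaker: fry duration / plain duration}, rounded to 2 decimals."""
    return {name: round(fry / plain, 2) for name, (fry, plain) in table.items()}


ratios = duration_ratios(durations)

# Speakers whose imitated (vocal fry) sentence is longer than their plain one.
longer = [name for name, ratio in ratios.items() if ratio > 1.0]

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}")
print(f"{len(longer)} of {len(ratios)} speakers were slower when imitating vocal fry")
```

A ratio above 1 means the imitated sentence took longer than the plain one; only Speaker 5 falls below 1.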

These differences do not appear to be restricted to particular words either. As seen in the figure below, almost all words were longer in the imitated speech than in the natural speech. This longer duration, in comparison with the shorter natural sentences, may make the imitated speech sound stilted to the listener.

A related problem in the study is the authors' acoustic analysis of the speech signal. The calculation of pitch in the speech signal requires determining how well successive vocal fold vibrations correlate with one another. When the vocal folds are vibrating normally, such a correlation can be computed reliably, but when vocal fold vibration is too irregular, as in vocal fry, it is impossible to calculate pitch accurately. However, an acoustic analysis program may still try to calculate possible (erroneous) values. Anderson et al. argue that the pitch in the vocal fry sentences is universally lower than that in the natural sentences, but they neither controlled nor mentioned how pitch was calculated during stretches of vocal fry. In fact, the expression "Thank you", which contained no vocal fry in any of the utterances, had universally lower pitch in the vocal fry sentences than in the normal sentences. This suggests that the speakers may be lowering pitch across the entire imitated sentence, rather than simply adding vocal fry. Finally, no quantitative acoustic estimation of actual vocal fry (such as jitter, shimmer, cepstral peak prominence, etc.) was ever included in the authors' study. Yes, you heard that right – in a study relating vocal fry to listener attitudes and hireability there was no quantitative estimate of whether and how the stimuli differed with respect to the test variable.
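
To make the point about correlation concrete, here is a toy illustration in Python. It is a sketch under loud assumptions, not the method used by Anderson et al. or by any particular analysis program: the "voices" are synthetic trains of Gaussian bumps, the period values are invented, and the function names are hypothetical. It shows two things: a plain autocorrelation search finds a strong, unambiguous peak for regular pulses but only a weak one for irregular pulses, and a simple local jitter measure (mean absolute difference between consecutive periods, divided by the mean period) separates the two cases.

```python
import math

SR = 8000  # sample rate in Hz; an arbitrary choice for this toy example


def pulse_train(periods_ms, sr=SR, width_ms=0.25):
    """Synthesize one Gaussian bump per vocal fold pulse, spaced by periods_ms."""
    times, now = [], 0.0
    for period in periods_ms:
        now += period / 1000.0
        times.append(now)
    n = int(round((times[-1] + 0.01) * sr))
    sigma = width_ms / 1000.0
    return [
        sum(math.exp(-0.5 * ((i / sr - t) / sigma) ** 2) for t in times)
        for i in range(n)
    ]


def autocorr_peak(signal, sr=SR, min_lag_ms=2.0, max_lag_ms=30.0):
    """Return (lag in ms, normalized strength) of the strongest autocorrelation peak."""
    energy = sum(x * x for x in signal)
    lo, hi = int(min_lag_ms * sr / 1000), int(max_lag_ms * sr / 1000)
    best_lag, best_val = lo, -1.0
    for lag in range(lo, hi + 1):
        val = sum(signal[i] * signal[i + lag] for i in range(len(signal) - lag)) / energy
        if val > best_val:
            best_lag, best_val = lag, val
    return 1000.0 * best_lag / sr, best_val


def local_jitter(periods_ms):
    """Mean absolute difference of consecutive periods, divided by the mean period."""
    diffs = [abs(b - a) for a, b in zip(periods_ms, periods_ms[1:])]
    return (sum(diffs) / len(diffs)) / (sum(periods_ms) / len(periods_ms))


# Invented period sequences (in ms): steady 200 Hz voicing vs. irregular, fry-like pulses.
regular = [5.0] * 30
irregular = [9.0, 14.0, 11.0, 20.0, 8.0, 17.0, 12.0, 23.0, 10.0, 15.0]

reg_lag, reg_strength = autocorr_peak(pulse_train(regular))
irr_lag, irr_strength = autocorr_peak(pulse_train(irregular))

print(f"regular:   lag {reg_lag:.1f} ms, strength {reg_strength:.2f}, jitter {local_jitter(regular):.2f}")
print(f"irregular: lag {irr_lag:.1f} ms, strength {irr_strength:.2f}, jitter {local_jitter(irregular):.2f}")
```

In the irregular case the search still returns a "best" lag, which is exactly the problem: a program can report a pitch value even when the underlying correlation is too weak for that value to mean much.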

Taken together, these observations suggest that the speakers in the study simply attempted to lower their overall pitch level while imitating vocal fry rather than including more vocal fry in a natural fashion. The increased effort involved in the imitation also made their utterances longer. These two acoustic differences, among others, would seem to contribute to the speakers sounding unnatural when imitating vocal fry. So, when listeners judge the female speakers with vocal fry as sounding "untrustworthy", there is a good possibility that they are simply making such a judgment based on the speaker not sounding like herself. The better lesson to take home here is that your job prospects are harmed if you try to talk (or act) like someone you are not.

References:

Anderson, R. C., Klofstad, C. A., Mayew, W. J., and Venkatachalam, M. (2014) Vocal fry may undermine the success of young women in the labor market. PLOS ONE 9(5): 1-8.

Redi, L. and Shattuck-Hufnagel, S. (2001) Variation in the realization of glottalization in normal speakers. Journal of Phonetics 29:407-429.

Yuasa, I. P. (2010) Creaky voice: a new feminine voice quality for young urban-oriented upwardly mobile American women? American Speech 85(3):315–337.


The above is a guest post by Christian DiCanio.



13 Comments

  1. Chris said,

    June 7, 2014 @ 9:39 am

    Christian,

    Excellent critique. I tweeted it at the Atlantic's Olga Khazan, who wrote the original piece. Hopefully she'll print a retraction of some sort. I think journalists who regurgitate bad science ought to be held accountable.

    I would add a few minor points:

    1. Subjects only listened to all male voices or all female voices. This means, at best, people slightly prefer [normal male voices over fake-vocal fry male voices] and [normal female voices over fake-vocal fry female voices]. It says nothing about preference for normal male voices over real-vocal fry female voices, as the study's title suggested.

    2. Even with better stimuli like you outlined, the A/B forced-choice testing method seems wrong. How would this fare if there were a Likert scale from Strongly prefer A to Strongly prefer B? Likely, differences would all muddle in the middle. The strength of their findings may be a mirage created by their methodology.

    3. Expecting random people to have valid opinions about hire-ability is off the mark. Until you've actually had to make decisions about hiring people, your intuitions about that process are not a particularly accurate reflection of actual hiring managers.

    [(myl) I would add that if you ask listeners to choose between two versions of a short sentence produced by a single speaker, you draw attention to the feature that differentiates the versions. I suspect that you could get similar results by having listeners do side-by-side comparison of variants created by asking speakers to speak fast, or to speak slowly; or to use a higher than normal pitch, or a lower than normal pitch; or to read the phrase with and without a filled pause. This is not necessarily because (say) an occasional um or uh strongly will keep you out of the job market -- if it did, few of us would have jobs.]

  2. Daniel Ezra Johnson said,

    June 7, 2014 @ 11:51 am

    That a bad study claims to find an effect is not very strong evidence against the existence of that effect.

  3. David L said,

    June 7, 2014 @ 2:11 pm

    I have no expert knowledge here, and the critique makes some good points. But if you were to do an experiment where different speakers — fried and unfried — said the same things, would that not be open to parallel criticisms? Perhaps people who speak with fry tend to use different stress patterns or intonations in some way that affects the totality of how they are heard.

    The difficulty, in other words, is isolating fry from all the other variables that go into speech. It seems like a hard thing to accomplish, experimentally, and I wonder if it's even possible to conduct a study that focuses on fry to the exclusion of all else.

  4. Noah Motion said,

    June 7, 2014 @ 3:21 pm

    Chris: Two-alternative forced choice procedures don't necessarily produce a tendency toward one or another response. It's not uncommon for people to give either response equally often, which is the two-choice analog to muddled-middle Likert scale data.

    Daniel Ezra Johnson: No one made that claim. Christian pointed out flaws in the study and gave a plausible non-fry-related reason for the results.

  5. Joe said,

    June 7, 2014 @ 3:24 pm

    Given the assumptions and conclusions of the study, wouldn't it have made more sense to have prompts in which speakers were trained not to produce vocal fry in contexts where they normally would and then compare those results? (Although I would still randomize all samples and ask for Likert scale rankings on truthfulness, etc)?

  6. Noah Motion said,

    June 7, 2014 @ 3:32 pm

    I mis-used the phrase "two-alternative forced choice" above, at least as it's commonly used in psychophysics. I should have said that two-choice procedures don't necessarily produce a tendency toward one or another response.

  7. Christian DiCanio said,

    June 7, 2014 @ 9:41 pm

    David L,

    It is perfectly possible to record speech, control for as many variables as you can, and test only specific ones. This is at the heart of any work in speech perception (and perceptual cues) and is part of what many phoneticians/speech scientists do. When one cannot control specific cues so easily, it is more customary to find specific speakers who use one cue more than others do and then test listeners on these two types of voices. The idea is that some of the variability between voices will wash out by having listeners listen to many different types of voices.

    And in case you're thinking about it, those cues which might be correlated to the cue in question can usually be controlled. So, this is all possible.

  8. Mark Sicoli said,

    June 8, 2014 @ 8:38 am

    Thanks for writing about this, Christian. Their findings seem more relevant to the literature on vocal deception. At the least, they could have also tested creaky speakers trained to speak less creaky to try and pull the deception effects apart from the voice quality effects. And if they got results in favor of their hypothesis they could be used as before and after illustrations in advertisements for professional voice therapy ((wink))

    One thing: the title of your post would more appropriately be “Vocal Fry may not harm your career prospects” because, while you clearly pointed out that they don’t provide evidence that vocal fry probably harms your career prospects, there’s also no evidence that it probably doesn’t.

    Unfortunately, perception studies rating favorability or unfavorability of speakers this way tend to flatten the social fields and then overgeneralize from there. I think Anderson et al.’s hypothesis would be better tested in a phonetically grounded linguistic ethnography of hiring practices. A few things we’d like to know: Just what are the jobs at issue, and what are the voice qualities of people already in these positions? Since there are claims that creaky voice is becoming more common in (corporately and academically) successful American women, this needs to be reconciled with their claim about being “less hirable”. What do real hiring committees actually talk about when making real hiring or not-hiring decisions? What does the sequence of talk in the job interviews look like? What about dialect shifting when playing professional rather than playing some other social role? And given more time, are there changes in voice quality features over the career of individuals as they move up the corporate ladder or not?

  9. Elbie said,

    June 8, 2014 @ 8:53 am

    @Christian, it also bears mentioning that the effect of the 'washing out' of incidental characteristics can be quantified using statistical techniques (as you already know), which cuts against the authors' claim that "variation in myriad acoustic features of the different speakers' voices other than vocal fry would be difficult to account for as confounding effects."

    At the very least, the strategy of "us[ing] naturally occurring instances of vocal fry and normal voice from different speakers," disavowed here by the authors, would have possibly produced some minor confounds, rather than allowing the 'affectation' confound to map perfectly into their variable of interest — which is what makes your criticism so cutting.

    I'd recommend readers of this post listen to the auditory stimuli, which are all available at PLOS ONE — the oddity of the fry examples is striking when they're listened to alone. And when listened to alongside their 'normal' counterparts, it's no surprise that participants preferred the non-affected speech.

  10. M. Lesho said,

    June 9, 2014 @ 11:29 am

    I have to agree with some of the other comments that there is no way to say if vocal fry hurts women's job prospects or not, based on this poorly designed study. In fact, I would not be at all surprised if it does, depending on the job. The tone that the media always takes in covering these types of studies is telling – discrimination against how women speak is very real.

    Besides the phonetic problems that Christian pointed out so well, I think it's also important to consider a few other things.

    First, how in the world did a phonetic/sociolinguistic study by biologists, business professors, and political scientists get past PLOS One's peer review process? The stimuli are so ridiculous that it's hard to believe that any linguist was asked to review it.

    Second, the authors concluded that young women should change how they speak in order to get hired. Why is it that the recommendation is not for people to stop judging how others speak? Again, the fact that the authors were not linguists seems relevant here – it seems like they might have set out to prove that young women's speech is annoying.

    Finally, it's worth pointing out why this awful study has received so much media coverage, while better linguistic studies may go unnoticed. It's sexism, plain and simple. People love any chance to mock young women's speech. The coverage is also very poorly done – for example, the Atlantic piece on this study points to Zooey Deschanel as an example of a vocal fryer, using a video of her that features no vocal fry at all. It seems to me that Zooey just represents the stereotype of a woman who does vocal fry (young, white, middle/upper class, seen by many as silly or immature) – it doesn't actually matter if she does it or not.

  11. David Marjanović said,

    June 10, 2014 @ 2:27 pm

    Second, the authors concluded that young women should change how they speak in order to get hired. Why is it that the recommendation is not for people to stop judging how others speak?

    Because the Free Market® is seen as a force of nature in contemporary Western culture, rather than as being composed of people who have views.

  12. Jason Stokes said,

    June 11, 2014 @ 2:04 am

    Second, the authors concluded that young women should change how they speak in order to get hired. Why is it that the recommendation is not for people to stop judging how others speak?

    Perhaps the authors have learned to be rather realistic about human nature and the potential for any immediate change in it, thereof.

  13. ThomasH said,

    June 26, 2014 @ 1:17 pm

    I confess, this is off topic. I picked this thread only because it has "your" in the title.

    Around Washington DC I think I'm hearing "your" more often. Sometimes it seems to work like an article, "your traffic this morning is snarled", and sometimes it's just weird, "your showers may be heavy at times". I hardly drive so I take no ownership of the traffic and do not feel particularly possessive about the showers that may fall.

    What's going on? Is this a regional fluke or the START OF SOMETHING BIG? :)
