. . . but not being yourself just might.
There's been a lot of media interest recently in a new study of "vocal fry", sparked in part by an unusually detailed magazine article — Olga Khazan, "Vocal Fry May Hurt Women's Job Prospects", The Atlantic 5/29/2014. Other coverage: Gail Sullivan, "Study: Women with creaky voices — also known as ‘vocal fry’ — deemed less hireable", Washington Post 6/2/2014; "Is vocal fry hurting women's job prospects?", NPR Marketplace 6/5/2014; Maya Rhodan, "3 Speech Habits That Are Worse Than Vocal Fry in Job Interviews", Time Magazine 6/4/2014; and so on.
The original study is Rindy C. Anderson et al., "Vocal Fry May Undermine the Success of Young Women in the Labor Market", PLOS ONE 5/28/2014. Below is a guest post by Christian DiCanio, offering a more skeptical take.
In a recent article in PLOS ONE, Anderson et al. find that vocal fry is harmful to people's career prospects, with women being slightly more at risk than men. At face value, it may seem surprising that such subtle cues can be shown to have a hidden influence on our attitudes towards people, and anything to do with the influence of one's voice on employment prospects is notable in a market where competition for jobs remains fierce.
As a result, this article has gotten considerable attention in the media, with the conclusion that women need to police how they speak for fear of being perceived as untrustworthy by a potential employer. But on closer inspection, it turns out that self-policing might not be needed at all, at least with respect to this feature. The original study contained quite serious flaws in its design which, when considered carefully, prevent us from drawing any conclusions about which specific acoustic characteristics sounded "untrustworthy" to the listeners who participated.
The design of the study was relatively straightforward. A group of 800 people, via an online system (Qualtrics), listened to speakers produce the sentence "Thank you for considering me for this opportunity." Some of these sentences were produced with vocal fry, which, in contrast to normal voice, involves temporal irregularity in the vibration of the vocal cords (folds) and lower overall pitch (see the figure below). To a listener, the "vocal fry" regions sound something like a stick being dragged along a fence, where one can hear individual vibrations or pulses of the vocal folds.
The listeners in this experiment were asked to listen to each speaker's pair of utterances — normal and with imitated vocal fry — and to indicate which of the pair "was perceived to be more educated, competent, trustworthy, attractive, and which speaker they would hire". The expectation was that listeners might have different attitudes towards those sentences with vocal fry than they would towards sentences without vocal fry — and this is what they found.
In passing, we should note that imitated vocal fry gave a relatively bad impression of both male and female speakers, with the female speakers coming out a little bit worse:
The big issue here is just where the authors got the voices with vocal fry.
When linguists, phoneticians, or speech scientists want to study whether an acoustic characteristic in someone's voice influences how listeners perceive them, they often will record a person and then modify those aspects of the person's voice which they wish to test. This process, called analysis/resynthesis, allows one to carefully control the acoustic dimensions in the signal and requires some knowledge of speech acoustics and digital signal processing. Certain aspects of one's voice are harder to modify than others. As it happens, vocal fry is one of these hard-to-modify characteristics. (I'll leave the more detailed question of why it is hard to resynthesize vocal fry, and voice quality more generally, out of the discussion for now.)
Fortunately, there is a solution. Just as one might buy two types of apples to compare their flavors, we can look for speakers who just happen to produce more vocal fry in their speech and compare them to those who do not produce it. If one were to play the speech of these two groups to listeners (and potential employers), listeners might have different attitudes about one of the groups. This is, in fact, what Yuasa (2010) did in her study of creaky phonation. Yet, importantly, the authors of the study here did no such thing. Rather, they recorded speakers producing normal utterances and then trained them to produce an utterance with greater vocal fry. As a consequence, the speech contained in all of the vocal fry stimuli is actually speech where speakers are attempting to imitate a voice with vocal fry.
There are several reasons why this is problematic, but the first is perhaps the most obvious: most people are not particularly accurate at imitating someone else's speech. If you ask the average person to "talk like a Texan", they might (or might could) try to imitate something that they believe to be an important characteristic of Texas speech. Yet, to most listeners, especially those from Texas, they would sound like a caricature of an actual Texan. The same thing would happen with people imitating an upper-class British accent, or Arnold Schwarzenegger, or Sarah Palin.
As it turns out, this is the rub. While the speakers in the study here insert creak at various places in their speech, its real use in natural speech is controlled in a way that they don't imitate accurately. Previous studies of vocal fry, particularly Redi and Shattuck-Hufnagel (2001), find that its distribution is rather restricted. It tends to occur in locations in phrases and utterances where we might expect low pitch. In the imitated speech here, vocal fry is disconnected from these locations of low pitch. Rather, the speakers seem to produce a very flat, robotic voice when imitating vocal fry. The typical intonation for the stimulus sentence is something like "THANK you for conSIdering me FOR this OPporTUnity", where the syllables in caps reflect higher pitch levels than the surrounding ones.
This is not the only way in which the imitated speech sounds unnatural, however. With one exception (speaker 5), each of the imitated sentences produced by female speakers is also longer than the corresponding non-imitated sentence for that speaker, as shown in the table below:
[Table: duration of the vocal fry sentence and duration of the plain sentence for each female speaker; the values are not reproduced here.]
These differences do not appear to be restricted to particular words either. As seen in the figure below, almost all words were longer in the imitated speech than in the natural speech. Compared with the shorter natural sentences, the longer imitated sentences may sound stilted to the listener.
A related problem in the study is the authors' acoustic analysis of the speech signal. Calculating pitch from the speech signal requires determining how well successive vocal fold vibrations correlate with one another. When the vocal folds are vibrating normally, such a correlation is possible, but when vocal fold vibration is too irregular, as in vocal fry, it is impossible to calculate pitch accurately. An acoustic analysis program may nevertheless still return (erroneous) values. Anderson et al. argue that the pitch in the vocal fry sentences is universally lower than that in the natural sentences, but they neither controlled nor mentioned how pitch was calculated during stretches of vocal fry. In fact, the expression "Thank you", which contained no vocal fry in any of the utterances, had universally lower pitch in the vocal fry sentences than in the normal sentences. This suggests that the speakers may be lowering pitch across the entire imitated sentence, rather than simply adding vocal fry. Finally, no quantitative acoustic estimate of actual vocal fry (such as jitter, shimmer, or cepstral peak prominence) was ever included in the authors' study. Yes, you heard that right: in a study relating vocal fry to listener attitudes and hireability, there was no quantitative estimate of whether and how the stimuli differed with respect to the test variable.
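To make the idea of a quantitative estimate concrete, here is a minimal sketch of one such measure, local jitter: the average absolute difference between consecutive vocal fold periods, divided by the average period. The two period sequences are invented purely for illustration and are not measurements from the study's stimuli. The function names are likewise hypothetical. In practice the periods would have to be extracted from the recordings first, which is itself the hard step in fry.

```python
from statistics import mean


def local_jitter(periods_ms):
    """Mean absolute difference between consecutive periods,
    divided by the mean period (a unitless ratio)."""
    if len(periods_ms) < 2:
        raise ValueError("need at least two periods")
    diffs = [abs(a - b) for a, b in zip(periods_ms, periods_ms[1:])]
    return mean(diffs) / mean(periods_ms)


def per_cycle_f0(periods_ms):
    """Convert each period in milliseconds to a per-cycle frequency in Hz."""
    return [1000.0 / t for t in periods_ms]


# Hypothetical period sequences in milliseconds, invented for illustration.
# They are NOT data from Anderson et al. or from any recording.
modal_periods = [5.0, 5.1, 4.9, 5.0, 5.1, 4.9, 5.0, 5.0]      # regular vibration
fry_periods = [12.0, 25.0, 9.0, 31.0, 14.0, 22.0, 8.0, 27.0]  # irregular pulses

for label, periods in (("modal", modal_periods), ("fry", fry_periods)):
    f0 = per_cycle_f0(periods)
    print(f"{label}: jitter = {local_jitter(periods):.1%}, "
          f"per-cycle F0 from {min(f0):.0f} to {max(f0):.0f} Hz")
```

With the regular sequence, the per-cycle frequencies cluster tightly and a single pitch value is meaningful. With the irregular one, they spread so widely that any single "pitch" number a program reports for that stretch should be treated with suspicion. Reporting a measure like this for each stimulus would show whether the "fry" sentences actually differed in fry.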
Taken together, these observations suggest that the speakers in the study simply attempted to lower their overall pitch level while imitating vocal fry rather than including more vocal fry in a natural fashion. The increased effort involved in the imitation also made their utterances longer. These two acoustic differences, among others, would seem to contribute to the speakers sounding unnatural when imitating vocal fry. So, when listeners judge the female speakers with vocal fry as sounding "untrustworthy", there is a good possibility that they are simply making such a judgment based on the speaker not sounding like herself. The better lesson to take home here is that your job prospects are harmed if you try to talk (or act) like someone you are not.
Anderson, R. C., Klofstad, C. A., Mayew, W. J., and Venkatachalam, M. (2014) Vocal fry may undermine the success of young women in the labor market. PLOS ONE 9(5):1-8.
Redi, L. and Shattuck-Hufnagel, S. (2001) Variation in the realization of glottalization in normal speakers. Journal of Phonetics 29:407-429.
Yuasa, I. P. (2010) Creaky voice: a new feminine voice quality for young urban-oriented upwardly mobile American women? American Speech 85(3):315-337.
The above is a guest post by Christian DiCanio.