Fillers: Autism, gender, and age


K. Gorman et al., "Children's Use of Disfluencies Distinguish ASD and Language Impairment", IMFAR 2014 (emphasis added):

This study compares the relative frequencies of "uh" and "um" in the spontaneous speech of children with ASD (with or without comorbid language impairment) to two control groups. Methods: Participants: 112 children ages 3;10–9;0 participated: ASD (50), Specific Language Impairment (SLI;  18), and Typical Development (TD; 44). All diagnoses were verified by best-estimate clinical consensus. The children with ASD were split into two groups: one group with comorbid language impairment (ALI) as diagnosed by a CELF Core Language Score below 85, and one group with ASD but no clinical language impairment (ALN). All children were high functioning monolingual English speakers. Data collection: a clinician administered the Autism Diagnostic Observation Schedule (ADOS; module 2 or 3) to each child. Sessions were recorded and transcribed.

Results: For all group pairs, diagnosis was uncorrelated with overall (i.e., "uh" + "um") rate of filled pause use. FP choice was analyzed for each comparison set using mixed effects logistic regression, with chronological age, FSIQ, ADOS "activity", and utterance position (utterance-initial vs. non-initial) as covariates. Diagnosis was a significant predictor for ALN/TD (p = .001) and ALI/SLI (p = .038); in both comparisons the ASD group used fewer instances of "um". Diagnosis was non-significant for TD/SLI (p = .888) and ALI/ALN (p = .814). ALI and ALN groups both used "uh" and "um" at an approximately 1:1 ratio, whereas TD and SLI groups used "um" 2 to 3 times more often than "uh". ADOS "activity" and utterance position were also significant predictors of FP choice; remaining covariates were non-significant.

This poster abstract reminded me of two things. The first connection is to a curious fact noted in a LLOG post from eight years ago — "Young men talk like old women",  11/6/2005:

Over the past few years, the Linguistic Data Consortium (LDC) has collected and transcribed a large number of telephone conversations for the purpose of speech recognition research. [Some of this has already been published (sample catalog entries are here and here), and the rest will be published soon.] The collection is an interesting basis for some new sorts of linguistic research, in my opinion, and below I present a small example of a suggestive result — about the interaction of age, sex and fluency — that took me about half an hour to produce.

If we take the relative frequency of "uh" as a measure of disfluency, then the graph above shows that

* disfluency (or at least uh-usage) increases with age;
* at a given age, men are more disfluent than women (or at least they use uh more than women).
* As a result, men 20-39 have roughly the same uh/the ratio as women 60-69.

The facts for "um" are quite different:

The graph above shows that

* the frequency of "um" decreases with age;
* at a given age, women use "um" more than men.

Again, the rate of "um" usage for the younger men is almost the same as the rate of "um" usage for the older women.

[The transcripts that I used are from the Switchboard, Fisher Part 1, and Fisher Part 2 collections, and comprise about 25 million words in total.]
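A count of this kind can be sketched in a few lines of Python. This is only an illustration, not the script behind the graphs: the input format (one (sex, age, text) tuple per conversation side), the function names, the age bands, and the toy data are all assumptions. Dividing by the count of "the" follows the uh/the ratio mentioned above; applying the same normalization to "um" is also an assumption.

```python
from collections import Counter, defaultdict

# Hypothetical age bands; the bins used for the original graphs are not given here.
AGE_BANDS = [(20, 39), (40, 59), (60, 69)]


def age_band(age):
    """Return a label like '20-39' for an age, or None if it falls outside all bands."""
    for lo, hi in AGE_BANDS:
        if lo <= age <= hi:
            return f"{lo}-{hi}"
    return None


def filler_ratios(sides):
    """Compute uh/the and um/the ratios per (sex, age band) group.

    sides: iterable of (sex, age, text) tuples, one per conversation side
    (an assumed format, not the LDC transcript format).
    """
    counts = defaultdict(Counter)
    for sex, age, text in sides:
        band = age_band(age)
        if band is None:
            continue
        # Naive tokenization: lowercase, split on whitespace, strip punctuation.
        tokens = (t.strip(".,?!;:\"'") for t in text.lower().split())
        counts[(sex, band)].update(t for t in tokens if t in {"uh", "um", "the"})

    ratios = {}
    for group, c in counts.items():
        if c["the"] == 0:
            continue  # the ratio is undefined without any "the" tokens
        ratios[group] = {
            "uh/the": c["uh"] / c["the"],
            "um/the": c["um"] / c["the"],
        }
    return ratios


# Toy data, invented purely for illustration.
toy = [
    ("M", 25, "uh the cat uh sat on the mat"),
    ("F", 65, "um the dog uh saw the the bird"),
]
print(filler_ratios(toy))
```

Normalizing by a very frequent function word rather than by total word count is one simple way to make groups with different amounts of speech comparable; a real analysis would also need to respect each corpus's transcription conventions for fillers.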

The second connection is with Simon Baron-Cohen's theory that autism is a sort of testosterone poisoning, an "extreme male brain". See for example Simon Baron-Cohen, Rebecca C. Knickmeyer, and Matthew K. Belmonte, "Sex Differences in the Brain: Implications for Explaining Autism", Science 310 (5749) 819-823, 2005.

I've been publicly skeptical of the "extreme male brain" theory, or at least of its popular presentation: see e.g. "Stereotypes and facts", 9/24/2006; "Language and Identity", 7/29/2007; "Is autism the symptom of an 'extreme white brain'?", 3/26/2008; "Innate sex differences: Science and public opinion", 6/20/2008. But the approximate similarity of gender and ASD effects on this particular linguistic variable merits further investigation.

Comparable sex effects on uh vs. um have been reported in other datasets as well, e.g. Gunnel Tottie, "Uh and Um as sociolinguistic markers in British English", International Journal of Corpus Linguistics, 2011:

This study is based on the British National Corpus (BNC) and also takes data from the London-Lund Corpus (LLC) into account. It shows that the so-called filled pauses  er/uh and erm/um are sociolinguistic markers that differentiate between registers of English and along gender, age and socio-economic class. Men, older people and educated speakers use more fillers than women, younger speakers and less educated speakers. Nasalization is used more often by women, younger speakers and more educated speakers. These sociolinguistic factors can probably partly explain the fact that the use of fillers is higher in the LLC and the context-governed sample of the BNC than in the demographic sample of the BNC. It is argued that a more positive view should be taken of fillers as planning signals, or planners, and that their functions should be submitted to careful discourse analytic study. Their recognition as words will facilitate such an undertaking.

Rayme Rogge, Caylie Mash, Meagan Wilson, Aleesa Bryant, "Umm….Gender and the Use of Filler Words", online presentation 6/3/2014, reports on a study whose methodology is described as follows:

In their set of 15 male and 15 female subjects, they found that females used more fillers overall than males:

But the relevant thing here is the distribution of fillers by sex, and specifically of uh vs. um:


Eric Acton, "On Gender Differences in the Distribution of um and uh", University of Pennsylvania Working Papers in Linguistics, 2011, adds statistics from a speed-dating corpus (as well as from the Switchboard Corpus, which is one of the collections that I surveyed in the 2005 blog post):

While the so-called “fillers” um and uh share a great deal in the way of interpretation, association, and usage, they are far from perfect substitutes. Previous corpus research, focusing primarily on British English, has identified a number of social and discursive factors with which filler usage can vary, including pause length and position in an utterance and speaker age, gender, and social class (Rayson et al. 1997, Clark and Fox Tree 2002, Tottie 2011, inter alia). Building on such research, the present paper investigates social variation in the use of um and uh in the United States. In particular, the paper documents the results of two corpus-based investigations of women’s and men’s usage of um and uh demonstrating that, among the speakers represented in the corpora, women on the aggregate had a far higher ratio of um tokens to uh tokens (um/uh ratio) than did men. The first of the two corpora examined is a collection of 992 transcripts from three speed-dating events held for graduate students at an American university in 2005. In this corpus, women’s average um/uh ratio is more than 3.5 times that of men. An analysis of gendered filler usage in the Switchboard Corpus (SWBC) yields a similar result: women’s average um/uh ratio in the SWBC is more than 2.5 times that of men. Data from the SWBC likewise suggest that this general trend persists across age groups and major U.S. dialect regions and, furthermore, tends to hold for speakers regardless of the gender of their interlocutors. The SWBC also provides evidence suggesting that um is gaining currency relative to uh; i.e., that there is a linguistic change in progress whereby the use of um relative to uh is on the rise. It is noted that not all men and women in the corpora exhibit filler usage in line with the aggregate-level trends, and that gendered linguistic differentiation should not be assumed to be a direct reflection of gender per se (Eckert 1989). 
A thorough understanding of the dynamics of gender and filler usage calls for an examination of the meanings and associations of um and uh and of speakers’ stances, objectives, and relation to their social world.

I haven't found any studies that are inconsistent with the sex difference in uh vs. um usage, though some studies lump the two together, e.g. the classic paper by Heather Bortfeld et al., "Disfluency Rates in Conversation: Effects of Age, Relationship, Topic, Role, and Gender", Language and Speech 2001:

After reviewing situational and demographic factors that have been argued to affect speakers’ disfluency rates, we examined disfluency rates in a corpus of task-oriented conversations (Schober & Carstensen, 2001) with variables that might affect fluency rates. These factors included: speakers’ ages (young, middle-aged, and older), task roles (director vs. matcher in a referential communication task), difficulty of topic domain (abstract geometric figures vs. photographs of children), relationships between speakers (married vs. strangers), and gender (each pair consisted of a man and a woman). Older speakers produced only slightly higher disfluency rates than young and middle-aged speakers. Overall, disfluency rates were higher both when speakers acted as directors and when they discussed abstract figures, confirming that disfluencies are associated with an increase in planning difficulty. However, fillers (such as uh) were distributed somewhat differently than repeats or restarts, supporting the idea that fillers may be a resource for or a consequence of interpersonal coordination.

Uh and um are also lumped together in Gunnel Tottie, "On the use of uh and um in American English", Functions of Language, 2014, perhaps because of the small size of the dataset used:

This study examines the use of uh and um — referred to jointly as UHM — in 14 conversations totaling c. 62,350 words from the Santa Barbara Corpus of Spoken American English. UHM was much less frequent than in British English with 7.5 vs. 14.5 instances per million words in the British National Corpus. However, as in British English the frequency of UHM was closely correlated to extra-linguistic context. Conversations in non-private environments (such as offices and classrooms) had higher frequencies than those taking place in private spaces, mostly homes. Time required for planning, especially when difficult subjects were discussed, appeared to be an important explanatory factor. It is clear that UHM cannot be dismissed as mere hesitation or disfluency; it functions as a pragmatic marker on a par with well, you know, and I mean, sharing some of the functions of these in discourse. Although the role of sociolinguistic factors was less clear, the tendencies for older speakers and educated speakers to use UHM more frequently than younger and less educated ones paralleled British usage, but contrary to British usage, there were no gender differences.

And uh and um are likewise counted together in Charlyn Laserna, Yi-Tai Seih, James Pennebaker, "Um … Who Like Says You Know: Filler Word Use as a Function of Age, Gender and Personality", Journal of Language and Social Psychology June 2014:

Filler words (I mean, you know, like, uh, um) are commonly used in spoken conversation. The authors analyzed these five filler words from transcripts recorded by a device called the Electronically Activated Recorder (EAR), which sampled participants’ language use in daily conversations over several days. By examining filler words from 263 transcriptions of natural language from five separate studies, the current research sought to clarify the psychometric properties of filler words. An exploratory factor analysis extracted two factors from the five filler words: filled pauses (uh, um) and discourse markers (I mean, you know, like). Overall, filled pauses were used at comparable rates across genders and ages. Discourse markers, however, were more common among women, younger participants, and more conscientious people. These findings suggest that filler word use can be considered a potential social and personality marker.

However, Laserna et al. bring in the important idea that "filler word use can be considered a potential social and personality marker" (though Bortfeld et al. 2001 noted "the idea that fillers may be a resource for or a consequence of interpersonal coordination"). There's been some interesting media uptake for the Laserna et al. paper, e.g. Adam Gopnik, "The Conscientiousness of Kidspeak", The New Yorker 7/20/2014; Mark Shrayber, "Saying 'Uh,' 'Um,' And 'Like' Means You're Literally An Awesome Person", Jezebel 7/22/2014.

The pattern noted in my 2005 blog post might be taken to suggest a lexical change in progress, with um taking over from uh, and women leading the change as they often do. However, other aspects of the available data suggest to me that this is wrong, and that the true explanation lies elsewhere, in a stable difference in function between these two fillers.

That's all I have time for this morning — but in a later post, I'll take up the question of what that difference is, and how it's related to the observed differences in who uses which filler-word when.






  1. Cynthia McLemore said,

    July 30, 2014 @ 10:14 am

    Some female speakers in Texas — and elsewhere, I'm sure, but limiting it to my data — use a noticeably fronted vowel in "um" that carries social information. I doubt that would account for the overall frequency difference between men and women, but without specifying vowel shape and social characteristics you'd miss the whole point of "um" in certain datasets.

  2. S T said,

    July 30, 2014 @ 10:52 am

    I wonder whether any research on the use of "um" and "uh" by men and women takes into account that "uh" leaves the speaker's mouth open and "um" leaves it closed. I suspect there's an influence of how each mouth position looks and how socially acceptable it is to keep one's mouth open that pushes women to use "um" more than "uh."

  3. Eric P Smith said,

    July 30, 2014 @ 12:20 pm

    I have ASD. As a young child I was an inveterate "ummer" but it was successfully, and very easily, coached out of me without (so far as I am aware) any adverse consequences. In my case it was undoubtedly a disfluency rather than a planner.

  4. BZ said,

    July 31, 2014 @ 9:22 am

    Can you really reliably tell the difference between "uh" and "um" in phone conversation, or even live, without looking at the speaker's mouth?

  5. Kyle Gorman said,

    July 31, 2014 @ 11:43 am

    @Eric P Smith: that's very interesting…who was doing this coaching? I was told that speech/language pathologists sometimes coach children _to_ use "um" under some circumstances.

    @BZ: I am the first author of the paper. An RA conducted a retrospective interannotator agreement study to look at your concern (this is all in the MS we're submitting to an autism journal). After the primary transcription efforts were complete, we randomly selected 4 utterances per child, 2 of which had been coded as containing an "uh" or an "um", and 2 of which had been coded as containing no fillers. Two transcribers who had not been part of the initial transcription efforts listened to audio clips of these utterances and then provided a transcription using the same guidelines as the original transcribers. They agreed with the original transcriber about whether it was an "uh" or an "um" over 90% of the time. I think the harder problem is determining whether it's "uh" or the indefinite article "a", though they are said to have different prosodic characteristics (such as duration).
