Andy Schwartz recently gave me a copy of word counts by sex and age for the Facebook posts from the PPC's World Well-Being Project. So I thought I'd compare some of the Facebook counts to data from the LDC's archive of conversational speech transcripts. As a start, here's a comparison of rates of pronoun usage in the PPC Facebook sample and in the transcripts of the LDC's Fisher English datasets (combining Part 1 and Part 2).
Archive for Language and gender
From John Coleman:
Inspired by your recent Language Log pieces, I tried an analysis of "er" vs "erm" in the Spoken BNC. These are the two main transcriptions for filled pauses labelled as "UNC" in the Claws-5 tagset and also "UNC" in the richer set of pos labels used in BNC. I.e. they are distinguished from items labelled as ITJ / INTERJ, in which the few tokens of "uh" and "um" are classified. These "uh"s are almost all in "uh huh" meaning "yes", and many of the "um"s and "mm"s are also in contexts where the "yes" sense is clear. So I disregarded the ITJs and restricted the analysis to UNC "er" and "erm", which are far more numerous in any case. As these are mostly nonrhotic dialects one can interpret "erm" as just schwa + nasality, with no implication of rhoticity; ditto for "er".
In previous posts about filled pauses, we've seen a consistent and large sex difference: women use (what's transcribed as) "um" somewhat more than men do, and men use (what's transcribed as) "uh" a lot more than women do. This pattern has been found in two large conversational telephone speech corpora involving a mix of ages and American regions, in a collection of undergraduate speed-dating transcripts, in a collection of undergraduate "tell me about your weekend" interviews, and in a collection of several hundred sociolinguistic interviews collected over a period of four decades in Philadelphia.
There are apparently also effects of age, of region, of time period, of years of education, of autism diagnosis, and so on. Today I'll add one more geographical data point (young adults from the Glasgow area) and one more variable (friends vs. strangers).
In a ten-year-old LLOG post ("Gender and tags" 5/9/2004), I cited "the complexity of findings about language and gender, where published claims sometimes contradict one another, and where the various things that 'everybody knows' are not always confirmed by experiment", and warned that
This happens in every area of rational inquiry, but it's especially common in cases where generalizations are associated with strong feelings. In this case, we're talking about the nature of men and women as biological and social categories, and the way individual men and women interact in both private and public spheres. There aren't many topics that generate stronger feelings than this one.
Strong feelings tend to generate contradictory research for two obvious reasons. First, systematic observation sometimes fails to confirm evocative anecdotes, which may be evocative because they resonate with stereotypes rather than because they genuinely confirm experience. Second, even systematic observation can be misleading, if you don't make the right observational distinctions or don't control for the context in an appropriate way. When the emotional stakes are high, people should in principle be especially careful not to overinterpret or overgeneralize their findings, but in practice, the opposite is often true.
I've recently posted several times on sex differences in filled-pause usage: "Fillers: Autism, gender, and age" 7/30/2014; "More on UM and UH" 8/3/2014; "UM UH 3" 8/4/2014. This morning's post will try to put this issue into the context of other statistical tendencies in gendered word usage, and to point out the wide range of possible explanations for the differences.
[Warning: More than usually wonkish and quantitative.]
In two recent posts and one older one, I've referred to apparent gender and age differences in the usage of the English filled pauses normally transcribed as "um" and "uh" ("More on UM and UH", 8/3/2014; "Fillers: Autism, gender, and age", 7/30/2014; "Young men talk like old women", 11/6/2005). In the hope of answering some of the many open questions, I decided to make a closer comparison between the Switchboard dataset (collected in 1990-91) and the Fisher dataset (collected in 2003).
A few days ago ("Fillers: Autism, gender, and age" 7/30/2014), I noted an apparent similarity between male/female differences in UM/UH usage, and an autistic/typical difference reported in a poster by Gorman et al. at the IMFAR 2014 conference.
This morning I thought I'd take a closer look at the patterns in a large published conversational-speech dataset. Executive summary:
- There is a large sex difference in filled-pause usage, favoring males by about 38%
- There is an enormous sex difference in UM/UH ratio, favoring females by about 310%
- These sex differences are mainly driven by the difference in UH usage, which favors males by about 250%
- Older speakers use UH more and UM less, resulting in a large decrease of UM/UH ratios
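For readers who want to see how summary figures of this kind can be derived, here is a minimal Python sketch. It assumes that "favoring X by about N%" means X's value is N% higher than the other group's, which is one plausible reading and not something the summary spells out. The counts are round placeholder numbers chosen for easy arithmetic, not figures from any corpus, and the names `Counts` and `percent_higher` are invented for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Token counts for one group of speakers (hypothetical structure)."""
    um: int      # tokens transcribed as "um"
    uh: int      # tokens transcribed as "uh"
    words: int   # total word tokens spoken by the group

    @property
    def filled_pause_rate(self) -> float:
        # Overall filled-pause usage: (UM + UH) per word token
        return (self.um + self.uh) / self.words

    @property
    def uh_rate(self) -> float:
        # UH usage per word token
        return self.uh / self.words

    @property
    def um_uh_ratio(self) -> float:
        # Ratio of UM tokens to UH tokens
        return self.um / self.uh


def percent_higher(a: float, b: float) -> float:
    """How much larger a is than b, expressed as a percentage of b."""
    return (a / b - 1.0) * 100.0


# Placeholder counts, NOT real data: assumed only to make the arithmetic visible.
female = Counts(um=1000, uh=500, words=100_000)
male = Counts(um=1000, uh=1000, words=100_000)

fp_diff = percent_higher(male.filled_pause_rate, female.filled_pause_rate)
ratio_diff = percent_higher(female.um_uh_ratio, male.um_uh_ratio)
uh_diff = percent_higher(male.uh_rate, female.uh_rate)

print(f"Filled-pause rate: males higher by {fp_diff:.1f}%")
print(f"UM/UH ratio: females higher by {ratio_diff:.1f}%")
print(f"UH rate: males higher by {uh_diff:.1f}%")
```

Normalizing by total word count matters here, since the groups being compared rarely contribute the same amount of speech.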
The general pattern of gendered filled-pause usage in English has been at least partly replicated in several other datasets, including the spoken part of the British National Corpus, but the details are sometimes quite different. (See my earlier post, and planned future posts, for some discussion.) But all the important questions remain open, for example:
- Are the sex effects due to functional, iconic, or physiological differences between UM and UH, or are they arbitrary gender markers?
- Do the age effects reflect a change in progress, or a life-cycle effect (e.g. due to changes in sex hormone levels)?
- Are the patterns the same or different across geographical, socio-economic, and ethnic varieties of English?
- Are there analogous phenomena in other languages?
K. Gorman et al., "Children's Use of Disfluencies Distinguish ASD and Language Impairment", IMFAR 2014 (emphasis added):
This study compares the relative frequencies of "uh" and "um" in the spontaneous speech of children with ASD (with or without comorbid language impairment) to two control groups. Methods: Participants: 112 children ages 3;10–9;0 participated: ASD (50), Specific Language Impairment (SLI; 18), and Typical Development (TD; 44). All diagnoses were verified by best-estimate clinical consensus. The children with ASD were split into two groups: one group with comorbid language impairment (ALI) as diagnosed by a CELF Core Language Score below 85, and one group with ASD but no clinical language impairment (ALN). All children were high functioning monolingual English speakers. Data collection: a clinician administered the Autism Diagnostic Observation Schedule (ADOS; module 2 or 3) to each child. Sessions were recorded and transcribed.
Results: For all group pairs, diagnosis was uncorrelated with overall (i.e., "uh" + "um") rate of filled pause use. FP choice was analyzed for each comparison set using mixed effects logistic regression, with chronological age, FSIQ, ADOS "activity", and utterance position (utterance-initial vs. non-initial) as covariates. Diagnosis was a significant predictor for ALN/TD (p = .001) and ALI/SLI (p = .038); in both comparisons the ASD group used fewer instances of "um". Diagnosis was non-significant for TD/SLI (p = .888) and ALI/ALN (p = .814). ALI and ALN groups both used "uh" and "um" at an approximately 1:1 ratio, whereas TD and SLI groups used "um" 2 to 3 times more often than "uh". ADOS "activity" and utterance position were also significant predictors of FP choice; remaining covariates were non-significant.
Below is a guest post by Kieran Snyder, taken with permission from her always-interesting tumblr Jenga one week at a time.
About a month ago at work I overheard one woman complaining to another woman about a man’s habit of interrupting everyone in meetings. Then they went further. “That’s just how it is around here. The women listen, but the men interrupt in meetings all the time,” one of them summed it up.
As a moderate interrupter myself – I’m sorry if I’ve interrupted you, I just get excited about what you’re saying and I want to build on it and I lose track of the fact that it’s not my turn and I know it’s a bad habit – I started wondering if she was right. Do men interrupt more often than women?
In "The future of singular they" (3/8/2013), I noted that some people assign the traditional English pronouns he, she, they (and it?) in non-traditional ways, depending on the preferences of the person referred to rather than on the traditional criteria of number, animacy, and primary sexual organs. And the number of conceptual categories involved is potentially much larger than four, as discussed in "58 Facebook genders" (2/18/2014).
Ann Leckie's 2013 novel Ancillary Justice depicts a situation in which the traditional relationships of language and gender are modified in an interestingly different way.
Facebook Diversity 2/13/2014:
When you come to Facebook to connect with the people, causes, and organizations you care about, we want you to feel comfortable being your true, authentic self. An important part of this is the expression of gender, especially when it extends beyond the definitions of just “male” or “female.” So today, we’re proud to offer a new custom gender option to help you better express your own identity on Facebook.
We collaborated with our Network of Support, a group of leading LGBT advocacy organizations, to offer an extensive list of gender identities that many people use to describe themselves. Moreover, people who select a custom gender will now have the ability to choose the pronoun they’d like to be referred to publicly — male (he/his), female (she/her) or neutral (they/their).
We also have added the ability for people to control the audience with whom they want to share their custom gender. We recognize that some people face challenges sharing their true gender identity with others, and this setting gives people the ability to express themselves in an authentic way.
Some useful framing for the Ingalhalikar et al. paper I wrote about earlier today – Christian Jarrett, "Getting in a Tangle Over Men's and Women's Brain Wiring", Wired 12/4/2013:
[L]et’s set this new brain wiring study in the context of previous research. Verma and her team admit that a previous paper looking at the brain wiring of 439 participants failed to find significant differences between the sexes. What about studies on the corpus callosum – the thick bundle of fibres that connects the two brain hemispheres? If women really have more cross-talk across the brain, this is one place where you’d definitely expect them to have more connectivity. And yet a 2012 diffusion tensor paper found “a stronger inter-hemispheric connectivity between the frontal lobes in males than females”. Hmm. Another paper from 2006 found little difference in thickness of the callosum according to sex. Finally a meta-analysis from 2009: “The alleged sex-related corpus callosum size difference is a myth,” it says.
OK, one last thing. I don’t know if you saw it, but earlier this year another study involving hundreds of participants used a different technique (resting state fMRI) to examine connectivity in the brain, this time for the purpose of seeing if some people have more left-brain functional hubs and others have more right-brained hubs (they don’t). This obviously isn’t the same focus as the new PNAS paper, but if men and women’s brains really are wired up differently to optimise them for map reading or multitasking etc, you’d think there’d be some important sex differences in the way functional hubs are lateralised (distributed to one side of the brain or the other). In fact, “no differences in gender were observed,” the authors said.
In conclusion – Wow, those are some pretty wiring diagrams! Oh … shame about the way they interpreted them.
One of the first things a student learns when studying Mandarin is the third person pronoun, tā. This was originally written 他 (with "human" radical), and it stood for feminine, masculine, and neuter — "he", "she", and "it". During the early 20th century, however, some bright folks — undoubtedly in emulation of European languages — thought it would be a good idea to introduce gender into the Chinese writing system, so 她 (with "female" radical) came to be used for the feminine and 它 (with "roof" radical) for the neuter. I always thought that rather odd, because no attempt was made to differentiate the three forms in speech, only in writing, hence 他, 她, and 它 were still all pronounced tā.
Well, it's not quite right to say that no attempt was made to differentiate the three forms in pronunciation, since there was a half-hearted effort to introduce yī for feminine and tuō for neuter, but it didn't catch on.
Beyond 他, 她, and 它, there are also 牠 (with "bovine" radical) for animals and 祂 (with "spirit" radical) for deities, etc. All of these were — and still are — pronounced tā.