Language Log

Male and female word usage

August 7, 2014 @ 7:35 am · Filed by Mark Liberman under Language and gender

In a ten-year-old LLOG post ("Gender and tags" 5/9/2004), I cited "the complexity of findings about language and gender, where published claims sometimes contradict one another, and where the various things that 'everybody knows' are not always confirmed by experiment", and warned that

This happens in every area of rational inquiry, but it's especially common in cases where generalizations are associated with strong feelings. In this case, we're talking about the nature of men and women as biological and social categories, and the way individual men and women interact in both private and public spheres. There aren't many topics that generate stronger feelings than this one.

Strong feelings tend to generate contradictory research for two obvious reasons. First, systematic observation sometimes fails to confirm evocative anecdotes, which may be evocative because they resonate with stereotypes rather than because they genuinely confirm experience. Second, even systematic observation can be misleading, if you don't make the right observational distinctions or don't control for the context in an appropriate way. When the emotional stakes are high, people should in principle be especially careful not to overinterpret or overgeneralize their findings, but in practice, the opposite is often true.

For some striking examples, see LLOG coverage of Leonard Sax or Louann Brizendine.

I've recently posted several times on sex differences in filled-pause usage: "Fillers: Autism, gender, and age" 7/30/2014; "More on UM and UH" 8/3/2014; "UM UH 3" 8/4/2014. This morning's post will try to put this issue into the context of other statistical tendencies in gendered word usage, and to point out the wide range of possible explanations for the differences.

[As in the UM~UH posts, my sources are the Switchboard dataset (collected in 1990-91) and the Fisher dataset (collected in 2003). I've relied on cases where the sex assigned to a speaker by auditors listening to a recorded conversation agreed with the sex given by that nominal speaker when he or she signed up for the study. Most of the small number of disagreements between these indicators are examples where someone other than the designated subject answered the phone and participated in the call. This yields 520 speakers for Switchboard and 10,401 speakers for Fisher.]

And in order to quantify the gender association of individual words, I've used the "weighted log-odds-ratio, informative dirichlet prior" algorithm [Monroe et al., "Fightin Words", Political Analysis 2008] previously discussed in "Obama's favored (and disfavored) SOTU words" 1/29/2014.
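For readers who want to experiment, here is a minimal Python sketch of the weighted log-odds statistic as described in Monroe et al. It is not the code used for this post. The function name, the choice of pooled counts as the default prior, the optional `alpha0` rescaling, and the small floor for words missing from the prior are all assumptions made for illustration, and the word counts are toy data. Positive scores mean a word is associated with the first group, negative scores with the second, matching the sign convention used in this post (male positive, female negative).

```python
import math
from collections import Counter


def weighted_log_odds(counts_a, counts_b, prior=None, alpha0=None):
    """Z-scored log-odds-ratio with an informative Dirichlet prior.

    counts_a, counts_b: word -> count for the two groups.
    prior: word -> background count (assumption: defaults to pooled counts).
    alpha0: optional total prior mass; if given, the prior is rescaled to it.
    """
    if prior is None:
        prior = Counter(counts_a) + Counter(counts_b)
    vocab = set(counts_a) | set(counts_b)
    scale = 1.0 if alpha0 is None else alpha0 / sum(prior.values())
    # Small floor (an assumption) so words absent from the prior stay defined.
    alpha = {w: max(prior.get(w, 0) * scale, 0.01) for w in vocab}
    a0 = sum(alpha.values())
    n_a = sum(counts_a.values())
    n_b = sum(counts_b.values())

    scores = {}
    for w in vocab:
        ya = counts_a.get(w, 0) + alpha[w]  # smoothed count in group a
        yb = counts_b.get(w, 0) + alpha[w]  # smoothed count in group b
        delta = (math.log(ya / (n_a + a0 - ya))
                 - math.log(yb / (n_b + a0 - yb)))
        variance = 1.0 / ya + 1.0 / yb      # approximate variance of delta
        scores[w] = delta / math.sqrt(variance)
    return scores


# Toy counts, invented purely for illustration.
male = Counter({"uh": 30, "the": 50, "husband": 1, "wife": 9})
female = Counter({"uh": 10, "the": 50, "husband": 12, "wife": 2})
scores = weighted_log_odds(male, female)
for word in sorted(scores, key=scores.get, reverse=True):
    print(f"{word}\t{scores[word]:.2f}")
```

Dividing the log-odds difference by its estimated standard deviation is what keeps rare words with extreme ratios from dominating the list. A larger prior mass shrinks all scores toward zero.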

Without further ado, here are the 25 most female-associated textual tokens in the Fisher corpus, according to that metric (note of course that a different metric would give a somewhat different list). The four columns are the "word", the frequency per million words for males, the frequency per million words for females, and the weighted log-odds:

Word	Male/MW	Female/MW	LogOdds
[laughter]	6762	10786	-72.0
mhm	3083	4981	-49.8
husband	39	602	-48.9
and	25475	29646	-43.4
my	4611	6503	-42.6
oh	6083	8150	-41.3
she	1702	2781	-37.9
we	5362	6943	-33.9
um	8133	9885	-31.2
have	6899	8457	-30.1
kids	619	1146	-29.3
he	3313	4370	-28.6
her	628	1134	-28.2
children	198	520	-27.8
because	2505	3360	-26.5
so	9149	10670	-25.9
yes	1086	1651	-25.4
daughter	78	288	-25.0
gosh	55	243	-24.6
goodness	30	169	-22.1
son	101	284	-21.5
home	472	784	-20.7
too	1747	2278	-19.9
wow	623	939	-18.9
uh-huh	1487	1946	-18.6

It's pretty clear why the women in this collection would use husband, kids, children, daughter, son, home more than the men do. It's also pretty clear why they use gosh 4.4 times as often, and goodness 5.6 times as often — but the obvious explanation is a different one.

It's less clear why women should laugh 60% more often than men do — are women on average happier, or more overtly sociable? Or do men feel constrained not to express positive emotions?

Does the greater female propensity to use mhm, yes, and uh-huh reflect an overall larger proportion of so-called "back-channel" responses? Or just a different choice of back-channel words, compared to (say) the more male-associated yeah, no shit, etc.?

And why in the world should women use and 16% more frequently than men?

Here are the 25 most male-associated lexical tokens in the Fisher corpus, again according to the specified metric:

Word	Male/MW	Female/MW	LogOdds
uh	11863	4651	137.6
ah	5364	2622	74.8
yeah	20468	17688	34.8
mean	5125	3882	31.7
you	34208	31086	30.2
wife	284	62	29.2
[noise]	8574	7146	27.6
man	482	197	26.6
hey	355	130	24.9
pretty	1381	904	24.1
the	25869	23767	23.2
a	20040	18253	22.3
of	13976	12624	20.1
))	12986	11701	19.8
((	12986	11701	19.8
shit	98	15	19.0
sort	629	380	19.0
cool	604	374	17.8
i-	458	265	17.4
like	14381	13194	17.3
what	6806	5999	17.2
guy	430	247	17.0
there	5753	5034	16.7
th-	373	210	16.4
bucks	165	63	16.4

Again, there's an obvious reason for men to use words like wife more than women do. But it's less clear what the explanation for you and the should be.

The (( and )) tokens are the start and end of regions that transcribers were unsure of or found completely unintelligible; thus male speakers in this collection were unintelligible 11% more often than female speakers. The hyphens at the end of i- and th- represent "false starts", words that were cut off and replaced by a self-correction. Overall, male speakers in this collection had about 43% more false starts than female speakers did (7815 per MW vs. 5478 per MW). This suggests somewhat greater disfluency overall, which would be consistent with uh being a symptom of disfluency in a way that um isn't.

But the same question comes up here as with the difference in laughter: Does the apparent sex difference in markers of disfluency really reflect a difference in underlying capabilities, or does it indicate a difference in self-presentation?

We can pose an analogous question again with respect to the well-known gender difference in taboo word usage, strikingly evident in the word clouds for female vs. male Facebook posts in H. Andrew Schwartz et al., "Personality, Gender, and Age in the Language of Social Media: The Open-Vocabulary Approach", PLoS ONE 2013.

Presumably this reflects a gender difference in how people are socialized to express the emotions and attitudes underlying taboo-word usage, not an intrinsic difference (whether genetic or learned) in how men and women feel and react. Many people will be more open to intrinsic-difference explanations for gendered variation in frequency of laughter or of disfluency markers. But I believe that we should keep an open mind in all of these cases.

Some relevant earlier LLOG posts:

"What men and women blog about" 7/8/2007
"Cute" 10/24/2009

And for those who want more, here are the full vocabulary lists for Switchboard and Fisher, sorted by male-vs.-female weighted log-odds.

11 Comments

Rubrick said,

August 7, 2014 @ 4:31 pm

It's also pretty clear why they use gosh 4.4 times as often, and goodness 5.6 times as often — but the obvious explanation is a different one.

You lost me a bit here. I can't think of any truly obvious explanation for "gosh" and "goodness" beyond "those are girly words", which would seem to beg the question.

[(myl) Sorry to be obscure. It's well established, both by empirical studies and by common sense, that men in general "cuss" or "use profanity" much more frequently than women do, and that women are more likely than men to substitute euphemisms instead.]
Michael Watts said,

August 7, 2014 @ 4:35 pm

Presumably this reflects a gender difference in how people are socialized to express the emotions and attitudes underlying taboo-word usage, not an intrinsic difference (whether genetic or learned) in how men and women feel and react.

Imagine the following experiment (and results):

—
A cohort of infants are procured from somewhere and raised in a Truman-show type environment, successfully socialized such that the girls take on "typical male" behavior like swearing, and the boys take on "typical female" behavior, etc. This is said to demonstrate that sex-behavior interactions are the product of socialization.

The cohort is kept in a closed environment, and three generations later, with no interference from "outside", the females show "typical outside female" behavior and the males show "typical outside male" behavior.
—

I believe that similar effects have been documented in birds, where a generation of birds whose song is defective, because they didn't learn to sing properly growing up, nevertheless produce offspring with normal song. The imaginary experiment would seem to be a success in showing that sex-behavior interactions reflect socialization, even to the point that socialization can reverse them, but does it also show that they aren't intrinsic differences?
Michael Watts said,

August 7, 2014 @ 5:27 pm

@Rubrick: the obvious explanation for women using "gosh" and "goodness" at an elevated rate over men is that those are euphemized swear words, and girls don't like swearing.
J. Goard said,

August 7, 2014 @ 6:22 pm

It's less clear why women should laugh 60% more often than men do — are women on average happier, or more overtly sociable? Or do men feel constrained not to express positive emotions?

Maybe higher-pitched laughter is just a lot more noticeable to the transcriber?

[(myl) Perception differences of this kind are certainly among the explanations that need to be considered. But I don't think it can just be a matter of pitch — male use of sounds categorized as [noise] is about 20% greater than female use.]
D.O. said,

August 7, 2014 @ 6:45 pm

Is there any way to look at the within-group variances? That should provide some sort of scale for the effect size.

[(myl) You can look at things like the "violin plots" given here. But it's a problem that the counts for individual words are not very stable in per-person samples of approximately 1,000 words per conversation, though presumably there would be a lot of individual variation in larger samples as well, and also individual variation in situational variation, if you see what I mean.]
D.O. said,

August 7, 2014 @ 10:47 pm

Yes, that's what I meant. And for many words from both high-odds-ratio lists there should be a few instances in the average 1000-word scoop, but OK, I don't think anymore that it will clear anything up. Violin plots help.

Laughter, uh, and ah are real outliers. Without them the distribution is a bell shape with fat tails. But why are function words on the lists? Do men and women prefer to use different types of determiners?
Kyle Gorman said,

August 8, 2014 @ 2:56 pm

Hi Mark, just read the Monroe et al. paper. Really interesting stuff; I'd like to try to replicate these results. What corpus did you use for the "informative prior"? And, what value of \alpha_0 (the shrinkage hyperparameter) did you use?
Elonkareon said,

August 8, 2014 @ 3:21 pm

I wonder… the personal third person pronouns (i.e. excluding it) are more commonly used by females than males, while determiners (used to preface _things_ rather than _persons_, or for that matter even _ideas_) are more commonly used by men. I wonder what the relative frequency of proper names (considered generally, as [name] rather than Scarlet or Phil) is between the sexes?
Elonkareon said,

August 8, 2014 @ 5:47 pm

After a brief glance at the Fisher list looking for proper names, both men and women use them (hard to say how often since each name is mentioned separately), but women tend to speak about other women, and men about other men.
Tyler Schnoebelen said,

August 8, 2014 @ 6:00 pm

Hey Mark, did you happen to take a look at how things change if you work on particular topics and if you take into consideration the gender of the person being spoken to?

The last time I asked gender-related questions of the Fisher dataset, I found that by-and-large, men and women were doing similar things in most topics. What changes things most significantly is something like what's considered face-threatening and that's a bit of an intersection between interlocutor-gender and topic.

Another way of asking these questions is to see how different gender styles emerge by not assuming "males" and "females" as two groups. Along those lines, here are some links of work with David Bamman and Jacob Eisenstein on Twitter, words, and social networks: http://onlinelibrary.wiley.com/doi/10.1111/josl.12080/abstract. Or the presentation version: http://www.slideshare.net/TylerSchnoebelen/gender-and-language-linguistics-social-network-theory-twitter.
D.O. said,

August 8, 2014 @ 6:42 pm

I decided to look at this purported "determiner conservation" law. Namely, whether there is really a trade-off between articles and possessive personal pronouns. Here are the results
word	count M	count F	freq. M	freq. F	freq. all	z-score
the	262905	308649	25.9	23.8	24.7	32.3
a	203665	237041	20.0	18.3	19.0	31.2
an	14166	16775	1.4	1.3	1.3	6.7
my	46859	84455	4.6	6.5	5.7	-60.2
your	21471	27906	2.1	2.1	2.1	-1.9
his	5865	9154	0.6	0.7	0.6	-12.0
her	6387	14729	0.6	1.1	0.9	-40.0
our	7993	13671	0.8	1.1	0.9	-20.8
their	13454	18822	1.3	1.4	1.4	-8.0
its	748	833	0.1	0.1	0.1	2.7
total	583513	732035	57.4	56.4	56.8	10.8

Sorry for stupid formatting. Frequencies are per 1000 words. I prefer z-scores to log-odds, but there is almost no qualitative difference.

So, yes. There seems to be some sort of compensation between articles and my/her/our. It's not clear how to treat indefinite articles. I don't know whether they can be traded off for possessives at all. The best way to check the hypothesis is to go beyond the M/F split and treat everybody equally. It is even harder to check that more possessives mean a more personal style or more personal interests among female speakers. Somebody would need to go out and ascertain the topics of conversations.
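The comment does not say how the z-scores in the table above were computed. One common choice, sketched here in Python as an assumption rather than as D.O.'s actual calculation, is the pooled two-proportion z statistic. The function name and the counts in the usage lines are hypothetical; the per-group total word counts that a real calculation would need are not given in the comment.

```python
import math


def two_proportion_z(count_a, total_a, count_b, total_b):
    """Pooled two-proportion z statistic for a word's rate in two groups.

    count_*: occurrences of the word in each group.
    total_*: total words spoken by each group.
    Positive values mean the word is relatively more frequent in group a.
    """
    p_a = count_a / total_a
    p_b = count_b / total_b
    pooled = (count_a + count_b) / (total_a + total_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    return (p_a - p_b) / std_err


# Hypothetical counts: 260 vs. 240 occurrences in 10,000 words each.
z = two_proportion_z(260, 10_000, 240, 10_000)
print(round(z, 2))
```

With corpora of millions of words, even small differences in rate produce very large z-scores, which is one reason the effect sizes themselves are worth reporting alongside them.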
