Language Log

Sexual accommodation

December 30, 2011 @ 8:42 am · Filed by Mark Liberman under Computational linguistics, Language and gender

You've probably noticed that how people talk depends on who they're talking with. And for 40 years or so, linguists and psychologists and sociologists have referred to this process as "speech accommodation" or "communication accommodation" — or, for short, just plain "accommodation". This morning's Breakfast Experiment™ explores a version of the speech accommodation effect as applied to groups rather than individuals — some ways that men and women talk differently in same-sex vs. mixed-sex conversations.

I got the idea of doing this a couple of days ago, as I was indexing some conversational transcripts in order to find material for an experiment on a completely different topic. The transcripts in question come from a large collection of telephone conversations known as the "Fisher English" corpus, collected at the LDC in 2003 and published in 2004 and 2005. These two publications together comprise 11,699 two-person conversations, involving a diverse collection of speakers. While the sample is not demographically balanced in a strict sense, there is a good representation of speakers from all over the United States, across a wide range of ages, educational levels, occupations, and so forth.

As the documentation explains,

Under the Fisher protocol, a very large number of participants each make a few calls of short duration speaking to other participants, whom they typically do not know, about assigned topics. This maximizes inter-speaker variation and vocabulary breadth although it also increases formality. […]

To encourage a broad range of vocabulary, Fisher participants are asked to speak on an assigned topic which is selected at random from a list.

You can see a sample transcript here. In that case, the randomly-assigned topic was

If an unknown benefactor offered each of you a million dollars – with the only stipulation being that you could never speak to your best friend again – would you take the million dollars?

You can see one of the lists of topics used here. The language of these conversations is typical of small-talk among strangers, as you would expect from the way the conversations were set up.

Anyhow, there were 3,143 male/male conversations, comprising 6,322,608 words; 4,736 female/female conversations, comprising 9,330,364; and 3,820 mixed-sex conversations, comprising 7,496,547 words, of which 3,840,493 came from the male participants, and 3,656,054 came from the female participants.

This is enough material to get decent estimates for the rate of use of many words, not only overall but also in variously-defined slices of the collection. In particular, it occurred to me, we can look at how often certain words are used by men and women in same-sex vs. mixed-sex collections.

	Same Sex		Mixed Sex
	M (count)	F (count)	M (count)	F (count)
*daughter*	380	3043	410	696
*son*	581	2014	442	872
*engineering*	166	39	70	42
*actor*	218	73	76	56

These raw counts are hard to compare, since the overall number of words varies from cell to cell, e.g. from 3.7 million in the case of women in mixed-sex conversations, to 9.3 million for the case of women in same-sex conversations. The obvious thing to do is to normalize the counts to proportional frequencies, dividing them by the total number of words for the cell in question. The resulting numbers will have a lot of zeros after the decimal point, e.g. (for female uses of son in mixed-sex conversations)
872/3656054 = 0.0002385085,
so it's convenient to express the frequency in terms of the expected incidence per million words, which in this case is
1000000*872/3656054 = 238.509,
or a frequency of 239 per million words if we round to a reasonable number of digits.
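The normalization can be sketched in a few lines of Python. This is only an illustration, not the script used for the post: the dictionary and function names are invented here, and the word totals are the per-cell figures quoted earlier.

```python
# Total words per cell, copied from the corpus figures quoted above.
# The names TOTAL_WORDS and per_million are illustrative only.
TOTAL_WORDS = {
    ("same", "M"): 6_322_608,
    ("same", "F"): 9_330_364,
    ("mixed", "M"): 3_840_493,
    ("mixed", "F"): 3_656_054,
}


def per_million(count, total_words):
    """Convert a raw count into an expected incidence per million words."""
    return 1_000_000 * count / total_words


# Female uses of "son" in mixed-sex conversations.
rate = per_million(872, TOTAL_WORDS[("mixed", "F")])
print(round(rate, 3))  # 238.509
print(round(rate))     # 239
```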

Here's the same table with the counts expressed as frequencies per million words:

	Same Sex		Mixed Sex
	M (per MW)	F (per MW)	M (per MW)	F (per MW)
*daughter*	60	326	107	190
*son*	92	302	115	239
*engineering*	26	4	18	11
*actor*	34	8	20	15

We can see a sort of accommodation going on in this table — the frequencies in the mixed-sex case are clearly more similar than they are in the same-sex case.

How can we quantify this similarity? One possibility is to compare ratios (of male and female frequencies) in the two cases. The greatest possible similarity would be a ratio of 1. But in this approach, a frequency difference of 1,000 vs. 100 gives a ratio of 10, whereas a frequency difference of 100 vs. 1,000 gives a ratio of 0.1. In the way we naturally think of numbers — and certainly the way our visual system naturally interprets graphs — 10 seems a lot farther from 1 than 0.1 does.

We can eliminate this asymmetry by taking the natural log of the ratios. Now perfect similarity is 0, and increasingly dissimilar ratios are symmetrically positive and negative. And again, for presentational convenience, we can scale the results into a convenient range of integers, in this case multiplying by 100 and rounding to the nearest integer.
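Here is a minimal sketch of that measure, assuming the log is the natural logarithm (which is what reproduces the figures in the tables). The function name is made up for illustration, and the counts and word totals are the ones quoted earlier for daughter.

```python
import math


def scaled_log_ratio(male_rate, female_rate):
    """Return 100 * ln(male_rate / female_rate), rounded to the nearest integer."""
    return round(100 * math.log(male_rate / female_rate))


# Raw ratios of 10 and 0.1 become symmetric around zero on the log scale.
print(scaled_log_ratio(1000, 100))  # 230
print(scaled_log_ratio(100, 1000))  # -230
print(scaled_log_ratio(500, 500))   # 0

# "daughter": counts divided by total words for each cell.
daughter_same = scaled_log_ratio(380 / 6_322_608, 3043 / 9_330_364)
daughter_mixed = scaled_log_ratio(410 / 3_840_493, 696 / 3_656_054)
print(daughter_same, daughter_mixed)  # -169 -58
```

Working from the unrounded rates rather than the rounded per-million figures matters for rare words, where rounding can shift the log ratio noticeably.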

Adding a column for these scaled log ratios to the table above, we get:

	Same Sex			Mixed Sex
	Male	Female	100*log(M/F)	Male	Female	100*log(M/F)
*daughter*	60	326	-169	107	190	-58
*son*	92	302	-119	115	239	-73
*engineering*	26	4	184	18	11	46
*actor*	34	8	148	20	15	26

In each row, we can see that the log ratio in the mixed-sex conversations is closer to zero (i.e. the usage frequencies are more similar) than in the same-sex conversations.

Although these counts are plenty high enough to give us statistically-significant results, the data for such words come from a small to moderate proportion of conversations. Thus it's plausible that men use the word engineering more than women do, but this word was used in only 210 of 11,699 conversations. One way to make a more convincing case for the reality of this group accommodation effect would be to look at the patterns for a lot of words; another would be to look at sex differences in very frequent words.

And as James Pennebaker and others have repeatedly shown over the years, such differences do exist. (For a review, see Matthew Newman et al., "Gender Differences in Language Use: An Analysis of 14,000 Text Samples", Discourse Processes 2008.) In particular, there are modest but significant differences in the relative frequency with which men and women use some pronouns, including some very common ones.

So here's a table, showing pronoun frequency (per million words) for men and women in same-sex vs. mixed-sex conversations from the Fisher English collection, along with the scaled log ratios. I've included less common as well as more common pronouns.

	Same Sex			Mixed Sex
	Male	Female	100*log(M/F)	Male	Female	100*log(M/F)
I	42035	43758	-4	42549	43176	-1
me	2441	2929	-18	2544	2779	-9
my	4435	6763	-42	4899	5839	-18
mine	134	245	-60	150	165	-46
*you*	34967	30579	13	32958	32379	2
*your*	2082	2118	-2	2164	2226	-3
yours	29	42	-36	35	33	8
he	3363	4521	-30	3230	3984	-21
him	665	982	-39	650	847	-27
his	581	723	-22	571	659	-14
she	1589	2992	-63	1889	2242	-17
her	593	1223	-72	687	908	-28
hers	4	10	-98	4	8	-58
it	25158	24712	2	24569	24336	1
its	78	63	20	67	66	1
we	5209	7177	-32	5615	6343	-12
us	563	775	-32	621	718	-14
our	767	1088	-35	819	962	-16
ours	20	39	-67	22	30	-29
they	11860	12933	-9	12535	12741	-2
them	1834	2233	-20	1944	2150	-10
their	1292	1445	-11	1377	1460	-6
theirs	7	11	-40	8	9	-20
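One way to tally the rows of this table is sketched below. It simply compares the absolute values of the two reported log-ratio columns; the list and variable names are illustrative, and the numbers are copied from the table as printed (so they inherit its rounding).

```python
# (pronoun, same-sex log ratio, mixed-sex log ratio), copied from the table above.
ROWS = [
    ("I", -4, -1), ("me", -18, -9), ("my", -42, -18), ("mine", -60, -46),
    ("you", 13, 2), ("your", -2, -3), ("yours", -36, 8),
    ("he", -30, -21), ("him", -39, -27), ("his", -22, -14),
    ("she", -63, -17), ("her", -72, -28), ("hers", -98, -58),
    ("it", 2, 1), ("its", 20, 1),
    ("we", -32, -12), ("us", -32, -14), ("our", -35, -16), ("ours", -67, -29),
    ("they", -9, -2), ("them", -20, -10), ("their", -11, -6), ("theirs", -40, -20),
]

# A row counts as "more similar" when the mixed-sex value is closer to zero.
closer = [word for word, same, mixed in ROWS if abs(mixed) < abs(same)]
exceptions = [word for word, same, mixed in ROWS if abs(mixed) >= abs(same)]

print(len(closer), "of", len(ROWS))  # 22 of 23
print(exceptions)                    # ['your']
```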

In 22 out of 23 cases, the frequencies in the mixed-sex cases are more similar (log ratio closer to zero) than in the same-sex cases. Here's a graphical presentation of the same data:

Since these are relatively short conversations, most likely this effect represents speakers taking a different approach to the topic, rather than speakers adjusting their vocabulary based on their interlocutor's word usage. (We could explore this by looking at rates in the first and second halves of the conversations — but that's an experiment for some other breakfast hour…)

Another area where there are well-documented sex differences in word frequency is cussing. Some examples from the same data set:

	Same Sex			Mixed Sex
	Male	Female	100*log(M/F)	Male	Female	100*log(M/F)
gosh	42	267	-184	77	184	-87
goodness	22	186	-216	43	124	-104
shit	135	11	251	37	25	39
hell	106	24	149	58	47	22
damn	63	12	167	29	26	13
fucking	39	2	302	8	8	5
ass	32	4	205	11	11	2
jesus	21	7	105	26	20	43
pissed	21	5	149	15	11	36
fuck	21	1	309	7	5	32
christ	12	5	89	9	8	8
piss	6	2	121	3	4	-31

And again, the results in graphical form:

13 Comments

Andy Averill said,

December 30, 2011 @ 9:36 am

I'm not sure what conclusion to draw from the fact that the ratios for "we" and "us" are exactly reversed in same sex conversations. Are women more nominative and men more accusative?

[(myl) Actually, the conclusion you should draw is that I transcribed the results backwards for that row. Fixed now.]

Jennifer Pardo said,

December 30, 2011 @ 11:12 am

Thanks for the post on this topic, I find these data very interesting. Not sure if you're aware of a study of same vs. mixed sex accommodation by Bilous & Krauss (1988). Dominance and accommodation in the conversational behaviours of same- and mixed-gender dyads. Language & Communication, 8, 183-194. They measured a somewhat different constellation of attributes: total # words, freq interruptions, avg utterance length, freq short and long pauses, freq back-channels, and freq laughter. They found that males and females converged on some attributes at the same time that they diverged on other attributes when switching from same- to mixed-sex dyads. For example, female talkers converged toward male partners' usage in terms of total number of words and number of interruptions, but diverged from male partners' usage on freq of back-channels and laughter. It looks like the same sort of thing is going on here. Some words are showing convergence while others stay flat, and cussing shows greater convergence overall than most pronouns. Very cool.

[(myl) I don't have coding in this collection for back-channels and interruptions, but [laughter] also converged slightly in mixed-sex dyads. Frequency of laughter per MW in same-sex conversations was 6,616 for males and 11,075 for females (100*log(M/F) = -52); the same in mixed-sex conversations was 7,002 for males and 10,048 for females (100*log(M/F) = -36). Thus the males increased by 6% and the females decreased by 9%.

There are a bunch of differences in the two data sets: telephone vs. face-to-face; many ages, locations, and life circumstances vs. Columbia University undergraduates; free topical discussion vs. problem solving; etc.

What the two sets of results have in common is the demonstration that in drawing conclusions about the communications habits of Group X vs. Group Y, it's important to take account of who they're communicating with (as well as what they're communicating about and for, etc.)]

It is interesting to see that male and female talkers who start out with different frequencies of usage when conversing with same-sex partners end up converging on frequencies of usage when conversing with opposite-sex partners. However, there are lots of interesting ways to analyze these data. For example, it would be useful to know to what extent the talkers were similar to their partners in the same-sex pairings, and whether overall similarity in same-sex pairs differed from that of mixed-sex pairs.

It would also be interesting to know who changed more in their usage (males or females) from the same-sex to the mixed-sex pairings. I did a quick calculation on the data from your tables to compare change in word usage of males and females from same- to mixed-sex pairings. In order to make positive values reflect increased usage, the formula takes the ratio of males in mixed/males in same and females in mixed/females in same. For pronouns, males averaged 4 (9), and females averaged -11 (13). For cussing, males averaged -52 (72), and females averaged 71 (60). As the standard deviations show, these are highly variable across items in the word sets. This is a really interesting interaction. It shows that males and females possibly changed in a subtle manner for pronoun usage, but changed in very different manners for cussing usage, with females increasing usage and males decreasing usage in mixed company.
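A rough sketch of this calculation for the cussing words follows. The comment does not spell out the formula, so this assumes it is 100 times the natural log of the mixed-sex rate over the same-sex rate, averaged over words; the names are illustrative, and the rates are the rounded per-million figures from the table in the post.

```python
import math
from statistics import mean

# (word, same-sex male, same-sex female, mixed-sex male, mixed-sex female),
# in occurrences per million words, copied from the cussing table in the post.
CUSSING = [
    ("gosh", 42, 267, 77, 184), ("goodness", 22, 186, 43, 124),
    ("shit", 135, 11, 37, 25), ("hell", 106, 24, 58, 47),
    ("damn", 63, 12, 29, 26), ("fucking", 39, 2, 8, 8),
    ("ass", 32, 4, 11, 11), ("jesus", 21, 7, 26, 20),
    ("pissed", 21, 5, 15, 11), ("fuck", 21, 1, 7, 5),
    ("christ", 12, 5, 9, 8), ("piss", 6, 2, 3, 4),
]


def shift(same_rate, mixed_rate):
    """Assumed formula: positive values mean more use in mixed-sex conversations."""
    return 100 * math.log(mixed_rate / same_rate)


male_shifts = [shift(sm, mm) for _, sm, sf, mm, mf in CUSSING]
female_shifts = [shift(sf, mf) for _, sm, sf, mm, mf in CUSSING]

print(round(mean(male_shifts)), round(mean(female_shifts)))  # -52 71
```

Under that assumption the averages come out at the -52 and 71 quoted above for cussing.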

For cussing males, the only words that showed increases in frequency were gosh, goodness, and jesus; all others showed decreases–perhaps they were substituting these cleaner words for the f-bombs (these decreased the most–Does this mean chivalry is not dead?). For cussing females, most of the words increased (especially the f-bombs) and only gosh and goodness decreased.

Great food for thought today.

L'Esprit de l'Escalier said,

December 30, 2011 @ 2:14 pm

Thanks for publishing this in a tabular format that I was able to copy and paste directly into a blank spreadsheet.

I aggregated your data first by case and then by lexeme.

Men and women are about equal in their overall use of nominative and accusative, but women use more genitives.

Women talk more often about her, about him, and about us (The logarithms, using your formula on the same-sex totals, are -29, -14, -13.)

Women talk slightly more often about me and about them. (-4, -4)

Men talk a tiny bit more often about it and somewhat more often about you. (1, 5)

[(myl) If you'd like a larger slice of the data, I'd be happy to post the complete lexical histograms and some programs for manipulating them…]

Jeremy Kahn (@trochee) said,

December 30, 2011 @ 2:20 pm

Two folks from the SSLI lab at UW Seattle produced a paper from 2005 on a very similar subject, with some similar results:

P05-1054 [bib]: Constantinos Boulis; Mari Ostendorf
A Quantitative Analysis of Lexical Differences Between Genders in Telephone Conversations

(full disclosure: one of the authors was my advisor!)

[(myl) They use the same collection of transcripts as I did. Their goal is to use various features (including lexical bigrams as well as unigrams) to classify speakers as male or female. They observe that classification works much worse for mixed-sex conversations than for same-sex conversations, and (as a result of considering various train/test options) they conclude that "both genders equally alter their linguistic patterns to match the opposite gender". They note that as a result, "the gender of speaker B can be detected better than chance given only the transcript and gender of speaker A".

Since their classification algorithms depend on the aggregate effect of thousands of features, it would be consistent with their results for the behavior of individual lexical items to be much more inconsistent. I find it interesting that by and large, individual words also behave in the way predicted by the idea that accommodation occurs.

You might expect that on the contrary, there would be some linguistic polarization. And it wouldn't surprise me to find that this can happen under some circumstances.]

L'Esprit de l'Escalier said,

December 30, 2011 @ 2:32 pm

Correction to my previous post: I inadvertently used common logarithms instead of natural logarithms. The corrected numbers are larger. I also switched the order of him and us.

Her, us, him: -66, -33, -30.

Them, me: -10, -9.

It, you: 2, 12.

Catanea said,

December 30, 2011 @ 5:29 pm

Jeremy Kahn (@trochee) says "Two folks" – I thought one could have "some folks"…but "two folks"? Two people, I suppose? Two individuals? Is "two folks" common somewhere? For me, it would have to mean something quite strange, like two separate folk cultures, and I can't imagine quite how…

[(myl) Seems normal to me.]

Jeremy Kahn (@trochee) said,

December 30, 2011 @ 6:08 pm

MYL: The polarization result that I remember from Costas' internal presentation was that it turned out that swear words were a strong marker for male-male conversation — that is, their presence was a strong indicator that this conversation was male-to-male, a result visible in what you've noted above. There were a few parallel indicators (kinship terms?) that indicated the conversation was likely female-female. You can see the variation of these M/M cues or F/F cues as polarization or their absence (in mixed-sex conversation) as accommodation; I don't think either of these interpretations is in conflict.

(@Catanea: I've said "two folks" to mean "two people" for most of my life, but I grew up in an American dialect gumbo [Atlanta].)

[(myl) We're using "polarization" in different ways, I think. What I meant by polarization was the opposite of accommodation: there may be circumstances in which Group-X- or Group-Y-associated words become MORE rather than LESS group-associated in cross-group as opposed to within-group conversations. This might in fact happen where the groups are males vs. females, in some cultures or in some contexts.]

Dakota said,

December 30, 2011 @ 6:39 pm

I once picked up some linguistics text at someone's house and leafed through it – don't remember the name of the book – but it stated unequivocally that swearing was based on social class. It asserted that lower class women will swear, but not middle class women, and that lower class men, but not middle class men, will swear in front of women.

And "sex" not "gender" accommodation? Sounds like something that happens naked.

[(myl) To the extent that gender (socially constructed) is different from sex (biologically defined), we don't know anything about the gender of the speakers in the collection that I used.]

Nick Lamb said,

December 30, 2011 @ 10:12 pm

That last comment by Mark is intriguing. I'd have expected that we didn't know anything about the _sex_ of the speakers, since in this setting it seemed most likely that the information was collected by self-assessment (ie asking the volunteers to tick a box on a form) which has the effect of collecting gender information even if you write "Sex" on the form. Is that not so? How then was the information determined?

Janice Byer said,

December 30, 2011 @ 11:40 pm

"The most striking discovery is that women, not men, were the more prolific users of first person singular pronouns…" Newman et al, p. 232.

On p. 214, the authors chivalrously suggest explanations that serve to rescue us women from the presumed diss of their discovery, even as men are left with cause to cuss out their %!$& insinuation thoroughly debunked by Language Log.

Barbara Phillips Long said,

December 31, 2011 @ 2:34 am

I was amused to see "actor" in the grids. I read so many police reports that the term is ambiguous to me — actors like to run away from the cops, for instance.

I often imagine headlines such as "Actor sought in peeping tom incident," which sounds like great National Enquirer fodder. Much more fun than "Suspect sought…" or "Man sought…"

Just another Peter said,

January 2, 2012 @ 6:53 pm

The first lot of data doesn't surprise me at all. Taking "engineering" as an example, men raise the topic more frequently with other men due to an assumed shared interest. Even though it's raised less frequently with a mixed-sex conversation, once it's raised it will be discussed thereby bringing the proportions closer to parity. Similarly, once cussing is started both parties are likely to participate.

IANAL

Evan said,

January 3, 2012 @ 2:14 pm

At first I thought the log ratios were incorrect. For example, mixed-sex 'fuck' looks like it should be 0. Then I realized this was due to rounding. I think that when you are reporting word frequencies per million it would make more sense to keep a consistent number of sig figs instead of a consistent number of places after the decimal, because rounding makes a huge difference in these smaller cases (e.g. the log-ratio of mixed-sex 'theirs' is -20 before rounding but -11 after).
