Sex, age, and pronouns on Facebook


Andy Schwartz and others at the World Well-Being Project have worked with "Facebook posts from over 75,000 volunteers who also took the standard Interpersonal Personality Item Pool (IPIP) personality test to measure the 'Big Five' personality traits", looking for linguistic features that correlate with the aspects of personality that the test measures.

Lyle Ungar talked about this work a few days ago (Andy was unfortunately out of town), for an audience of mostly first-year undergraduates. The venue was a weekly event, Dinners With Interesting People, held in the Quad, an undergraduate residence here at Penn.

This year, the DWIP talks (though still open to the public) are integrated into a Freshman Seminar called "The Landscape of Research and Innovation at Penn". The idea is to give the participants a general idea of what kinds of research go on around here, and how they might get involved. I've asked each DWIP guest to provide a dataset that we can use for a course assignment in quantitative analysis. Since the students have widely varied backgrounds in mathematics, statistics, and programming, and since quantitative analysis is only one of several aspects of the course, each assignment starts with an R script that does something interesting, and the assigned task is to modify the script to do something a bit different.

In this case, Andy was kind enough to give me a table giving the number of posts and the token count for each "word" in their Facebook dataset, for males and females of each age. Inspired by Jamie Pennebaker's The Secret Life of Pronouns, I decided to focus the quantitative analysis assignment on pronoun usage. The body of this post lays out some of the things that I've noticed in setting the assignment up.

I first looked at overall word counts by sex and age:

The bump at age=42 is due to a default age assignment of 40 at the start of the study, so all age=42 data is thrown out in what follows.
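For readers who want to try this at home, here is a minimal sketch of that first step in Python (the course scripts themselves are in R and are not reproduced here). The row layout `(word, sex, age, token_count)`, the toy numbers, and the function name are all assumptions made for illustration; the actual table's format is not shown in this post.

```python
from collections import defaultdict

# Hypothetical rows in the form (word, sex, age, token_count).
# The numbers are made up; they are not from the Facebook dataset.
rows = [
    ("i", "F", 20, 500), ("the", "F", 20, 900),
    ("i", "M", 20, 400), ("the", "M", 20, 1000),
    ("i", "F", 42, 9000), ("the", "F", 42, 20000),
]

# Age whose counts are inflated by the default age assignment.
ARTIFACT_AGE = 42


def total_tokens_by_group(rows, drop_ages=(ARTIFACT_AGE,)):
    """Sum token counts per (sex, age), skipping any ages in drop_ages."""
    totals = defaultdict(int)
    for word, sex, age, count in rows:
        if age in drop_ages:
            continue  # discard the default-age artifact
        totals[(sex, age)] += count
    return dict(totals)


totals = total_tokens_by_group(rows)
```

With the toy rows above, `totals` holds one entry per remaining (sex, age) pair, and nothing at all for age 42.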

For the remaining ages, here are the frequencies by sex and age for (the sum of) "I", "me", "my", "mine", where frequencies for female writers are the red Fs, while male writers' frequencies are the blue Ms:

The overall frequency of first-person singular pronoun usage decreases with age, and at every age, female writers use FPS pronouns more than male writers. The blip at age 14 is due to small sample size — probably the age=14 data should be removed as well, though I haven't done that in today's plots. And the increasingly noisy data at more advanced ages is clearly also due to small sample sizes — it would no doubt be helpful to do some smoothing, which again I haven't done for today's exercise.
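As a rough illustration of what the frequency calculation and the not-yet-done smoothing might look like, here is a sketch in Python. It is not the script used for these plots. The dictionary layouts, the toy counts, and the function names are assumptions, and the token-weighted moving average is just one simple smoothing choice among many (it amounts to pooling the counts of neighboring ages, so thinly sampled ages borrow strength from their neighbors).

```python
# First-person singular pronoun group, lowercased.
FPS = {"i", "me", "my", "mine"}


def group_frequency(counts, totals, words):
    """Relative frequency of a word group per (sex, age).

    counts: {(word, sex, age): token_count}   (assumed layout)
    totals: {(sex, age): total_token_count}   (assumed layout)
    """
    hits = {}
    for (word, sex, age), n in counts.items():
        if word in words:
            hits[(sex, age)] = hits.get((sex, age), 0) + n
    return {key: hits.get(key, 0) / total
            for key, total in totals.items() if total > 0}


def smooth(freq, totals, sex, window=2):
    """Token-weighted moving average over ages within +/- window years."""
    ages = sorted(age for (s, age) in freq if s == sex)
    smoothed = {}
    for a in ages:
        num = den = 0.0
        for b in ages:
            if abs(a - b) <= window:
                weight = totals[(sex, b)]  # more tokens, more weight
                num += weight * freq[(sex, b)]
                den += weight
        smoothed[a] = num / den
    return smoothed


# Toy counts, made up for illustration only.
counts = {
    ("i", "F", 20): 50, ("my", "F", 20): 30, ("the", "F", 20): 920,
    ("i", "F", 21): 10, ("the", "F", 21): 90,
}
totals = {("F", 20): 1000, ("F", 21): 100}

freq = group_frequency(counts, totals, FPS)
smoothed_f = smooth(freq, totals, "F", window=1)
```

In the toy data, age 21 has a tenth as many tokens as age 20, so its raw frequency moves the smoothed value only slightly. That is the behavior one would want for the sparse ages at either end of the range.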

Here are the frequencies for "you", "your", "yours":

Here it seems that female and male usage declines, in sync and at comparable values, to the age of 30, and then rises, with females increasingly outstripping males.

Here are the frequencies for "we", "us", "our", "ours":


Here female and male Facebookers are just about the same, reaching a wee we-peak in the late teens, and then rising steadily for the rest of the life cycle.

Here are the frequencies for "she", "her", "hers":

Female Facebookers refer to other females at rates that increase roughly to the age of 40. At all ages, male Facebookers refer to females at much lower rates.

Here are the frequencies for "he", "him", "his":

Ignoring the blip for the age 14 data, references to males by both male and female posters increase in frequency to about the age of 40, with males slightly ahead of females through age 30 or so.

Here are the frequencies for "it", "its":

Not much going on with "it", except for a slight male advantage in the late teens and early twenties, and the usual noise among the elderly due to small sample sizes.

Here are the frequencies for "they", "them", "their", "theirs":

We see steady growth in third-person plural frequency with increasing age, especially steeply during the period from about age 25 to age 45.

Reprising some of the same numbers in different combinations, we see that male writers show increasing frequency of references to other males with increasing age, at least to age 40 or so, but more-or-less steady low rates of pronominal reference to individual females at all ages:

The comparable plot for female writers shows a very different pattern — the rate of pronominal reference to both males and females also increases through age 40 or so, but at all ages, the rates for pronominal references to males and females are fairly close. Before age 30, female references are slightly more frequent, and after 40, male references are a bit commoner. But overall, the referential egalitarianism is in striking contrast to the pattern for male writers:

One note: It would be a boon to pronominology if the folks at Facebook would release similar data based on a larger sample of their ~1.3 billion users…



  1. Ben Zimmer said,

    September 19, 2014 @ 9:00 am

    One minor point: some of the third-person pronoun usage may actually be first-person reference in disguise. As Gretchen McCulloch explained on her All Things Linguistic blog, it used to be conventional to refer to oneself in the third person in FB status updates. That illeist convention has largely fallen by the wayside, but it could be hanging on among some (older?) users.

  2. Brett said,

    September 19, 2014 @ 11:48 am

    @Ben Zimmer: As I recall, Facebook originally forced you to start status updates with your name. That forced you to either use the third person or to write something ungrammatical. Then they changed it to default to starting with your name, but you could delete it when you wanted. In those days, before the news feed, they were really supposed to be status updates, telling everyone what you were doing at that time. The name "status update" is an artifact now, because of the way the posts cycle down the news feed in the current format.

  3. Pflaumbaum said,

    September 19, 2014 @ 8:57 pm

    Is it just greedy me who's now craving results for pronoun case in co-ordination?

    But I guess it would be pretty hard to automate in a way that told you whether it's in subject or object position… could any programme do so with my first sentence, for instance?

  4. KevinR said,

    September 20, 2014 @ 5:26 pm

    Any chance of sorting out singular they? I sometimes feel like there is a gender-correlated difference in the use of she/they as a substitute for a 'he' of indeterminate gender in spoken/emailed language.

  5. KevinR said,

    September 20, 2014 @ 6:34 pm

    (More particularly, it seems like I typically hear women saying "he", "she", sometimes "they", or something clumsy for indeterminate gender, while I hear men saying "he", "they", or something clumsy (he/she), but almost never "she".)

  6. Yuval said,

    September 25, 2014 @ 4:34 am

    Will adding the reflexive pronouns have any effect at all on the trends?

    [(myl) I doubt it, because the frequencies of the reflexive pronouns are all pretty low compared to their non-reflexive counterparts. Thus "I" occurs 7.5 million times, while "myself" occurs 0.1 million times, about 59 times less often than "I"; etc.]

  7. I certainly hope so! | Here is Good said,

    October 8, 2014 @ 4:35 pm

    […] Mark Liberman writes at Language Log, researchers at the World Well-Being Project at the University of Pennsylvania analyzed the […]
