Swearing and social networks


Swearing is risky behavior. Many of its implications are out of the speaker's control. Thus, it is advisable to know your audience well before, say, dropping the F-bomb. I think this is basically true in any setting, and I expect it to be even more powerfully felt in situations where swearing is highly transgressive.

The Enron email dataset provides a nice chance to test out these claims. It is large (about 250,000 distinct messages, sent and received by over 11,000 distinct email addresses), and it contains a moderate amount of bad language. Not everyone swears, but a fair number of people do. The topics range widely: fantasy football, faith, energy markets, vacation time (and of course bankruptcy and the FERC). So, with some qualifications that I'll get to, it is a useful testing ground for claims about swearing and risky verbal behavior. The following email network graph is my first stab at conducting such a test:

[Figure: Swearing in an Enron email network]

The nodes represent 99 people with Enron email addresses who had relatively high email traffic in the dataset: at least 50 messages sent to other people in this group of 99, and at least 50 messages received from other people in this group of 99. (Messages that included any outsiders in their "To" lists were excluded.)
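Membership in the group and the traffic counts depend on each other: whether someone has 50 messages "to other people in this group" depends on who else is in the group. The post does not say how this was resolved, so the sketch below assumes one plausible approach, iteratively pruning users until everyone left meets both thresholds. The `select_core` helper, the `(sender, recipients)` message representation, and the counting convention (one "sent" per message, one "received" per recipient) are all hypothetical.

```python
from collections import Counter

def select_core(messages, min_sent=50, min_received=50):
    """Prune users until every remaining user meets both traffic thresholds.

    messages: list of (sender, recipients) tuples, recipients being a list of
    addresses (hypothetical representation). A message counts only when the
    sender and every recipient are still in the group, echoing the rule that
    messages with outsiders in the "To" list are excluded.
    """
    group = {s for s, _ in messages} | {r for _, rs in messages for r in rs}
    while True:
        sent, received = Counter(), Counter()
        for sender, recipients in messages:
            if sender in group and recipients and all(r in group for r in recipients):
                sent[sender] += 1          # one per message sent (assumption)
                for r in recipients:
                    received[r] += 1       # one per recipient (assumption)
        keep = {u for u in group
                if sent[u] >= min_sent and received[u] >= min_received}
        if keep == group:                  # stable: nobody else drops out
            return group
        group = keep                       # shrink and recount
```

Because the group can only shrink, the loop always terminates, possibly with an empty group if the thresholds are too strict.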

  • A red arrow from node A to node B means that user A swore in a message to user B at least once.
  • The thickness of the arrow's line represents the amount of traffic from A to B: a thick arrow from A to B means that A sent at least 20 messages to B, and a thin arrow means A sent between 1 and 19 messages (inclusive) to B.
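The encoding in these two bullets amounts to a small rule applied to each ordered pair of users. This sketch assumes the per-pair message count and a yes/no "swore at least once" flag have already been computed; the function and parameter names are hypothetical.

```python
THICK_MIN = 20  # from the post: 20 or more messages from A to B means a thick arrow

def arrow_style(n_sent, swore):
    """Return the style of the arrow from A to B in the first figure, or None.

    n_sent: number of messages A sent to B inside the sample.
    swore:  True if at least one of those messages contained a swear.
    """
    if not swore or n_sent < 1:
        return None                                      # swear-free pairs get no red arrow
    width = "thick" if n_sent >= THICK_MIN else "thin"   # thin covers 1 to 19 messages
    return ("red", width)
```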

The different line thicknesses might be hard to see at first, because the vast majority of the lines in this network are thick. I claim that this is no accident. It reflects the fact that, in this corporate setting, swearing is risky enough that it is best done only with people you know well. Your first few messages to someone are unlikely to contain swears, but you might build up the courage over time.

I can quantify the visual impression that these arrows are mostly thick: just 1.6% of the possible from–to pairs in this sample set have message counts of 20 or more, whereas 78% of the from–to swearing-pairs have message counts of 20 or more. If you squint, you can see this contrast reflected in the following version of the network, in which a gray arrow from A to B means that A swore at someone or other in the sample but sent only swear-free messages to B in this dataset. (Note: This is an update/improvement; the previous visualization included arrows for nonswearers as well, which resulted in a mass of gray in the middle of the network. My thanks to Dougal Stanton for the suggestion in the comments.)
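The two percentages in this paragraph compare the same statistic over two sets of pairs, and the gray arrows follow a simple rule. The sketch below shows both under stated assumptions: "possible from–to pairs" is taken to mean ordered pairs of distinct users, and the `counts` dictionary and function names are hypothetical.

```python
from itertools import permutations

def thick_shares(counts, nodes, swear_pairs, threshold=20):
    """Share of pairs with 20+ messages: over all possible pairs vs. swearing pairs.

    counts: dict mapping (sender, recipient) -> number of messages.
    nodes: the users in the sample.
    swear_pairs: set of (sender, recipient) pairs with at least one swear.
    Assumes "possible pairs" means ordered pairs of distinct users.
    """
    possible = list(permutations(nodes, 2))
    share_all = sum(counts.get(p, 0) >= threshold for p in possible) / len(possible)
    share_swear = sum(counts.get(p, 0) >= threshold for p in swear_pairs) / len(swear_pairs)
    return share_all, share_swear

def gray_arrows(counts, swear_pairs):
    """Pairs drawn gray: the sender swore at someone in the sample,
    but every message to this particular recipient was swear-free."""
    swearers = {a for a, _ in swear_pairs}
    return {p for p, n in counts.items()
            if n > 0 and p[0] in swearers and p not in swear_pairs}
```

Restricting gray arrows to senders who swore somewhere is what keeps nonswearers out of the picture.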

[Figure: Swearing in an Enron email network, with gray arrows]

There is one effect that I expected to observe but did not. Because swearing is risky, the safest situation in which to swear is one in which your hearer has already sworn with you. Thus, I expected most of the red arrows to form symmetric pairs. (See also this post on Jamie Pennebaker's work.) In fact, very few red arrows run in both directions in this sample. I suspect that this is due to a major drawback (for my purposes) to the dataset: many of these relationships are hierarchical. It's one thing if Skilling calls you or someone else an asshole, and quite another to use that as an invitation to do some swearing yourself.
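Checking for symmetric pairs means asking, for each red arrow from A to B, whether there is also a red arrow from B to A. A minimal sketch, assuming the red arrows are stored as a set of (sender, recipient) tuples; the helper names are hypothetical.

```python
def reciprocated(swear_pairs):
    """Unordered pairs {a, b} in which a swore at b and b swore at a."""
    return {frozenset(p) for p in swear_pairs
            if p[0] != p[1] and (p[1], p[0]) in swear_pairs}

def symmetric_share(swear_pairs):
    """Fraction of red arrows that have a red arrow running back."""
    return 2 * len(reciprocated(swear_pairs)) / len(swear_pairs)
```

Self-addressed pairs are skipped so that a user emailing their own address does not count as reciprocity.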

In closing, thanks are in order: to the people behind statnet, the amazing R library that let me build the above networks using just simple matrices of counts, and to all the people who worked to tame the wild Enron dataset, especially Andrés Corrada-Emmanuel for his tools for identifying users and removing repeat messages.


  1. Dougal Stanton said,

    December 19, 2008 @ 2:36 pm

    To reduce the noise in your second graph you could ignore the people who didn't swear. That is, if A swore at B, draw a red arrow from A to B. But if A didn't swear at C, then draw a grey arrow. If C never swore at anyone, it's pointless to include any arrows from them, so their grey arrows can be safely omitted.

    In short, only draw arrows from swearers. I hope this makes sense, as it's a Friday night and I've already started on the Christmas cocktails. Sláinte!

  2. jfruh said,

    December 19, 2008 @ 3:14 pm

    As a side linguistic note, I see that in the caption you use swear as a noun ("a swear"). Is this a regional dialect? I remember hearing this a lot when I was growing up in Western New York, but not much since I left.

  3. rootlesscosmo said,

    December 19, 2008 @ 4:09 pm

    a major drawback (for my purposes) to the dataset: many of these relationships are hierarchical

    Is there a way to identify the gender of senders and recipients?

  4. Chris Potts said,

    December 19, 2008 @ 5:43 pm


    Is there a way to identify the gender of senders and recipients?

    This is hard to do corpus-wide; I've not seen general resources that would provide such information. But we can do it for the small set of swear-pairs in my sample, just using our intuitions about naming conventions. Here are my counts:

    • MM: 18
    • MF: 6
    • FM: 9
    • FF: 8

    Note: Enron seems to have had many more male than female employees, so it is not clear how useful these counts are on their own.

    And here's the full table of swearers, with just the first names left in. (I used only addresses of the form "first.last@enron.com", so it was easy to get this information.)

        From      To         Genders
     1  mike      michelle   MF
     2  michelle  david      FM
     3  mary      susan      FF
     4  matthew   eric       MM
     5  susan     steven     FM
     6  leslie    tana       FF
     7  tana      marie      FF
     8  jeffrey   mike       MM
     9  phillip   eric       MM
    10  amanda    mike       FM
    11  michelle  mike       FM
    12  amy       cara       FF
    13  sally     greg       FM
    14  steven    maureen    MF
    15  john      greg       MM
    16  john      david      MM
    17  eric      matthew    MM
    18  eric      phillip    MM
    19  john      jeffrey    MM
    20  john      john       MM
    21  karen     jeff       FM
    22  frank     jim        MM
    23  mike      karen      MF
    24  james     jeff       MM
    25  joe       stuart     MM
    26  jeff      steven     MM
    27  jeff      karen      MF
    28  jeff      paul       MM
    29  jeff      susan      MF
    30  jeff      richard    MM
    31  jeff      james      MM
    32  john      alexandra  MF
    33  kay       jeffrey    FM
    34  kay       reagan     FF
    35  kay       suzanne    FF
    36  kay       ben        FM
    37  kay       kathleen   FF
    38  suzanne   kay        FF
    39  barry     mike       MM
    40  louise    bob        FM
    41  david     john       MM
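Because the gender summary is derived from this table, it can be re-derived mechanically as a check. The sketch below copies the 41 rows as plain text and counts the two-letter codes in the last column; `ROWS` and `tally_genders` are hypothetical names.

```python
from collections import Counter

# From / To / Genders rows of the table above (first names only).
ROWS = """\
mike michelle MF
michelle david FM
mary susan FF
matthew eric MM
susan steven FM
leslie tana FF
tana marie FF
jeffrey mike MM
phillip eric MM
amanda mike FM
michelle mike FM
amy cara FF
sally greg FM
steven maureen MF
john greg MM
john david MM
eric matthew MM
eric phillip MM
john jeffrey MM
john john MM
karen jeff FM
frank jim MM
mike karen MF
james jeff MM
joe stuart MM
jeff steven MM
jeff karen MF
jeff paul MM
jeff susan MF
jeff richard MM
jeff james MM
john alexandra MF
kay jeffrey FM
kay reagan FF
kay suzanne FF
kay ben FM
kay kathleen FF
suzanne kay FF
barry mike MM
louise bob FM
david john MM
"""

def tally_genders(rows):
    """Count the sender-recipient gender codes in the last column of each row."""
    return Counter(line.split()[-1] for line in rows.splitlines() if line.strip())

print(tally_genders(ROWS))
```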

  5. Chris Potts said,

    December 19, 2008 @ 5:45 pm


    Yes, "swear" is easily a noun for me. I grew up in southern CT — a part that is connected by train and in spirit to midtown Manhattan.

  6. Dez said,

    December 19, 2008 @ 5:56 pm

    I'm not clear on what you count as 'swearing'. I presume that 'fuck' is included, but what else?

  7. Chris Potts said,

    December 19, 2008 @ 6:32 pm

    Dougal Stanton!

    To reduce the noise in your second graph you could ignore the people who didn't swear. That is, if A swore at B, draw a red arrow from A to B. But if A didn't swear at C, then draw a grey arrow.

    Many thanks for this. This new visualization was pretty easy to generate, and I think it is an improvement. I updated the post and image, with a hat-tip to you. You've earned the cocktails as far as I am concerned!

  8. Bryn LaFollette said,

    December 19, 2008 @ 7:19 pm

    That's pretty neat work! I've been working with the Enron dataset for quite a while now, but never looked at its content from this perspective. Primarily we've been using it for developing concept similarity and near-duplicate relationships between whole messages. I'm suddenly very interested to see what other sorts of sociological details could be teased out of examining the social-network graph. Thanks, Chris!

  9. Arnold Zwicky said,

    December 19, 2008 @ 7:59 pm

    jfruh: "As a side linguistic note, I see that in the caption you use swear as a noun ("a swear")."

    AZ, 8/19/08: Horton Hears a Swear:

  10. Joseph Frazee said,

    December 21, 2008 @ 12:18 pm


    You say that "Swearing is risky behavior" and "This is basically true in any setting." Though you do allow for some variability in audience, I have some first-hand experience in a marketing and insurance company where swearing was more or less a way of showing people that you cared about the company's outcomes.

    It even seemed to be a practice primarily of those who wanted to advance in the organization. You swore to assert a rank you didn't have, and over time you earned that rank. Swearing was rampant among the senior executives, so the more you swore in interactions with them, the more you identified with them.

    This suggests that what is risky is transgressing the local norm for swearing. I have a nice clipping of an (unscientific) WSJ article about this pattern (high-swearing versus low-swearing organizations), but I can't find it electronically. (I have to confess that I like reading the WSJ.)

  11. Philip (flip) Kromer said,

    December 23, 2008 @ 6:31 am

    We're close to releasing a scrape of the giant component of Twitter's friend graph, including a moderate number (~30M) of short messages. It might be amenable to the same investigation… The data will include prestige (pagerank) centrality, 2-neighborhood info, and the implicit @reply network. If you'd like to get hold of our rough draft, contact me: flip at infochimps.org
