Compound pejoratives


[This has been drifting down my too-long to-blog list for almost 16 months — but better late than never, I guess, and the world could use some pejorative-flavored humor…] 

Colin Morris, "Compound pejoratives on Reddit – from buttface to wankpuffin", 6/28/2022:

I collected lists of around 70 prefixes and 70 suffixes (collectively, “affixes”) that can be flexibly combined to form insulting compounds, based on a scan of Wiktionary’s English derogatory terms category. The terms covered a wide range of domains, including:

    • scatology (fart-, poop-)
    • political epithets (lib-, Trump-)
    • food (-waffle, -burger)
    • body parts (butt-, -face, -head, -brains)
    • gendered epithets (bitch-, -boy)
    • animals (dog-, -monkey)

Most terms were limited to appearing in one position. For example, while -face readily forms pejorative compounds as a suffix, it fails to produce felicitous compounds as a prefix (facewad? faceclown? facefart?).

Taking the product of these lists gives around 4,800 possible A+B combinations. Most are of a pejorative character, though some false positives slipped in (e.g. dogpile, spitballs). I scraped all Reddit comments from 2006 to the end of 2020, and counted the number of comments containing each.
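The procedure just described (a Cartesian product of two affix lists, then a per-compound count of comments that contain the compound) can be sketched in Python roughly as follows. This is a minimal illustration, not Morris's actual pipeline: the short affix lists are a subset of the examples above, and the matching rule (case-insensitive, whole word only, so a plural like "buttfaces" is not counted) is an assumption.

```python
import re
from itertools import product

# Illustrative subset of affixes taken from the examples above (not the full ~70 x 70 lists).
PREFIXES = ["butt", "fart", "shit"]
SUFFIXES = ["face", "head", "waffle"]


def candidate_compounds(prefixes, suffixes):
    """Every prefix+suffix combination, e.g. "butt" + "face" -> "buttface"."""
    return [a + b for a, b in product(prefixes, suffixes)]


def count_comments(compounds, comments):
    """Map each compound to the number of comments containing it at least once.

    Assumed matching rule: case-insensitive, whole word only (so "buttfaces"
    does not count as "buttface"). The original scraper may have matched differently.
    """
    patterns = {
        c: re.compile(r"\b" + re.escape(c) + r"\b", re.IGNORECASE) for c in compounds
    }
    return {
        c: sum(1 for text in comments if pattern.search(text))
        for c, pattern in patterns.items()
    }
```

Counting comments rather than occurrences means a comment that repeats a compound five times still contributes only one, which matches the "number of comments containing each" description. With roughly 70 entries on each side the product has on the order of 4,900 candidates, consistent with the ~4,800 figure above.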

Among other results, the graphical "Matrix of Pejoration" stands out:

And there's a GitHub repo including (quoted from the README):

    • counts.csv, a dataset mapping ~4,800 compound pejoratives to the number of Reddit comments containing that compound. (wikt.csv has the same format with a column recording whether the term has an entry in Wiktionary)
    • various Python scripts used to generate this dataset. This process is documented in the "Data pipeline" section below, though you probably don't need to run these unless you want to expand the dataset with a different set of affixes or time range.
    • various IPython notebooks exploring and visualizing the dataset (See "Guide to IPython notebooks" section below)

Colin Morris's blog post notes that the rank-frequency distribution of the compound terms follows Zipf's law:
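In its simplest form, Zipf's law says a term's frequency is roughly proportional to 1/rank^s with s close to 1, so a Zipf-like distribution looks like a straight line on log-log axes. One rough way to check this, sketched below, is to rank the compounds by count and take the least-squares slope of log(count) against log(rank). The sketch assumes the counts have already been loaded into a dict mapping each compound to its comment count (for instance from counts.csv; the loading step is not shown), and the function names are hypothetical. A least-squares slope is only an approximate estimator for heavy-tailed data, so the result should be read as indicative.

```python
import math


def rank_frequency(counts):
    """Return (rank, count) pairs, most frequent first, with rank starting at 1.

    Compounds with a count of zero are dropped, since log(0) is undefined.
    """
    ordered = sorted((c for c in counts.values() if c > 0), reverse=True)
    return list(enumerate(ordered, start=1))


def zipf_exponent(pairs):
    """Estimate s in count ~ C / rank**s from the log-log least-squares slope.

    Needs at least two distinct ranks. Returns -slope, so s is about 1
    for a classic Zipf distribution.
    """
    xs = [math.log(rank) for rank, _ in pairs]
    ys = [math.log(count) for _, count in pairs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return -cov / var
```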

Colin Morris offers some additional descriptive insight about productivity:

But the principles governing the patterns of morph-combination — in the graphical matrix or the full dataset — are not so clear.

Exercise for the reader: What combination of morphology, syntax, semantics, analogy, and historico-cultural accident is responsible for which aspects of the observed distributions? How is this similar to or different from the patterns in other English complex nominals?



  1. Jonathan Smith said,

    October 17, 2023 @ 8:01 am

    Excellent… The "Matrix of Pejoration" is suggesting there are on the order of 1000 occurrences of "assass" and "shitshit" in the data though? Morphosyntax puzzling here…

  2. Mark Liberman said,

    October 17, 2023 @ 9:30 am

    @Jonathan Smith:
    The raw counts file does have


    This is not a bug in the (raw) collection protocol, or a bunch of typos in reddit. In the case of "assass" it's a non-pejorative derivation, (mostly) short for "assassin" in "Assassin's Creed".

    I'll let you figure out for yourself where "shitshit" comes from…

    False alarms of this kind are a recurrent problem in corpus-based analysis.

  3. Ross Presser said,

    October 17, 2023 @ 10:26 am

    Jonathan Smith said,

    > Excellent… The "Matrix of Pejoration" is suggesting there are on the order of 1000 occurrences of "assass" and "shitshit" in the data though? Morphosyntax puzzling here…

    Yes, there are. Lines from counts.csv matching the regex /^([a-z]+),\1,/
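The same filter in Python might look like the sketch below. It assumes, as the regex implies, that each row of counts.csv begins with the prefix and the suffix as its first two comma-separated fields. The slashes in the comment are Perl-style regex delimiters and are dropped here; the function names and the default file path are hypothetical.

```python
import re

# First field repeated verbatim as the second field, e.g. "ass,ass,..." or "shit,shit,...".
# The trailing comma stops a row like "ass,assface,..." from matching.
REDUPLICATED = re.compile(r"^([a-z]+),\1,")


def reduplicated_rows(lines):
    """Return the lines (without trailing newlines) whose first two fields are identical."""
    return [line.rstrip("\n") for line in lines if REDUPLICATED.match(line)]


def reduplicated_rows_in_file(path="counts.csv"):
    """Apply the filter to a CSV file on disk (the path is an assumption)."""
    with open(path, encoding="utf-8") as f:
        return reduplicated_rows(f)
```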

  4. Gregory Kusnick said,

    October 17, 2023 @ 10:34 am

    To my eye, "Trumpnozzle" stands out as less utilized than it plausibly could be.

  5. Y said,

    October 17, 2023 @ 11:15 am

    Among other things, dirttard is not very euphonious.

  6. Yuval said,

    October 17, 2023 @ 1:42 pm

    The scraping script will probably not work anymore; Reddit shut down API queries a few months ago.

  7. Adam C said,

    October 17, 2023 @ 3:15 pm

    Probably related to the -dan curses in Mandarin.
    Using Reddit will skew the results a bit because Reddit particularly values originality and has its own lexicon. The link treats these initial uses as coinages, but I've never seen 'shitlord' anywhere else. Likewise, it would contain an inordinate number of 'asshats'.
    How about YouTube?

  8. Mark Liberman said,

    October 17, 2023 @ 4:07 pm

    @Adam C: …but I've never seen 'shitlord' anywhere else…

    Google says different

  9. J.W. Brewer said,

    October 17, 2023 @ 7:08 pm

    If you look at the 20 x 20 matrix, some of the most common ones like "dumbass" or "scumbag" or "shithead" are also ones I can recall from my own youth (long before "reddit" existed and long before even a science fiction author could have cogently explained what "reddit" even referred to) and were thus already present in the lexicon before the more recent mysterious lexical-coinage forces (internet-amplified?) that seem to account for many of the others had manifested themselves.

    The 20 x 20 matrix also includes at least one word ("dipstick") that is neutral-to-positive in its core automobile-engine-related sense. It apparently also has pejorative senses, which I think are best understood by thinking of it as a minced alternative to "dipshit," although I cannot imagine anyone calling anyone else a "dipstick" to their face without picturing Don Rickles mock-insulting some fellow guest on a 1970's talk show. "Douchebag" (although usually given as "douche bag" or "douche-bag") started as a benign/neutral word found in medical journals in the early 20th century, but the pejorative metaphorical sense had largely crowded out the neutral literal sense by the time I first heard it.

  10. J.W. Brewer said,

    October 17, 2023 @ 7:25 pm

    I don't mind being the age I am and having the life I've had, but the ready availability of cool datasets like this does make me feel like my undergraduate linguistics classes back in the mid-Eighties were impoverished by the lack of anything comparable. We did in the intro sociolinguistics class I took (offered in the Anthropology dep't rather than the Linguistics dep't, which may or may not have been meaningful) get introduced to the notion that there were a few scholars out there who were interested in empirical data collection and statistical analysis thereof rather than just sitting in their armchairs at MIT and making up example sentences to introspect about the grammaticality of. But imagine how ridiculously many man-hours of unpaid graduate-student labor it took to construct Labov's famous "variation in the pronunciation of 'fourth floor'" dataset given the Stone Age technology of the time.

  11. J.W. Brewer said,

    October 17, 2023 @ 7:38 pm

    (I feel like out of deference to our host's family connection I should mention that my alma mater's Linguistics Dep't did have an affiliation in those days with Haskins Labs, whose president at the time was myl's late father, also a probably-emeritus-by-then professor in the department. I'm sure there was work being done there that involved empirical data collection and statistical analysis thereof. I was not at the time particularly interested in the subfields of linguistics that were Haskins-Labs-adjacent, so I didn't take any of those classes. Which may have been my loss.)

  12. TonyG said,

    October 17, 2023 @ 7:56 pm

    For me the most surprising thing was Colin Morris' admission that he doesn't know what fuckmitten means. I've never seen it before, but it surely could only mean what we used to call a wanksock? And indeed the Urban Dictionary confirms this.

  13. Steve Morrison said,

    October 17, 2023 @ 8:08 pm

    I remember one time, years ago, when someone misunderstood the word toolbag as an insult.

  14. Stephen Goranson said,

    October 18, 2023 @ 8:50 am

    As Saul Lieberman said, introducing a lecture by Gershom Scholem,
    "Nonsense is nonsense, but the study* of nonsense is scholarship."

    *or did he say "history"?

  15. Ralph J Hickok said,

    October 18, 2023 @ 9:00 am

    "dickpuffin" seems very weak, but "puffindick" might be an effective insult, although I don't know enough about puffin anatomy to be sure of that.

  16. Roscoe said,

    October 18, 2023 @ 11:59 am

    I remain secure as ever in my belief that “nimrod” became an insult, not only because Bugs Bunny used it sarcastically in a cartoon, but also because it sounds like it should be a compound pejorative.

  17. David Morris said,

    October 18, 2023 @ 2:31 pm

    In other parts of the community there are similar words like hornbag, horndog and studmuffin.

  18. Peter Taylor said,

    October 18, 2023 @ 2:39 pm

    @Ralph J Hickok, the vast majority of species of birds don't have penises at all. The most notable feature of puffin anatomy is the large and brightly coloured beak (at least during mating season). Prosody seems as good an explanation as anatomy for the choice of puffins, but as to why puffins instead of e.g. sparrows, perhaps a perception of obscurity? Or perhaps the near-homophony with puffing is intended to imply obesity?

  19. Haamu said,

    October 18, 2023 @ 4:13 pm

    Looking at the heat map and reflecting on how it might look extended to 70×70, I'm struck by the thought that all the apparent variety may be a bit illusory.

    It isn't just that this provides an algorithm for "coining" a new epithet by looking for a near-white cell and then splicing the row and column headings together — it's the formulaic nature of the affixes themselves, or at least of the resulting words. So many of them, particularly the redder ones, seem to be going for the same thing. Hmmm… can't put my finger on it. (I'm not looking for these to get cleaner, necessarily, just more creative.)

    I think the Elizabethans had a lot more imagination.

    And if you disagree, I will call thee a hedgeborn cankerblossom. Or a mumblecrust, clapperclaw, muttonmonger, mooncalf, flapdragon, something like that.

    Maybe candlewaster!
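Haamu's "near-white cell" recipe amounts to listing the prefix+suffix splices with the lowest counts. A minimal sketch, assuming a dict mapping compounds to comment counts (like the data in counts.csv) and treating any compound missing from it as never observed; the function name and the max_count threshold are illustrative, not part of the dataset.

```python
def near_white_cells(prefixes, suffixes, counts, max_count=0):
    """List (compound, count) splices whose count is at most max_count, rarest first.

    counts maps compound strings to comment counts; a compound absent from
    counts is treated as never observed (count 0). Ties keep row-major order
    because Python's sort is stable.
    """
    cells = [(counts.get(a + b, 0), a + b) for a in prefixes for b in suffixes]
    rare = [cell for cell in cells if cell[0] <= max_count]
    rare.sort(key=lambda cell: cell[0])
    return [(compound, n) for n, compound in rare]
```

Whether an empty cell marks an untapped coinage or, as Y suggests of dirttard, a combination that simply doesn't sound good is something the counts alone can't tell you.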

  20. Dan Scherlis said,

    October 18, 2023 @ 9:53 pm

    If I may contribute two more instances to the collection:

    (1) Kurt Vonnegut has a character use the brilliant pejorative "sparrowfart", in his play, Happy Birthday, Wanda June.

    I heard this in an early production, and never forgot it. The clear implication (especially in context): "You'd be unpleasant, if you had any impact at all."

    (2) Many years ago, after making a potentially disastrous mistake, I was called a "neanderfuck". I was too impressed, charmed even, to register the insult.

  21. Buku Mimpi 2D Bergambar said,

    October 19, 2023 @ 5:55 am

    One thing I wish this went into more is the power of very specific users, memes, or events to propel certain compounds to new heights.

  22. JPL said,

    October 19, 2023 @ 6:50 pm

    OP (Morris post):
    "…while 'face' readily forms pejorative compounds as a suffix, it fails to produce felicitous compounds as a prefix …"

    This may be true for the cases of pejorative compounds, but it's not true in general, since we have, e.g., 'faceplant', 'facepalm', as well as the non-slang 'facelift' and 'facemask'.

    "I collected lists of around 70 prefixes and 70 suffixes (collectively "affixes") that can be flexibly combined to form insulting compounds …"

    Mr Morris is apparently not a linguist, but if he wants to go forward with his analysis, I would suggest that he not use the terms 'prefix', 'suffix' and 'affix' to describe these phenomena. I don't claim to be an authority (I'm not a morphologist, and please correct me if I am off the mark), but these terms I think normally refer to bound morphemes, not free morphemes, and are described as being attached to a stem, belonging to a contentive lexical category (noun, verb, etc.) to form a derived lexeme or an inflected word-form. What you're describing are examples of the morphological process of compounding, which involves combining free forms, based on the way they would be combined in a syntactic construction, resulting in a derived compound lexeme. (The syntactic construction in question being that of nominal modification, specifically involving (two) "noun" elements and indicating a subcategory (the "modifier") of the category indicated by the "head" ("innermost" (or "rightmost") element), this element being the one most relevant for the relation of reference.)

  23. J.W. Brewer said,

    October 20, 2023 @ 11:11 am

    I just happened to see the seemingly-similar "dumpweed" in a headline of a story about the reunion or return of some late Nineties rock band I never paid much attention to. Not a word I recognized, and it seemed offhand to be generated by the same processes, especially if you think of the mildly scatological sense of "dump." It turns out it's an old song title from the band, and although the title compound noun does not appear in the actual lyrics, it seems to be glossed per urban dictionary as "someone who has recently been dumped, that cannot get over their past relationship." So maybe mildly pejorative, but in a "more to be pitied than censured" way. Too bad, because I thought it had possibilities as a more generic term of abuse. -weed doesn't make Morris' 20×20 matrix, but he does have some -weed examples in his fuller data. Obviously "dickweed" (which is apparently now in the OED), but also others that seemed more like recent nonce coinages.

  24. Philip Taylor said,

    October 20, 2023 @ 3:58 pm

    (For those interested in such things)

    dickweed (noun)
    slang (originally and chiefly U.S.).

    An obnoxious, detestable, or stupid person (esp. a male). 1984–

    1984 [Campus slang.] Dickweed.
    J. Algeo in J. E. Lighter, Historical Dictionary of American Slang (1994) vol. I. 586/2

    1986 You killed Ted, you Medieval dick-weed!
    C. Matheson & E. Solomon, Bill & Ted's Excellent Adventure (film script, 5th draft) (O.E.D. Archive) 52

    1992 It would be a pleasure to wake that dickweed up early.
    O. Goldsmith, First Wives Club i. i. 23

    2001 Come on, you dickweed.
    S. King, Dreamcatcher vi. 195
