Proportion of adjectives and adverbs: Some facts


Adam Okulicz-Kozaryn, "Cluttered writing: adjectives and adverbs in academia", Scientometrics 2013:

[H]ow do we produce readable and clean scientific writing? One of the good elements of style is to avoid adverbs and adjectives (Zinsser 2006). Adjectives and adverbs sprinkle paper with unnecessary clutter. This clutter does not convey information but distracts and has no point especially in academic writing, say, as opposed to literary prose or poetry.

If you've seen my earlier discussion of this paper ("'Clutter' in (writing about) science writing", 8/30/2013), you'll recall that Dr. O-K goes on to count adjectives and adverbs in some word lists from samples of scientific writing. He asserts that "social science" writing uses about 15% more adjectives and adverbs than "natural science" writing — although he doesn't tell us enough about his methods to dispel concerns about several likely sources of artifact — and he concludes by asking "Is there a reason that a social scientist cannot write as clearly as a natural scientist?"

In the interests of science of all kinds, I decided to devote this morning's Breakfast Experiment™ to the relations between text quality and the proportion of adjectives and adverbs. I wrote a python script using NLTK to calculate the proportions of various parts of speech in a document; and then I tried this script out on samples of various sorts of writing. Here's some of what I found.

To start with, I decided to try some really cluttered prose, prose that is not at all "readable and clean": Edward Bulwer-Lytton's Paul Clifford. Wikipedia tells us that this novel is considered to represent "the archetypal example of a florid, melodramatic style of fiction writing". Its first sentence:

It was a dark and stormy night; the rain fell in torrents, except at occasional intervals, when it was checked by a violent gust of wind which swept up the streets (for it is in London that our scene lies), rattling along the house-tops, and fiercely agitating the scanty flame of the lamps that struggled against the darkness.

I put essentially all of the first chapter of this work into a file (minus the paragraphs that are mostly dialogue, much of which is in dialect). According to NLTK's pos_tag() function, which should be about 95% correct, the score was:

1775 words, 184 punctuation tokens = 1591 real words
108 adjectives = 6.8 percent
78 adverbs = 4.9 percent
186 adjectives+adverbs = 11.7 percent
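For anyone who wants to try this kind of tally, here is a minimal sketch of the counting step. It is not the script behind the numbers in this post. It assumes Penn Treebank tags (the tag set that NLTK's pos_tag() returns for English), treats any token with no letter or digit as punctuation, and counts tags beginning with JJ as adjectives and tags beginning with RB as adverbs. Those three choices, and the names modifier_report, format_report and tag_text, are assumptions made for illustration. The sample below is hand-tagged, not tagger output.

```python
def tag_text(text):
    """Tokenize and tag raw text with NLTK (needs NLTK and its data installed)."""
    import nltk  # imported lazily so the counting code below runs without NLTK
    return nltk.pos_tag(nltk.word_tokenize(text))


def modifier_report(tagged):
    """Count adjectives and adverbs in a list of (token, tag) pairs."""
    # Assumption: a token containing no letter or digit is punctuation.
    words = [(tok, tag) for tok, tag in tagged if any(c.isalnum() for c in tok)]
    n_words = len(words)
    # Assumption: JJ/JJR/JJS are adjectives and RB/RBR/RBS are adverbs.
    # WRB ("when", "how") does not start with "RB", so it is not counted.
    adjectives = sum(1 for _, tag in words if tag.startswith("JJ"))
    adverbs = sum(1 for _, tag in words if tag.startswith("RB"))

    def pct(n):
        return round(100.0 * n / n_words, 1) if n_words else 0.0

    return {
        "tokens": len(tagged),
        "punctuation": len(tagged) - n_words,
        "words": n_words,
        "adjectives": adjectives,
        "adverbs": adverbs,
        "adjectives_pct": pct(adjectives),
        "adverbs_pct": pct(adverbs),
        "modifiers_pct": pct(adjectives + adverbs),
    }


def format_report(r):
    """Lay the counts out in roughly the format used for the scores in this post."""
    return [
        "%d words, %d punctuation tokens = %d real words"
        % (r["tokens"], r["punctuation"], r["words"]),
        "%d adjectives = %s percent" % (r["adjectives"], r["adjectives_pct"]),
        "%d adverbs = %s percent" % (r["adverbs"], r["adverbs_pct"]),
        "%d adjectives+adverbs = %s percent"
        % (r["adjectives"] + r["adverbs"], r["modifiers_pct"]),
    ]


# Hand-tagged opening of the sentence quoted above (illustrative only).
SAMPLE = [
    ("It", "PRP"), ("was", "VBD"), ("a", "DT"), ("dark", "JJ"), ("and", "CC"),
    ("stormy", "JJ"), ("night", "NN"), (";", ":"), ("the", "DT"), ("rain", "NN"),
    ("fell", "VBD"), ("in", "IN"), ("torrents", "NNS"), (",", ","),
]

for line in format_report(modifier_report(SAMPLE)):
    print(line)
```

To score a real file, something like modifier_report(tag_text(open(path).read())) should work, with path standing in for whatever file you want to score. The results will depend on which tagger model is installed and on how punctuation is defined.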

So Bulwer-Lytton's chapter is about 12% adjectives and adverbs. What should we compare this to? Well, Dr. O-K cites William Zinsser's On Writing Well as his authority for the cluttering nature of adjectives and adverbs, so let's try the first three sections of that work (minus quotations from others, of course):

3939 words, 439 punctuation tokens = 3500 real words
241 adjectives = 6.9 percent
208 adverbs = 5.9 percent
449 adjectives+adverbs = 12.8 percent

Hmm. Well, maybe this is experimental error. And Bulwer-Lytton's writing is clear enough, it's just kind of overwrought. So let's take a look at something by Jacques Derrida, whose prose is about as unreadable as anything I've ever encountered. Here's the score for chapter 2 of "Of Grammatology" (in English translation, of course):

19239 words, 2105 punctuation tokens = 17134 real words
1434 adjectives = 8.4 percent
946 adverbs = 5.5 percent
2380 adjectives+adverbs = 13.9 percent

OK, that's better —  Derrida has 19% more adjectives and adverbs than Bulwer-Lytton. But he's only got 8% more than Zinsser, and Zinsser has more than Bulwer-Lytton, so this still doesn't all seem to be working out the way we were told it would.
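These relative comparisons are ratios of the adjective+adverb proportions, computed from the raw counts reported above. A short sketch of the arithmetic follows; the function names proportion and percent_more are hypothetical, chosen for illustration.

```python
def proportion(modifiers, real_words):
    """Share of real words that are adjectives or adverbs."""
    return modifiers / real_words


def percent_more(a, b):
    """By what percentage proportion a exceeds proportion b."""
    return 100.0 * (a / b - 1.0)


# Counts taken from the three score reports above.
bulwer_lytton = proportion(186, 1591)
zinsser = proportion(449, 3500)
derrida = proportion(2380, 17134)

print(round(percent_more(derrida, bulwer_lytton)))  # 19
print(round(percent_more(derrida, zinsser)))        # 8
print(round(percent_more(zinsser, bulwer_lytton)))  # 10
```

Note that these are relative differences between proportions, not differences in percentage points: Derrida's 13.9% is only about two points above Bulwer-Lytton's 11.7%.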

Let's go for another paragon. Dr. O-K opens his paper with a quote from Mark Twain: "When you catch an adjective, kill it." So let's try the whole letter that the quote came from:

1474 words, 170 punctuation tokens = 1304 real words
89 adjectives = 6.8 percent
95 adverbs = 7.3 percent
184 adjectives+adverbs = 14.1 percent

Oops. We're really going in the wrong direction here — Saint Mark uses the highest proportion of adjectives and adverbs that we've seen so far.

And what about Dr. O-K's own writing? Here's the score for the text of "Cluttered writing: adjectives and adverbs in academia" itself (of course minus the quotations from others):

883 words, 80 punctuation tokens = 803 real words
85 adjectives = 10.6 percent
42 adverbs = 5.2 percent
127 adjectives+adverbs = 15.8 percent

We have a winner! Dr. Okulicz-Kozaryn's text, about the importance of eliminating adjectives and adverbs from prose, has fully 35% more adjectives and adverbs than the infamous "It was a dark and stormy night" passage, which has given its author's name to an annual bad writing contest!

(127/803)/(186/1591) = 1.3528

And the first two pages of another of his papers ("Man and God and Circle of Trust", 2012) score even a bit higher:

1121 words, 104 punctuation tokens = 1017 real words
113 adjectives = 11.1 percent
60 adverbs = 5.9 percent
173 adjectives+adverbs = 17 percent

Seriously, the problem is not in Dr. O-K's writing (despite the sprinkling of slavicisms), but in his ideas. Calculating the relative percentages of adjectives and adverbs in texts tells us nothing useful about their readability, clarity, or efficiency.

I'll spare you the reports for the other 45 texts that I've tested. But just to let Dr. O-K off the hook for the "most modifiers" prize, let me note that the text of Ben Yagoda's piece from the Chronicle of Higher Education on adjectival anxiety ("The Adjective — So Ludic, So Minatory, So Twee", 2/20/2004) beats him out:

1908 words, 301 punctuation tokens = 1607 real words
208 adjectives = 12.9 percent
86 adverbs = 5.4 percent
294 adjectives+adverbs = 18.3 percent

Finally, I need to point out that there's a technical flaw in the whole "avoid adjectives and adverbs" idea — nouns are often modified by other nouns, or by prepositional phrases, or in other ways that don't involve adjectives; and verbs are often modified by prepositional phrases, subordinate clauses used as verbal adjuncts, and so on.

If it were true, counterfactually, that modification in general was a Bad Thing, then we'd need to count these other sorts of modifiers as well, not just adjectives and adverbs.

Some of the previous LL posts on modificational anxiety:

"Those who take the adjectives from the table", 2/18/2004
"Avoiding rape and adverbs", 2/25/2004
"Modification as social anxiety", 5/16/2004
"The evolution of disornamentation", 2/21/2005
"Adjectives banned in Baltimore", 3/5/2007
"Automated adverb hunting and why you don't need it", 3/5/2007
"Worthless grammar edicts from Harvard", 4/29/2010
"Getting rid of adverbs and other adjuncts", 2/21/2013
"'Clutter' in (writing about) science writing", 8/30/2013

N.B. Someone who took this whole business seriously enough to want to look at differences in part-of-speech distributions among scientific disciplines should know that Okulicz-Kozaryn is wrong when he writes that

as of 2012 I cannot bulk download enough full texts to have a representative sample of a discipline.

Between arXiv, the PLoS collections, SSOAR, the resources available from the ACL, and so on, it would not be hard to create large enough samples in enough different disciplines and subdisciplines to engage the question more seriously than Okulicz-Kozaryn did. But you ought to have another hypothesis to test as well, in my opinion, because the modifier-percentage idea looks like a loser.

Update — I realize that it's only fair for me to report the score for this blog post. Leaving out the quotations and so on, and without this update, I get:

1143 words, 146 punctuation tokens = 997 real words
82 adjectives = 8.2 percent
59 adverbs = 5.9 percent
141 adjectives+adverbs = 14.1 percent

The same overall percentage as Mark Twain…

Update #2 — William Zinsser complains that

Clutter is the disease of American writing. We are a society strangling in unnecessary words, circular constructions, pompous frills and meaningless jargon. Who can understand the clotted language of everyday American commerce: the memo, the corporation report, the business letter, the notice from the bank explaining its latest “simplified” statement?

So I decided to score Microsoft's 2012 Annual Report:

1499 words, 92 punctuation tokens = 1407 real words
117 adjectives = 8.3 percent
38 adverbs = 2.7 percent
155 adjectives+adverbs = 11 percent

There are certainly some unnecessary words and pompous frills in that report ("we delivered strong results, launched fantastic new products and services, and positioned Microsoft for an incredible future"), but the percentage of adjectives and adverbs is not a good measure of those characteristics.

Update #3 — I should also tell you that I did check the adjective and adverb proportions in various natural-science articles. For example, the first page of the first article in the current issue of Physical Review A (A. Rançon et al., "Quench dynamics in Bose-Einstein condensates in the presence of a bath: Theory and experiment") weighs in at

908 words, 62 punctuation tokens = 846 real words
103 adjectives = 12.2 percent
35 adverbs = 4.1 percent
138 adjectives+adverbs = 16.3 percent

And a combination of the first seven abstracts from the current issue of Science scores

1041 words, 81 punctuation tokens = 960 real words
106 adjectives = 11 percent
36 adverbs = 3.8 percent
142 adjectives+adverbs = 14.8 percent

This tends to confirm my suspicion that Okulicz-Kozaryn's result (15% lower proportion of adjectives and adverbs in natural science compared to social science text) probably results from one of the obvious sources of artifact, for example an inappropriate attempt to calculate part-of-speech percentages in text derived from passages like this one (from the second page of the Rançon et al. paper):

Since Okulicz-Kozaryn's counts came from word lists supplied by JSTOR, and he doesn't tell us which lists he used or how he processed them, and JSTOR doesn't tell us how they created the lists, we'll probably never know.



23 Comments

  1. Ray Girvan said,

    September 7, 2013 @ 6:57 pm

    > One of the good elements of style is to avoid adverbs and adjectives (Zinsser 2006).

    This is a pity; one of my idols has feet of clay. I've always rated Zinsser highly for his spirited defence of the acceptability of starting sentences with "But", as quoted in The Merriam-Webster Dictionary of English Usage.

  2. Tracy Hall said,

    September 7, 2013 @ 11:02 pm

    I agree that most such simple assertions are not only ill-founded but easily falsifiable. That being said, the premise gives me an excuse to dredge up a little exercise I wrote a few years ago:

    "Pay attention to which parts of speech shoulder the greatest semantic burden of your writing. In English, declare. We favor verbs to propel what must transpire. In French, eloquence takes root in the force, precision, and stability of nouns. In no language are you writing expressively or stylistically optimally, especially efficiency-wise, when doing so predominantly adverbially."

    [(myl) Cute. But the point at issue here is whether O-K's claims are correct: (1) that small differences in measured adjective+adverb proportions are relevant to readability; and (2) that writing in the natural sciences tends to use fewer adjectives and adverbs than writing in other fields. My conclusion is that the first claim is preposterous, and the second claim is at best unproven.

    Also, your exercise is going to run aground on the Peevers' thing about transpire. If you're going to appeal to that crowd's prejudices by writing clever justifications for their adverb-phobia, you need to avoid pushing their other buttons. Unless you meant "… to propel what must leak out", as part of a sly second-order satire?]

  3. David Morris said,

    September 8, 2013 @ 7:43 am

    How many adjectives can a writer string together? Possibly as many adjectives as exist, and then coin some more. Recently, as part of my reading for my masters dissertation, I happened on the following sentence, in which an 18th century English surgeon describes the Indigenous peoples of New South Wales (Australia) (mostly positively but somewhat patronisingly) in a letter to his brother:
    'They appear to be an Active, Volatile, Unoffending, Happy, Merry, Funny, Laughing Good-natured, Nasty Dirty, Race of human Creatures as ever lived in a State of Savageness.'

  4. Martin J Ball said,

    September 8, 2013 @ 8:12 am

    The trouble with Tracy Hall's little piece is that it reads horribly!

  5. Jay Lake said,

    September 8, 2013 @ 8:32 am

    Speaking as a fiction author (mostly science fiction and fantasy), I find this entire discussion deeply hilarious. In the critical processes of my genre, often referred to as Clarion or Milford critique, we tend to talk about adverbs as if they were the font of all evil. Which is just as much a piece of unsubstantiated folk wisdom masquerading as objective advice as Dr. O-K's piece under discussion here.

    [(myl) For what it's worth, the first section of your recent novel Kalimpura (which I enjoyed, by the way), scores

    2051 words, 136 punctuation tokens = 1915 real words
    104 adjectives = 5.4 percent
    127 adverbs = 6.6 percent
    231 adjectives+adverbs = 12.1 percent

    ]

  6. Levantine said,

    September 8, 2013 @ 1:31 pm

    Martin J Ball, I think that's the point of her piece. Unless I'm mistaken, the final sentence ('In no language are you writing expressively or stylistically optimally, especially efficiency-wise, when doing so predominantly adverbially') is meant to be ungainly in order to prove the point that adverbs are bad (a point I don't agree with, by the way).

    [(myl) Noun pile sequence composition methods present style annoyance and comprehension barrier possibilities as well. And to strive to persevere in continuing to decide to select and deploy verbs will not lead others to appreciate and understand what you've chosen to try to write, either.

    It's possible that advice about avoiding adjectives and adverbs helps certain kinds of novice writers to improve their compositions, if only because it's a way of making them think about what they've written. I don't have any evidence about this either way. But I'm pretty sure that tallying up the proportions of adjectives and adverbs in published text is not a useful way to estimate readability.

    As far as I can see, normal English prose involves adjective+adverb proportions in the 10-20% range, with no useful correlation between those proportions and the quality or clarity or readability of the prose. This is one of many features that we could use to quantify stylistic differences, and it wouldn't be surprising to find that there are small differences in disciplinary averages (though those would surely be small relative to within-discipline variance). But I was skeptical in advance of the idea that modifier-word proportions are a useful measure of readability; and after a bit of empirical investigation, I'm even more skeptical.]

  7. Garrett Wollman said,

    September 8, 2013 @ 1:45 pm

    LL itself could be taken as one extended meta-analysis into the question "Is there any single piece of writing advice that stands up to objective scrutiny?"

  8. bulbul said,

    September 8, 2013 @ 3:11 pm

    "In no language are you writing expressively or stylistically optimally, especially efficiency-wise, when doing so predominantly adverbially"
    Is it really ungainly? It strikes me as somewhat poetic.

  9. Levantine said,

    September 8, 2013 @ 3:16 pm

    bulbul, I guess it's rhymey, but whether or not that makes it poetic is another matter. (Off-topic, but is your username anything to do with the Persian word for nightingale?)

  10. Dogma versus Rules of Thumb » No Contest Communications said,

    September 8, 2013 @ 3:29 pm

    […] recent post called "Proportion of Adverbs and Adjectives: Some Facts" dismantles the still-too-common notion that good writing needs to avoid adjectives and […]

  11. Y said,

    September 8, 2013 @ 4:03 pm

    Orwell's loved/hated P&TEL gives a purposefully 'bad' paraphrasing of a 'good' text: Ecclesiastes,

    I returned and saw under the sun, that the race is not to the swift, nor the battle to the strong, neither yet bread to the wise, nor yet riches to men of understanding, nor yet favour to men of skill; but time and chance happeneth to them all.

    Redone as

    Objective considerations of contemporary phenomena compel the conclusion that success or failure in competitive activities exhibits no tendency to be commensurate with innate capacity, but that a considerable element of the unpredictable must invariably be taken into account.

    The original ('good') version has 49 words, 2 adverbs ('yet'), and no adjectives ('swift', 'strong', 'wise' are nouns). The ugly paraphrase has only 38 words (though longer ones), of which 5 are modifying adjectives, plus 1 adverb, a healthy 16% all in all.
    Not to defend O-K's paper, but it's true that modifiers can be used as a tool for evil, in this case, to my mind, the dull rhythm of adjective-noun, adjective-noun. I would have added 'appreciable' before 'tendency' to make it even more painful.

    [(myl) This sort of parlor trick has nothing at all to do with parts of speech, or even with the syntactic function of modification (which is different). It would be easy to compose a similar transformation, say of one of Orwell's own justly famous quotations, by (among other things) replacing adjectives and adverbs with pompous nominal expressions. ]

  12. Eric P Smith said,

    September 8, 2013 @ 6:31 pm

    @Y: I'm not sure I agree with your syntactic categories. Surely in the passage you quote, 'not', 'neither' and the 3 occurrences of 'nor' are adverbs, and 'swift', 'strong' and 'wise' remain adjectives? We could modify 'swift' with an adverb like 'very': we couldn't do that with a noun.

  13. Y said,

    September 8, 2013 @ 7:10 pm

    @myl, I completely agree that "pompous nominal expressions" can easily be used as well to compose tedious prose. I guess what I am trying to get at is that the edicts of the anti-modifier grammarians and pseudo-statisticians spring from a sense that some hard-to-read prose uses a mass of modifiers for that effect. There is something there, but it's very hard to get at, and no one has come close to doing so. Incidentally, compared to the graceful Hebrew original, the King James translation Orwell so admires looks to me plodding and awkward.

    @EPS, Yes, 'swift' etc. take 'very', but they are used as noun phrases. I am no syntactician, and I don't know what to call these. I am not sure how O-K or Mark Liberman count adjectives and adverbs. In any case, I wanted to count only those adjectives/adverbs which I think Orwell had contrived to make the second sentence 'bad'.

  14. Y said,

    September 8, 2013 @ 7:13 pm

    I correct myself. 'swift' is an adjective, 'the swift' is a noun phrase. EPS is right.

  15. Rubrick said,

    September 8, 2013 @ 11:04 pm

    Mark, you really need to stop demolishing academics' carefully reasoned blather with a little well-executed actual work, or folks are going to start treating you like a scientist.

  16. Laoseng said,

    September 9, 2013 @ 3:36 am

    Very nice. Maybe we could run the script on James Joyce's Ulysses to see how many adjectives and adverbs he used. My guess is above 18% ;-)

    [(myl) Perhaps not, at least in the section from "Stately, plump Buck Mulligan" to "Usurper.", which scores

    8303 words, 976 punctuation tokens = 7327 real words
    328 adjectives = 4.5 percent
    354 adverbs = 4.8 percent
    682 adjectives+adverbs = 9.3 percent

    I didn't remove the Greek and Latin quotes, but I don't think they're long enough to affect the results much.

    However, in this case, NLTK's tagger seems to have failed us rather badly, since (for example) the sentence

    Stephen Dedalus, displeased and sleepy, leaned his arms on the top of the staircase and looked coldly at the shaking gurgling face that blessed him, equine in its length, and at the light untonsured hair, grained and hued like pale oak.

    is tagged as

    [('Stephen', 'NNP'), ('Dedalus', 'NNP'), (',', ','), ('displeased', 'VBD'), ('and', 'CC'), ('sleepy', 'NN'), (',', ','), ('leaned', 'VBD'), ('his', 'PRP$'), ('arms', 'NNS'), ('on', 'IN'), ('the', 'DT'), ('top', 'JJ'), ('of', 'IN'), ('the', 'DT'), ('staircase', 'NN'), ('and', 'CC'), ('looked', 'VBD'), ('coldly', 'RB'), ('at', 'IN'), ('the', 'DT'), ('shaking', 'NN'), ('gurgling', 'VBG'), ('face', 'NN'), ('that', 'IN'), ('blessed', 'VBN'), ('him', 'PRP'), (',', ','), ('equine', 'NN'), ('in', 'IN'), ('its', 'PRP$'), ('length', 'NN'), (',', ','), ('and', 'CC'), ('at', 'IN'), ('the', 'DT'), ('light', 'JJ'), ('untonsured', 'VBN'), ('hair', 'NN'), (',', ','), ('grained', 'VBD'), ('and', 'CC'), ('hued', 'VBN'), ('like', 'IN'), ('pale', 'NN'), ('oak', 'NN'), ('.', '.')]

    This is not the sort of performance that I'm used to seeing from nltk.pos_tag( ). I'll look into this further at some point in the future…

    UPDATE — I ran a different NLTK tagger (the "Stanford tagger") on the same material, with somewhat better results. The sentence in question comes out as

    [('Stephen', 'NNP'), ('Dedalus', 'NNP'), (',', ','), ('displeased', 'JJ'), ('and', 'CC'), ('sleepy', 'JJ'), (',', ','), ('leaned', 'VBD'), ('his', 'PRP$'), ('arms', 'NNS'), ('on', 'IN'), ('the', 'DT'), ('top', 'NN'), ('of', 'IN'), ('the', 'DT'), ('staircase', 'NN'), ('and', 'CC'), ('looked', 'VBD'), ('coldly', 'RB'), ('at', 'IN'), ('the', 'DT'), ('shaking', 'VBG'), ('gurgling', 'JJ'), ('face', 'NN'), ('that', 'WDT'), ('blessed', 'VBD'), ('him', 'PRP'), (',', ','), ('equine', 'NN'), ('in', 'IN'), ('its', 'PRP$'), ('length', 'NN'), (',', ','), ('and', 'CC'), ('at', 'IN'), ('the', 'DT'), ('light', 'JJ'), ('untonsured', 'JJ'), ('hair', 'NN'), (',', ','), ('grained', 'VBN'), ('and', 'CC'), ('hued', 'VBN'), ('like', 'IN'), ('pale', 'JJ'), ('oak', 'NN'), ('.', '.')]

    "Equine" is still tagged as a noun, and "grained" and "hued" as past participles; but it's better. Anyhow, the overall score for the first thousand-odd words comes out as

    1153 words, 117 punctuation tokens = 1036 real words
    84 adjectives = 8.1 percent
    52 adverbs = 5 percent
    136 adjectives+adverbs = 13.1 percent

    In the other cases where I've compared the two taggers, they're within a percent or so — and often closer — on the adjective+adverb proportion score. Thus O-K's "Trust" paper is 16.9% instead of 17.0%; the Microsoft annual report is exactly 11% according to both taggers; Zinsser comes out as 13.4% instead of 12.8%, etc.]

  17. J.W. Brewer said,

    September 9, 2013 @ 10:49 am

    Especially if the Ulysses result is an artifact of bad tagging, how easy/hard is it to find in-the-wild samples of English text where the adj/adv %age is outside the 10-20% range? How stylistically odd-to-the-reader would such outliers seem if no one stopped to do a count? It seems at least possible that "try not to produce prose whose %age of adj/adv's is more than x standard deviation(s) away from the median for the relevant genre" might be good stylistic advice, even if "assume your first draft probably has too many and edit by crossing a randomly-selected subset of them out" is not.

  18. PR said,

    September 9, 2013 @ 11:39 am

    Have you (or has anyone) been in touch with Dr O-K with this critique? I for one would be interested to see how he responds.

  19. Edward Lindon said,

    September 9, 2013 @ 3:22 pm

    Derrida as a model of unintelligibility is both trite and unfounded. As with any serious theoretical writer, in order to understand him, you need to understand his intellectual antecedents–Husserl, Heidegger, Austin etc. Many of those who are in such a haste to be critical of Derrida turn out simply to be poor, or superficial, readers of philosophy.

    [(myl) My impression, shared with several philosophers of my acquaintance, is that Derrida is a skilled writer of fake (and essentially meaningless) philosophy. To convince me otherwise, you might perform a more systematic version of the test described in "Can Derrida be 'even wrong'?", 9/23/2003, and "Labov's Test", 8/17/2005.]

  20. the other Mark P said,

    September 10, 2013 @ 12:30 am

    Derrida as a model of unintelligibility is both trite and unfounded.

    Well things usually become trite because they are true, albeit boringly true.

  21. Bloix said,

    September 10, 2013 @ 10:53 am

    Many writers (myself included) tend to have a number of crutch words that jump from our keyboards unto the screen to fill in the time that we're not saying something worthwhile. I believe that a disproportionate number of these words are adjectives and adverbs. Among them – actual/actually, clear/clearly, obvious/obviously, utter/utterly, definite/definitely (or "definately"), important/importantly, curious/curiously, perfectly, very, quite, fairly, etc.

    These words tell the reader that the writer is right, without explaining why the writer is right. Sometimes they're useful for rhythm or emphasis, but usually they're clutter that slows the reader down. And because they're an indication that the writer is not as educated or sophisticated as he or she would like to seem, they detract from the persuasive power of the writing.

    Every draft I write has to be pruned of this detritus before I let anyone see it. I would think that the instruction to abjure adverbs and adjectives was originally meant to apply to these empty vessels, and not to words that have genuine content.

  22. Is it really a good idea to avoid adjectives and adverbs? | The Proof Angel said,

    September 11, 2013 @ 4:00 am

    […] Liberman has done some statistical analysis of various sorts of writing in the Language Log […]

  23. Dan M. said,

    October 17, 2013 @ 10:06 am

    Bloix,

    That's a very clear statement of what I've always hoped was the justifiable origin of the anti-modifier usage rants.
