Adam Okulicz-Kozaryn, "Cluttered writing: adjectives and adverbs in academia", Scientometrics 2013:
[H]ow do we produce readable and clean scientific writing? One of the good elements of style is to avoid adverbs and adjectives (Zinsser 2006). Adjectives and adverbs sprinkle paper with unnecessary clutter. This clutter does not convey information but distracts and has no point especially in academic writing, say, as opposed to literary prose or poetry.
If you've seen my earlier discussion of this paper ("'Clutter' in (writing about) science writing", 8/30/2013), you'll recall that Dr. O-K goes on to count adjectives and adverbs in some word lists from samples of scientific writing. He asserts that "social science" writing uses about 15% more adjectives and adverbs than "natural science" writing — although he doesn't tell us enough about his methods to dispel concerns about several likely sources of artifact — and he concludes by asking "Is there a reason that a social scientist cannot write as clearly as a natural scientist?"
In the interests of science of all kinds, I decided to devote this morning's Breakfast Experiment™ to the relations between text quality and the proportion of adjectives and adverbs. I wrote a python script using NLTK to calculate the proportions of various parts of speech in a document; and then I tried this script out on samples of various sorts of writing. Here's some of what I found.
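For readers who want to try something similar, a script of this general kind might look like the sketch below. It is not the script behind the numbers in this post. The function names, the choice of Penn Treebank tags counted as adjectives (JJ, JJR, JJS) and adverbs (RB, RBR, RBS), and the rule that a token with no letter or digit is punctuation are all assumptions made for illustration. The counting function takes already-tagged tokens, so it can be checked without NLTK installed; `tag_text()` is a thin wrapper around `nltk.word_tokenize()` and `nltk.pos_tag()`, which need NLTK's tokenizer and tagger data.

```python
# Assumed tag sets (Penn Treebank); whether to also count e.g. WRB is a judgment call.
ADJ_TAGS = {"JJ", "JJR", "JJS"}
ADV_TAGS = {"RB", "RBR", "RBS"}


def tag_text(text):
    """Tokenize and tag raw text with NLTK (needs nltk and its data files)."""
    import nltk  # imported here so the counting code below works without it
    return nltk.pos_tag(nltk.word_tokenize(text))


def modifier_report(tagged):
    """Count adjectives and adverbs in a list of (token, tag) pairs."""
    # Assumption: a "real word" is any token containing a letter or digit.
    words = [(tok, tag) for tok, tag in tagged if any(ch.isalnum() for ch in tok)]
    n_adj = sum(1 for _, tag in words if tag in ADJ_TAGS)
    n_adv = sum(1 for _, tag in words if tag in ADV_TAGS)
    return {
        "tokens": len(tagged),
        "punct": len(tagged) - len(words),
        "real": len(words),
        "adj": n_adj,
        "adv": n_adv,
    }


def format_report(r):
    """Render the counts in the same layout as the reports in this post."""
    def pct(n):
        return 100.0 * n / r["real"] if r["real"] else 0.0
    both = r["adj"] + r["adv"]
    return "\n".join([
        f"{r['tokens']} words, {r['punct']} punctuation tokens = {r['real']} real words",
        f"{r['adj']} adjectives = {pct(r['adj']):.1f} percent",
        f"{r['adv']} adverbs = {pct(r['adv']):.1f} percent",
        f"{both} adjectives+adverbs = {pct(both):.1f} percent",
    ])


if __name__ == "__main__":
    # Hand-tagged toy input, so this runs without NLTK.
    sample = [("It", "PRP"), ("was", "VBD"), ("a", "DT"), ("dark", "JJ"),
              ("and", "CC"), ("stormy", "JJ"), ("night", "NN"), (";", ":"),
              ("the", "DT"), ("rain", "NN"), ("fell", "VBD"),
              ("fiercely", "RB"), (".", ".")]
    print(format_report(modifier_report(sample)))
```

With NLTK and its data installed, `format_report(modifier_report(tag_text(open(path).read())))` would produce a report for a whole file, where `path` is whatever text file you want to score.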
To start with, I decided to try some really cluttered prose, prose that is not at all "readable and clean": Edward Bulwer-Lytton's Paul Clifford. Wikipedia tells us that this novel is considered to represent "the archetypal example of a florid, melodramatic style of fiction writing". Its first sentence:
It was a dark and stormy night; the rain fell in torrents, except at occasional intervals, when it was checked by a violent gust of wind which swept up the streets (for it is in London that our scene lies), rattling along the house-tops, and fiercely agitating the scanty flame of the lamps that struggled against the darkness.
I put essentially all of the first chapter of this work into a file (minus the paragraphs that are mostly dialogue, much of which is in dialect). According to NLTK's pos_tag() function, which should be about 95% correct, the score was:
1775 words, 184 punctuation tokens = 1591 real words
108 adjectives = 6.8 percent
78 adverbs = 4.9 percent
186 adjectives+adverbs = 11.7 percent
So Bulwer-Lytton's chapter is about 12% adjectives and adverbs. What should we compare this to? Well, Dr. O-K cites William Zinsser's On Writing Well as his authority for the cluttering nature of adjectives and adverbs, so let's try the first three sections of that work (minus quotations from others, of course):
3939 words, 439 punctuation tokens = 3500 real words
241 adjectives = 6.9 percent
208 adverbs = 5.9 percent
449 adjectives+adverbs = 12.8 percent
Hmm. Well, maybe this is experimental error. And Bulwer-Lytton's writing is clear enough, it's just kind of overwrought. So let's take a look at something by Jacques Derrida, whose prose is about as unreadable as anything I've ever encountered. Here's the score for chapter 2 of "Of Grammatology" (in English translation, of course):
19239 words, 2105 punctuation tokens = 17134 real words
1434 adjectives = 8.4 percent
946 adverbs = 5.5 percent
2380 adjectives+adverbs = 13.9 percent
OK, that's better — Derrida has 19% more adjectives and adverbs than Bulwer-Lytton. But he's only got 8% more than Zinsser, and Zinsser has more than Bulwer-Lytton, so this still doesn't all seem to be working out the way we were told it would.
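The "19% more" and "8% more" figures are ratios of modifier shares, not differences in percentage points. A minimal sketch of that arithmetic, using the counts reported above (the dictionary and the function name `percent_more` are just illustrative):

```python
# (adjectives + adverbs, real words), copied from the reports above
counts = {
    "Bulwer-Lytton": (186, 1591),
    "Zinsser": (449, 3500),
    "Derrida": (2380, 17134),
}


def percent_more(a, b):
    """How many percent higher a's modifier share is than b's."""
    share_a = counts[a][0] / counts[a][1]
    share_b = counts[b][0] / counts[b][1]
    return 100.0 * (share_a / share_b - 1.0)


for a, b in [("Derrida", "Bulwer-Lytton"),
             ("Derrida", "Zinsser"),
             ("Zinsser", "Bulwer-Lytton")]:
    print(f"{a} vs. {b}: {percent_more(a, b):.0f}% more")
```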
Let's go for another paragon. Dr. O-K opens his paper with a quote from Mark Twain: "When you catch an adjective, kill it." So let's try the whole letter that the quote came from:
1474 words, 170 punctuation tokens = 1304 real words
89 adjectives = 6.8 percent
95 adverbs = 7.3 percent
184 adjectives+adverbs = 14.1 percent
Oops. We're really going in the wrong direction here — Saint Mark uses the highest proportion of adjectives and adverbs that we've seen so far.
And what about Dr. O-K's own writing? Here's the score for the text of "Cluttered writing: adjectives and adverbs in academia" itself (of course minus the quotations from others):
883 words, 80 punctuation tokens = 803 real words
85 adjectives = 10.6 percent
42 adverbs = 5.2 percent
127 adjectives+adverbs = 15.8 percent
We have a winner! Dr. Okulicz-Kozaryn's text, about the importance of eliminating adjectives and adverbs from prose, has fully 35% more adjectives and adverbs than the infamous "It was a dark and stormy night" passage, which has given its author's name to an annual bad writing contest!
(127/803)/(186/1591) = 1.3528
1121 words, 104 punctuation tokens = 1017 real words
113 adjectives = 11.1 percent
60 adverbs = 5.9 percent
173 adjectives+adverbs = 17 percent
Seriously, the problem is not in Dr. O-K's writing (despite the sprinkling of slavicisms), but in his ideas. Calculating the relative percentages of adjectives and adverbs in texts tells us nothing useful about their readability, clarity, or efficiency.
I'll spare you the reports for the other 45 texts that I've tested. But just to let Dr. O-K off the hook for the "most modifiers" prize, let me note that the text of Ben Yagoda's piece from the Chronicle of Higher Education on adjectival anxiety ("The Adjective — So Ludic, So Minatory, So Twee", 2/20/2004), beats him out:
1908 words, 301 punctuation tokens = 1607 real words
208 adjectives = 12.9 percent
86 adverbs = 5.4 percent
294 adjectives+adverbs = 18.3 percent
Finally, I need to point out that there's a technical flaw in the whole "avoid adjectives and adverbs" idea — nouns are often modified by other nouns, or by prepositional phrases, or in other ways that don't involve adjectives; and verbs are often modified by prepositional phrases, subordinate clauses used as verbal adjuncts, and so on.
If it were true, counterfactually, that modification in general was a Bad Thing, then we'd need to count these other sorts of modifiers as well, not just adjectives and adverbs.
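To make the point concrete, here is a rough sketch of how one might start counting a couple of those other modifier types from the same kind of tagged output. It is deliberately crude, and every choice in it is an assumption: a noun immediately followed by another noun is treated as a noun modifier (which will also catch multi-word proper names), and the Penn Treebank tag IN is used as a stand-in for prepositional phrases and subordinate clauses, since that tag covers both prepositions and subordinating conjunctions. A serious attempt would need a parser, not just a tagger.

```python
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}


def other_modifier_counts(tagged):
    """Crude counts of non-adjective, non-adverb modifiers in (token, tag) pairs."""
    # Noun directly followed by a noun: the first is (probably) modifying the second.
    noun_noun = sum(
        1
        for (_, tag1), (_, tag2) in zip(tagged, tagged[1:])
        if tag1 in NOUN_TAGS and tag2 in NOUN_TAGS
    )
    # IN = preposition or subordinating conjunction in the Penn Treebank tagset.
    in_tagged = sum(1 for _, tag in tagged if tag == "IN")
    return {"noun_noun": noun_noun, "in_tagged": in_tagged}


# Hand-tagged toy sentence with no adjectives or adverbs at all.
sample = [("the", "DT"), ("bank", "NN"), ("statement", "NN"),
          ("arrived", "VBD"), ("in", "IN"), ("the", "DT"),
          ("morning", "NN"), ("after", "IN"), ("lunch", "NN")]
print(other_modifier_counts(sample))
```

The toy sentence would score zero on an adjective-and-adverb count, yet it contains one noun modifying another and two prepositional phrases.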
Some of the previous LL posts on modificational anxiety:
"Those who take the adjectives from the table", 2/18/2004
"Avoiding rape and adverbs", 2/25/2004
"Modification as social anxiety", 5/16/2004
"The evolution of disornamentation", 2/21/2005
"Adjectives banned in Baltimore", 3/5/2007
"Automated adverb hunting and why you don't need it", 3/5/2007
"Worthless grammar edicts from Harvard", 4/29/2010
"Getting rid of adverbs and other adjuncts", 2/21/2013
"'Clutter' in (writing about) science writing", 8/30/2013
N.B. Someone who took this whole business seriously enough to want to look at differences in part-of-speech distributions among scientific disciplines should know that Okulicz-Kozaryn is wrong when he writes that
as of 2012 I cannot bulk download enough full texts to have a representative sample of a discipline.
Between arXiv, the PLoS collections, SSOAR, the resources available from the ACL, and so on, it would not be hard to create large enough samples in enough different disciplines and subdisciplines to engage the question more seriously than Okulicz-Kozaryn did. But you ought to have another hypothesis to test as well, in my opinion, because the modifier-percentage idea looks like a loser.
Update — I realize that it's only fair for me to report the score for this blog post. Leaving out the quotations and so on, and without this update, I get:
1143 words, 146 punctuation tokens = 997 real words
82 adjectives = 8.2 percent
59 adverbs = 5.9 percent
141 adjectives+adverbs = 14.1 percent
The same overall percentage as Mark Twain…
Update #2 — William Zinsser complains that
Clutter is the disease of American writing. We are a society strangling in unnecessary words, circular constructions, pompous frills and meaningless jargon. Who can understand the clotted language of everyday American commerce: the memo, the corporation report, the business letter, the notice from the bank explaining its latest “simplified” statement?
So I decided to score Microsoft's 2012 Annual Report:
1499 words, 92 punctuation tokens = 1407 real words
117 adjectives = 8.3 percent
38 adverbs = 2.7 percent
155 adjectives+adverbs = 11 percent
There are certainly some unnecessary words and pompous frills in that report ("we delivered strong results, launched fantastic new products and services, and positioned Microsoft for an incredible future"), but the percentage of adjectives and adverbs is not a good measure of those characteristics.
Update #3 — I should also tell you that I did check the adjective and adverb proportions in various natural-science articles. For example, the first page of the first article in the current issue of Physical Review A (A. Rançon et al., "Quench dynamics in Bose-Einstein condensates in the presence of a bath: Theory and experiment") weighs in at
908 words, 62 punctuation tokens = 846 real words
103 adjectives = 12.2 percent
35 adverbs = 4.1 percent
138 adjectives+adverbs = 16.3 percent
And a combination of the first seven abstracts from the current issue of Science scores
1041 words, 81 punctuation tokens = 960 real words
106 adjectives = 11 percent
36 adverbs = 3.8 percent
142 adjectives+adverbs = 14.8 percent
This tends to confirm my suspicion that Okulicz-Kozaryn's result (15% lower proportion of adjectives and adverbs in natural science compared to social science text) probably results from one of the obvious sources of artifact, for example an inappropriate attempt to calculate part-of-speech percentages in text derived from passages like this one (from the second page of the Rançon et al. paper):
Since Okulicz-Kozaryn's counts came from word lists supplied by JSTOR, and he doesn't tell us which lists he used or how he processed them, and JSTOR doesn't tell us how they created the lists, we'll probably never know.