More Flesch-Kincaid grade-level nonsense


Matt Viser, "For presidential hopefuls, simpler language resonates" ("Trump tops GOP field while talking to voters at fourth-grade level"), Boston Globe 10/20/2015:

When Donald Trump announced his presidential campaign, he decried the lack of intelligence of elected officials in characteristically blunt terms.

“How stupid are our leaders?” he said. “How stupid are they?”

But with his own choice of words and his short, simple sentences, Trump’s speech could have been comprehended by a fourth-grader. Yes, a fourth-grader.

The Globe reviewed the language used by 19 presidential candidates, Democrats and Republicans, in speeches announcing their campaigns for the 2016 presidential election. The review, using a common algorithm called the Flesch-Kincaid readability test that crunches word choice and sentence structure and spits out grade-level rankings, produced some striking results.

The Republican candidates — like Trump — who are speaking at a level easily understood by people at the lower end of the education spectrum are outperforming their highfalutin opponents in the polls.

How stupid are our journalists?

Over and over again, they dress up plausible insights, like "Simpler language succeeds in politics", with credulous references to an outdated and simple-minded metric that pretends to predict reading level based only on average word and sentence length.
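Concretely, the Flesch-Kincaid grade level is 0.39 × (words per sentence) + 11.8 × (syllables per word) - 15.59, so "word length" here is measured in syllables. The Python below is a minimal sketch of that formula. It is not the code behind any of the scores quoted in this post. The helper names (`fk_grade`, `count_syllables`, `mirror_words`) are hypothetical, the syllable count is a crude vowel-run heuristic, and only ".", "!" and "?" are treated as sentence ends. Real tools count all three quantities differently, so exact numbers will vary.

```python
import re

# Letter runs (Unicode-aware), with contractions such as "it's" kept as one word.
WORD_RE = re.compile(r"[^\W\d_]+(?:['’][^\W\d_]+)*")
# Assumed syllable heuristic: one syllable per run of vowel letters.
VOWEL_RUN_RE = re.compile(r"[aeiouyäöü]+")


def count_syllables(word: str) -> int:
    """Crude estimate: count vowel runs, never less than one per word."""
    return max(1, len(VOWEL_RUN_RE.findall(word.lower())))


def count_sentences(text: str) -> int:
    """Assumption: only '.', '!' and '?' end a sentence."""
    return max(1, sum(1 for chunk in re.split(r"[.!?]+", text) if chunk.strip()))


def fk_grade(text: str) -> float:
    """Flesch-Kincaid grade level computed from word, sentence and syllable counts."""
    words = WORD_RE.findall(text)
    if not words:
        raise ValueError("no words to score")
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / count_sentences(text)
            + 11.8 * syllables / len(words)
            - 15.59)


def mirror_words(text: str) -> str:
    """Spell every letter run backwards, leaving punctuation in place."""
    return re.sub(r"[^\W\d_]+", lambda m: m.group(0)[::-1], text)


sample = "The cat sat on the mat."
print(round(fk_grade(sample), 2))
print(mirror_words(sample), round(fk_grade(mirror_words(sample)), 2))
```

Only the three counts enter the formula. Spelling every word backwards ("ehT tac tas no eht tam.") therefore leaves the grade exactly where it was, and letter strings that aren't English words at all still get a grade. Note also that "The cat sat on the mat." comes out at about -1.45: 6 words, 6 syllables and 1 sentence give 0.39 × 6 + 11.8 × 1 - 15.59.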

For some details, see "Another dumb Flesch-Kincaid exercise", 10/26/2014, which observes that this passage scores at the 3.9 grade level:

Uva haq jvrqre syvrtra Csrvyr;
Nzbef yrvpugr Csrvyr syvrtra
Iba qrz fpuynaxra tbyqra Obtra,
Zäqpura, frvq vue avpug trgebssra?
Rf vfg Tyüpx! Rf vfg ahe Tyüpx.

Jnehz syvrtg re fb va Rvyr?
Wrar qbeg jvyy re orfvrtra;
Fpuba vfg re ibeorv trsybtra;
Fbetybf oyrvog qre Ohfra bssra;
Trorg npug! Re xbzzg mheüpx!

And this most recent application (to the "speeches announcing [the candidates'] campaigns for the 2016 presidential election") is especially dumb, because some of these speeches were written texts, while others were transcripts of remarks ad-libbed on the spot. Spoken language is generally less formal than written language, and tends to use shorter words. Sentence length, in either case, depends heavily on punctuation choices. And in transcripts of extemporized remarks, the punctuation choices are not even the speaker's own: they are the transcriber's.

Here are three differently punctuated paragraphs from Trump's announcement, brought up from the comments:

It’s coming from more than Mexico. It’s coming from all over South and Latin America. And it’s coming probably — probably — from the Middle East. But we don’t know. Because we have no protection and we have no competence, we don’t know what’s happening. And it’s got to stop and it’s got to stop fast. [Grade level 4.4]

It’s coming from more than Mexico, it’s coming from all over South and Latin America, and it’s coming probably — probably — from the Middle East. But we don’t know, because we have no protection and we have no competence, we don’t know what’s happening. And it’s got to stop and it’s got to stop fast. [Grade level 8.5]

It’s coming from more than Mexico, it’s coming from all over South and Latin America, and it’s coming probably — probably — from the Middle East; but we don’t know, because we have no protection and we have no competence, we don’t know what’s happening. And it’s got to stop and it’s got to stop fast. [Grade level 12.5]
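The three versions use exactly the same words, so the 11.8 × (syllables per word) term of the Flesch-Kincaid grade is the same for all of them. Only the 0.39 × (words per sentence) term can move. Below is a rough Python sketch of that term. The helper names are hypothetical. It assumes that only ".", "!" and "?" end a sentence, which is consistent with the semicolon version scoring highest above. The dashes are rendered as commas, which changes no count.

```python
import re

# The three punctuations of the same passage from Trump's announcement.
# Dashes are rendered as commas here; that changes no word or sentence count.
VERSIONS = {
    "periods": (
        "It's coming from more than Mexico. It's coming from all over South and "
        "Latin America. And it's coming probably, probably, from the Middle East. "
        "But we don't know. Because we have no protection and we have no "
        "competence, we don't know what's happening. And it's got to stop and "
        "it's got to stop fast."
    ),
    "commas": (
        "It's coming from more than Mexico, it's coming from all over South and "
        "Latin America, and it's coming probably, probably, from the Middle East. "
        "But we don't know, because we have no protection and we have no "
        "competence, we don't know what's happening. And it's got to stop and "
        "it's got to stop fast."
    ),
    "semicolon": (
        "It's coming from more than Mexico, it's coming from all over South and "
        "Latin America, and it's coming probably, probably, from the Middle East; "
        "but we don't know, because we have no protection and we have no "
        "competence, we don't know what's happening. And it's got to stop and "
        "it's got to stop fast."
    ),
}

WORD_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)*")  # contractions count as one word


def count_words(text: str) -> int:
    return len(WORD_RE.findall(text))


def count_sentences(text: str) -> int:
    # Assumption: only '.', '!' and '?' end a sentence; ';' and ',' do not.
    return sum(1 for chunk in re.split(r"[.!?]+", text) if chunk.strip())


def sentence_term(text: str) -> float:
    # The 0.39 * (words / sentences) part of the Flesch-Kincaid grade formula.
    return 0.39 * count_words(text) / count_sentences(text)


for name, text in VERSIONS.items():
    print(f"{name:>9}: {count_words(text)} words, "
          f"{count_sentences(text)} sentences, "
          f"sentence term {sentence_term(text):.2f}")
```

With these counts, all three versions have 54 words, split into 6, 3 and 2 sentences. The term goes from about 3.51 to 7.02 to 10.53, which is roughly 3.5 grade levels per step from punctuation alone. The grades quoted above rise by about 4 per step, so whatever tool produced them counts words or sentences somewhat differently (or uses a variant of the formula). The direction and rough size of the effect are the same.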

It's uncharitable and unfair of me to imply that the author of the Globe piece might be "stupid". But at some point, journalists should look behind the label to see what a metric like "the Flesch-Kincaid score" really is, and ask themselves whether invoking it is adding anything to their analysis except for a false facade of scientism.

18 Comments

  1. Dr G Cox said,

    October 23, 2015 @ 7:27 am

    Wisdom can be stated in simple words.
    Meaningless concatenations of incoherent arguments can be wrapped in exotic vocabulary.

    [(myl) But the Flesch-Kincaid measure doesn't even care whether the input letter-strings are actual words of English, as opposed to random gibberish…]

  2. Chips Mackinolty said,

    October 23, 2015 @ 7:35 am

    Curious. The Australian media, joined by the British conservative press, are currently praising the newly coup-installed Australian PM Malcolm Turnbull for speaking intelligently and in more-than-three-word slogans. Certainly he is a long way from Trumpist Flesch-Kincaid simplicity in delivery, but he seems to be hitting the mark … at least with the media. Judging by media reports, I suspect the same is true of the newly installed Trudeau in Canada, who appears to be able to string a few words together in more than one language!

  3. tpr said,

    October 23, 2015 @ 8:43 am

    While I'm not interested in defending the Flesch-Kincaid measure, I'm not terribly convinced by your reasoning here, Mark. The algorithm might not "care whether the input letter-strings are actual words of English, as opposed to random gibberish", but if you start with the minimal assumption that the input IS composed of actual words of English produced in the wild, how well does it track reading level then? The reasons why it would fail under these circumstances are much more relevant to any debunking of its usefulness than arguing that it fails on gibberish. I assume that there is at least some correlation between word lengths and reading level, though probably because shorter words tend to be more common.

    [(myl) It's certainly true that there's a correlation between word length and frequency, though it's far from perfect.

    The biggest problem is that sentence length in transcriptions depends on punctuation choices.

    Here are two differently punctuated pieces of Trump's announcement:

    It’s coming from more than Mexico. It’s coming from all over South and Latin America. And it’s coming probably — probably — from the Middle East. But we don’t know. Because we have no protection and we have no competence, we don’t know what’s happening. And it’s got to stop and it’s got to stop fast. [Grade level 4.4]

    It’s coming from more than Mexico, it’s coming from all over South and Latin America, and it’s coming probably — probably — from the Middle East. But we don’t know, because we have no protection and we have no competence, we don’t know what’s happening. And it’s got to stop and it’s got to stop fast. [Grade level 8.5]

    So why not just report word-length statistics directly? And maybe something about word frequency, or a measure of sentence complexity that doesn't depend on how a transcriber chooses to punctuate.]

  4. Stephen Hart said,

    October 23, 2015 @ 10:30 am

    Likely this persists because Microsoft Word includes the Flesch-Kincaid readability test.

  5. Charles Broming said,

    October 23, 2015 @ 10:43 am

    I agree with TPR. The correct question is, "How 'good' are the Flesch-Kincaid assessments when applied to the proper domain?" The functions f(x) = 1/x and g(x) = sqrt(x) make sense in the real numbers, but only when applied to a limited domain. Similarly, infinitely many polynomials have no real roots (make no sense?) unless you allow complex ones. For an enlightening discussion of this matter in the context of Euler's formula relating the vertices, edges and faces of polyhedra (and the uproar in 19th-century geometry about it), see Imre Lakatos, "Proofs and Refutations". (I bet you've already read it.)

    I also agree with your point about the stupidity of many journalists ("stupidity" may be harsh, but is "aspiring" any better – upon reflection?) and their need to dress up plausible inferences with credulous references. Please add inaccurate or too-general inferences from incomplete or cherry-picked statistics, complete ignorance of the relevant data or analyses thereof and sloppy language usage when discussing anything "scientific". One of Aristotle's commentators, I don't recall which one, referred to Aristotle's criticisms of Plato's philosophies (metaphysics and political, especially) as "willfully stupid". Sorry, I just get frustrated when smart people (many journalists) write stupid things because, unlike with their speech, they have the opportunity to read, edit and rewrite their prose (not to mention editorial oversight–ha ha).

  6. DWalker said,

    October 23, 2015 @ 10:46 am

    Presumably the metric assumes that all shorter words are "simpler" and "learned earlier in life" than all longer words.

    That may be true to some degree, but it's not ALWAYS true. The metric just uses word length as a proxy for "word simplicity" or "word commonness".

  7. Guy said,

    October 23, 2015 @ 12:40 pm

    Charles Broming, the Flesch-Kincaid reading level is defined for gibberish input, so a better analogy might be that the first few terms of the power series for the sine function are a good estimate for values near zero but terrible for others. I agree that a system that doesn't check whether its input is gibberish isn't necessarily useless as a measure of reading difficulty when applied to the statistical distribution of real-world writing. The main problems are that 1) the parameters of Flesch-Kincaid are basically arbitrary and its reliability is not tested against some measure of actual difficulty, and 2) when applied to speech, its value depends heavily on punctuation choices made by the transcriber.

  8. bratschegirl said,

    October 23, 2015 @ 2:34 pm

    Staying tuned for the article concerning how many words Mr. Trump has for snow…

  9. DWalker said,

    October 23, 2015 @ 2:51 pm

    @bratschegirl: Ha! You win the Internets today.

  10. John Busch said,

    October 23, 2015 @ 7:00 pm

    If you used the Flesch-Kincaid along with a Lexile level and a vocabulary profile, you would have a more interesting and useful measure of the complexity of political discourse. Still, you couldn't compare spoken with written/edited texts, and it's not obvious that a lower level of difficulty would correlate with simplistic ideas.

  11. Jason Eisner said,

    October 24, 2015 @ 11:02 am

    @John Busch – I think the reporter is arguing that simple language correlates with poll ratings, not with simple ideas.

    @myl – As you say, this is a plausible insight. What would be a better way to investigate it? Has a better "reading level" metric been created that could be applied to (e.g.) the transcripts of U.S. presidential primary debates (not only in 2015)?

    (I don't expect variants of reading level to really capture whether a candidate's answer is easy to follow: that would also require measures of discourse coherence, speech rate and prosody, and whether the answer's content and framing are already familiar to the audience. And any automatically computed measure should be validated against actual human judgments, or (better) replaced by them. Still, it would be interesting to check whether the candidates differ significantly on the linguistic simplicity of their answers. It would also be interesting to see whether individual candidates show signs that they are deliberately aiming for simple language — e.g., they use simpler language in these high-profile settings, or have changed their style over years of campaigning. The same questions might be asked for measures other than simplicity.)

  12. James Wimberley said,

    October 24, 2015 @ 5:42 pm

    Rf vfg Tyüpx! Indeed, the Tyüpx don't mess around. The tribute is very modest considering the cornucopia of advanced technology they have brought to backward earthlings. It's an unfortunate glitch in a mutually beneficial relationship that the tank-trained humans selected to serve as interpreters go mad after three months on the job, though there is controversy whether this is due to the extreme complexity of the language – more inflected than Georgian, with extra genders and aspects, phonetically combining Chinese tones and Khoisan clicks, and written in animated glyphs – or the arbitrary brutality of some of the content when the Tyüpx are displeased.

  13. Right No said,

    October 24, 2015 @ 8:59 pm

    Word and sentence length as a proxy for complexity/difficulty (they are not the same) may work in some restricted domains.
    However, many counterexamples exist.

    atresia > butterfly
    Embolise the glioma. > Glamorous Kardashian caught sniffing cocaine in nightclub.

    where ">" signifies greater difficulty

  14. Tim Finin said,

    October 25, 2015 @ 9:20 am

    I just heard US presidential hopeful Ben Carson say on Meet the Press "The constitution was written at an eighth-grade level for a reason".

    [(myl) That's about as accurate as his pronouncements generally are. At least by the (admittedly moronic) Flesch-Kincaid metric, the Constitution's grade level is 19.2.]

  15. Kaleberg said,

    October 26, 2015 @ 9:46 pm

    How well does Scrabble score work?

  16. Tim Finin said,

    October 26, 2015 @ 10:55 pm

    I had tried running the plain text (with amendments) through Word's grammar checker and it gave it a FK grade level of 6.6 and a FK reading ease score of 59.7. Are there variations on how FK is defined? I recall using FK as an exercise for students learning Python and think I had simplified it a bit.

  17. Ted Cruz is no Scyld Scefing | The Life of Words said,

    October 29, 2015 @ 11:44 am

    […] debunking of one of these scores applied to the same context, see a series of LLog posts: “More Flesch-Kincaid grade-level nonsense” [23.10.15], “Back to the Bushisms industry?” [17.8.15], “Another dumb Flesch-Kincaid […]

  18. Here be data | prendrelangue said,

    November 1, 2015 @ 5:50 am

    […] worse still, as Language Log's Mark Liberman has pointed out time and again (and indeed very recently), the Flesch Reading Ease algorithm doesn't even care whether it's dealing with […]
