Another dumb Flesch-Kincaid exercise

E.J. Fox and Mike Spies, "Who was America's most well-spoken president?", 10/10/2014:

Using the Flesch-Kincaid readability test—the most well-known reading comprehension algorithm—Vocativ analyzed over 600 presidential speeches, going back to George Washington. We measured syllables along with word and sentence counts, and gave each speech a numerical grade. For instance, a grade of four means the content is accessible to a fourth-grader, while a grade of 12 corresponds to that of a high school graduate, a 15 to that of a college graduate and a 21 or higher to that of a PhD. Ultimately, we drew five conclusions, each of which was analyzed by Jeff Shesol, a historian and former speechwriter for Bill Clinton.

Here's the plot of their results:

An impressive trend, right? Except that the Flesch-Kincaid readability test is just a linear combination of syllables-per-word and words-per-sentence:
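In its usual published form, the grade-level formula is the following (the two weights and the offset are the standard constants, quoted here from the widely cited definition rather than from the Vocativ article):

```latex
\text{grade} \;=\; 0.39 \cdot \frac{\text{total words}}{\text{total sentences}}
\;+\; 11.8 \cdot \frac{\text{total syllables}}{\text{total words}}
\;-\; 15.59
```

Nothing in it refers to vocabulary, syntax, or meaning; only the two ratios matter.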

And we know that there are "Real trends in word and sentence length" (LLOG 10/31/2011). The Flesch-Kincaid metric is so insensitive to actual reading difficulty that (at least as implemented in the on-line "Readability Score" apps) it doesn't even matter whether the tested material is in the English language. Thus the following poem by Goethe is rated at the 4.1 grade level by this moronic metric:

Hin und wieder fliegen Pfeile;
Amors leichte Pfeile fliegen
Von dem schlanken golden Bogen,
Mädchen, seid ihr nicht getroffen?
Es ist Glück! Es ist nur Glück.

Warum fliegt er so in Eile?
Jene dort will er besiegen;
Schon ist er vorbei geflogen;
Sorglos bleibt der Busen offen;
Gebet acht! Er kommt zurück!

And the rot13 version actually scores at the 3.9 grade level:

Uva haq jvrqre syvrtra Csrvyr;
Nzbef yrvpugr Csrvyr syvrtra
Iba qrz fpuynaxra tbyqra Obtra,
Zäqpura, frvq vue avpug trgebssra?
Rf vfg Tyüpx! Rf vfg ahe Tyüpx.

Jnehz syvrtg re fb va Rvyr?
Wrar qbeg jvyy re orfvrtra;
Fpuba vfg re ibeorv trsybtra;
Fbetybf oyrvog qre Ohfra bssra;
Trorg npug! Re xbzzg mheüpx!
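To make the point concrete, here is a rough Python sketch of how such a score can be computed. It is an illustration under stated assumptions, not the code behind the "Readability Score" apps: the syllable counter simply counts runs of vowel letters, sentences are split on `.`, `!` and `?` only, and the helper names (`count_syllables`, `fk_counts`, `fk_grade`, `rot13`) are made up for this example. A different syllable heuristic or sentence splitter will give different grades.

```python
import codecs
import re


def count_syllables(word: str) -> int:
    """Crude syllable estimate: number of vowel-letter runs, at least 1.

    This is an assumed heuristic, not the method used by any particular app.
    """
    groups = re.findall(r"[aeiouyäöü]+", word.lower())
    return max(1, len(groups))


def fk_counts(text: str) -> tuple[int, int, int]:
    """Return (words, sentences, syllables) for a piece of text."""
    # Sentences: non-empty chunks between runs of ., ! or ?
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Words: runs of letters (Unicode-aware, so umlauts stay inside words)
    words = re.findall(r"[^\W\d_]+", text)
    syllables = sum(count_syllables(w) for w in words)
    return len(words), len(sentences), syllables


def fk_grade(text: str) -> float:
    """Flesch-Kincaid grade level from the standard linear formula."""
    words, sentences, syllables = fk_counts(text)
    return 0.39 * words / sentences + 11.8 * syllables / words - 15.59


def rot13(text: str) -> str:
    """rot13 via the standard library; non-ASCII letters pass through unchanged."""
    return codecs.encode(text, "rot_13")


if __name__ == "__main__":
    line = "Es ist Glück! Es ist nur Glück."
    for sample in (line, rot13(line)):
        print(sample, fk_counts(sample), round(fk_grade(sample), 2))
```

rot13 leaves spaces and punctuation alone, so the word and sentence counts are identical for the two versions. Only the syllable estimate can move, and only because the vowel letters get shuffled, which is why the scrambled poem lands so close to the original.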

Can we please have a moratorium on using the Flesch-Kincaid metric to rate political speeches?

A couple of past posts that help to make the case for this moratorium:

"News flash: Congresscritters using slightly shorter words and sentences", 5/23/2012
"Language guru runs with the journalistic pack", 6/17/2010

[h/t Victor Steinbok]



  1. Q. Pheevr said,

    October 26, 2014 @ 9:22 pm

    Rf vfg ahe Tyüpx, indeed. Or should that be Tyhrpx? (Tyḧpx?)

    [(myl) seems to be puzzled by non-ascii characters, and refuses to touch them. Thus
    ça était là
    çn égnvg yà]

  2. D.O. said,

    October 27, 2014 @ 12:48 am

    Oh, but I am sure German 4th graders are learning "Hin und wieder fliegen Pfeile" by heart with gusto.

  3. RobertL said,

    October 27, 2014 @ 4:57 am

    Flesch-Kincaid is the test that treats "tort" as a simple word and "hippopotamus" as a difficult one.


  4. Jason said,

    October 27, 2014 @ 6:13 am


    First we construct a higher-kinded class T that takes a covariant parameter A. Next we create a new value with the type parameter bound to AnyRef. Now, if we try to assign our T[AnyRef] to a variable of type T[Any], the call succeeds. This is because Any is the parent type of AnyRef, and our covariant constraint is satisfied. But when we attempt to assign a value of type T[AnyRef] to a variable of type T[String], the assignment will fail.
    The compiler has checks in place to ensure that a covariant annotation doesn’t violate a few key rules. In particular, the compiler tracks the usage of a higher-kinded type and ensures that if it’s covariant, that it occurs only in covariant positions in the compiler. The same is true for contravariance. We’ll cover the rules for determining variance positions soon, but for now, we’ll look at what happens if we violate one of our variance positions.
    (From Scala in Depth by Joshua D Suereth.)

    According to the online Flesch-Kincaid readability test, this text is pitched at the level of someone with a 10th-grade education!

  5. Jer said,

    October 27, 2014 @ 1:15 pm

    Can we please have a moratorium on using the Flesch-Kincaid metric to rate political speeches?

    I believe that moratorium will go into effect sometime after the moratorium on "humans only using 10% of their brain" goes into effect. (But before the moratorium on "the Inuit have a bajillion words for snow" goes into effect.)

  6. Emily Davis said,

    October 28, 2014 @ 12:24 pm

    Flesch-Kincaid is the test that treats "tort" as a simple word and "hippopotamus" as a difficult one.

    Bit of a tangent, but I have a distinct childhood memory of being annoyed by a song we sang in kindergarten that concluded "But I can't spell 'hippopotamus.'" Because it didn't seem that hard to me at all!

    Back on topic, does the Flesch-Kincaid scale take any historical changes (in language, literacy, and education) into account? That would seem to be pretty important. What qualified as a "fourth-grade" or "tenth-grade" reading level in the 18th century?

    [(myl) My understanding of the scale's history is that it was based on applying linear regression to the features of mean word length in syllables and mean sentence length in words, from a specific sample of graded reading materials collected in 1975. The passages were all selections from Navy training manuals; a set of subjects were tested by other methods to determine their reading level as a grade score; the passages were assigned a grade difficulty by the criterion that "50 percent of subjects with a reading ability at a particular grade level had to score 35 percent or better on the cloze test"; and then regression was used to relate the passages' assigned grade levels to their word-length and sentence-length features.

    You can read all about it here.]

  7. Mark Stephenson said,

    October 30, 2014 @ 6:42 pm

    Talking of a moratorium, my local paper today has a headline:

    Rockville to Begin Building Moratorium

    How do you build a moratorium?

  8. ZZMike said,

    November 1, 2014 @ 7:37 pm

    It's probably been said that the F-K readability test is practically useless – unless used by someone who's going to use it as a first approximation.

    The rot13 example is the perfect reason it's not useful: it doesn't look at meaning, only letter sequences. Sentence length may be a marker for readability, but a well-written 100-word sentence is not hard. A badly written 10-word sentence may be incomprehensible.

    You could take each word (skip the common ones) and run it through the database of texts (can't remember the name), and count occurrences. Lots of low-scoring words would lower the RI.

    But there are other considerations: ungrammatical phrases, dangling participles, unconnected thoughts, subject/verb agreement, &c &c.

    Mark Stephenson: From Pinker's "Sense of Style", another headline:
    "Plan to catch mice hit by mayor"
