"X percent of Y are Z"

It's amazing how troublesome simple percentage-talk can be. Consider Donald McNeil Jr., "Fewer Ebola Cases Go Unreported Than Thought, Study Finds", NYT 12/16/2014:

By looking at virus samples gathered in Sierra Leone and contact-tracing data from Liberia, the scientists working on the new study estimated that about 70 percent of cases in West Africa go unreported. That is far fewer than earlier estimates, which assumed that up to 250 percent did.

As stated, this seemed to me to be impossible. It might well be true that 70 out of every 100 Ebola cases go unreported. But on that interpretation, the cited "earlier estimate" — that 250 out of every 100 cases might go unreported — is logically incoherent.

So I guess we should interpret this to mean that (it was estimated that) for every 100 cases that are reported, 250 are not reported. This would mean that 250 out of 350 cases (71%) go unreported. On this construal, an underreporting rate of 70% would mean that for every 100 cases that are reported, 70 are unreported, so that 70 out of 170 cases (41%) go unreported.
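
The conversion between the two readings is simple enough to write down. Here is a minimal sketch in Python; the function name share_of_all_cases is my own label for illustration, not anything taken from the NYT story or the paper. It takes "u unreported cases per 100 reported cases" and returns the percentage of all cases that go unreported:

```python
def share_of_all_cases(unreported_per_100_reported: float) -> float:
    """Convert 'u unreported per 100 reported' into the percent of ALL cases
    (reported plus unreported) that go unreported."""
    u = unreported_per_100_reported
    return 100.0 * u / (100.0 + u)


# The two figures from the newspaper story, read as "per 100 reported cases".
for u in (250, 70):
    print(f"{u} unreported per 100 reported -> {share_of_all_cases(u):.1f}% of all cases")
# 250 unreported per 100 reported -> 71.4% of all cases
# 70 unreported per 100 reported -> 41.2% of all cases
```

On this reading the result can never reach 100%, however large the "per 100 reported" figure gets, which is why a figure like 250% only makes sense as a ratio to reported cases.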

Wondering what the "new study" really said, I looked into it a bit further. The study in question is Samuel V. Scarpino et al., "Epidemiological and viral genomic sequence analysis of the 2014 Ebola outbreak reveals clustered transmission", Clinical Infectious Diseases, 12/15/2014.

Our analysis of EBOV genome sequences also provided an estimate of the proportion of cases sampled of 58% (20–99%). However, over 70% of confirmed patients for the period of late May to mid June in Sierra Leone were sequenced [8]. The discrepancy suggests that underreporting of cases is approximately 17%, with a maximum of 70%. […]

Although our estimate of underreporting has high uncertainty, our upper bound of 70% is well below the early estimate of 250% [17], suggesting that underreporting could be far less prevalent than previous estimates implied.

In other words, they sequenced viral genomes from "over 70%" of confirmed patients. Their analysis, based on the distribution of genomic variants, suggested that their sample covered 58% of the variants in the overall viral population. On this analysis, their sample would be missing about 12 cases for every 70 covered, or 17 for every 100. That is, for every 100 reported cases, 17 were unreported.

The higher (70%) estimate comes from the lower bound of the confidence interval (20-99%), according to which they missed (70-20)/70 = 71%. (Which turns into 70% because the original sample was a bit "over 70%" of confirmed patients…) I'm not sure why the NYT story goes with this "upper bound" of 70%, rather than the central estimate of 17%.
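
To make the arithmetic behind the 17% and 71% figures explicit, here is a small Python sketch of my reading of the calculation. It is a reconstruction, not necessarily the computation the authors actually performed; the function name and the round figure of 70 for "over 70%" are assumptions:

```python
def underreporting_pct(sequenced_pct: float, sampled_pct: float) -> float:
    """Cases missed per 100 confirmed cases, given the share of confirmed
    patients sequenced and the model's estimate of the share of cases sampled."""
    return 100.0 * (sequenced_pct - sampled_pct) / sequenced_pct


SEQUENCED = 70.0  # "over 70%" of confirmed patients, rounded down (assumption)

print(f"central estimate (58% sampled): {underreporting_pct(SEQUENCED, 58.0):.1f}%")
print(f"lower bound (20% sampled):      {underreporting_pct(SEQUENCED, 20.0):.1f}%")
# central estimate (58% sampled): 17.1%
# lower bound (20% sampled):      71.4%
```

Both results are ratios to confirmed cases, not shares of all cases, which is the same ambiguity that made the 250% figure so hard to interpret.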

(Of course, if we express the 17% central estimate in terms of the way that I first tried to interpret the newspaper story, it yields a rate of 17/117 = 14.5%. And again, the upper bound of 70% is 70/170 = 41% in that way of thinking about it. The meaning of a simple statement about percentages is surprisingly unclear…)

More on what Scarpino et al. did — see the paper and the supplementary material for further details:

We fit a transmission-oriented phylodynamic model [7] to 78 EBOV genome sequences collected from over 70% of the confirmed cases arising in June of the current outbreak in Sierra Leone [8]. This model infers a time-based evolutionary reconstruction of the viral dynamics. We then used a Bayesian approach [9] on the same genomic data to reconstruct the transmission chains. We also fit a complementary Susceptible Exposed Infectious Removed (SEIR) network model that estimated clustering based on confirmed EVD cases and deaths [10, 11], inferring parameters for a clustered (φ > 0) and a nonclustered population (φ = 0). Parameters of these SEIR models were fit to the cumulative numbers of laboratory-confirmed EBOV cases and laboratory-confirmed EBOV deaths obtained from the WHO Global Alert and Response news from May 27–August 31 2014 (Supplementary Appendix), and the starting date for the SEIR model was sampled over the posterior distribution for the initial case supplied by our phylodynamic analysis.


  1. GH said,

    December 16, 2014 @ 12:25 pm

    The model that yields 14.5% and 41% is so much more natural to me (and to my mind in much better accordance with the meaning of "percent") that even with the explanation I had to struggle to comprehend what 250% referred to.

    If I understand correctly, you were exactly right in your initial guess that:

    we should interpret this to mean that (it was estimated that) for every 100 cases that are reported, 250 are not reported. This would mean that 250 out of 350 cases (71%) go unreported.

  2. Michael Rank said,

    December 16, 2014 @ 1:59 pm

    Construal?? I'm British, never heard of this word, is it regular American English?

  3. Guy said,

    December 16, 2014 @ 2:28 pm

    I associate "construe" with (American) legal English. I was also a bit surprised to hear it outside a legal context but it seems to be being used with essentially what I consider its ordinary meaning, and I don't see anything wrong or objectionable with its usage here. If it's less familiar in British English (I don't know, I wouldn't expect it to be very familiar to most Americans) that may have something to do with it being in the Constitution, effectively "fossilizing" the legal parlance of 1789. Its meaning is essentially equivalent to "interpret", so a construal is an interpretation, or reading, basically.

  4. January First-of-May said,

    December 16, 2014 @ 2:46 pm

    @Michael Rank: I'm Russian and I've never heard of that word either (outside of Language Log at least).
    For what it's worth, Wiktionary doesn't give it a regional remark, so it's just regular linguistic jargon, apparently.

    That said, I had no problem figuring out what it meant (though I admit the context helped).

  5. Daniel Barkalow said,

    December 16, 2014 @ 2:49 pm

    It seems to be a coincidence that the 70% which is the portion of confirmed cases where they sequenced the virus matches the 70% which is the upper bound of the quantity of unreported cases as a portion of confirmed cases. I'd guess the NYT story decided to go with the number that appeared twice.

  6. January First-of-May said,

    December 16, 2014 @ 3:16 pm

    To elaborate on my previous post (and @Guy, which was posted after I started typing)…

    The verb "(to) construe" doesn't feel particularly "legal" to me (or "archaic", or anything like that). It just feels like a regular word, and would definitely be a word I would use in (written, e.g. forum) conversation if the context calls for it (and I have* in fact used it in forum posts previously).
    However, in my mind, the specific meaning of this particular verb (as opposed to "interpret" and other synonyms) essentially came down to "understand/interpret, in a less plausible manner" (or, perhaps, "in a less plausible looking manner" – basically when we have to twist some idea to get that interpretation).

    So when I saw the noun "construal" on Language Log, I thought it meant something along the lines of "a less plausible interpretation" – which happens to perfectly fit the sort of contexts it would normally occur in (headline weirdness and the like).

    As I now realized while searching for examples of the verb online, it is not always used in the meaning I thought it had (though examples that fit my version are still very common). I'm not sure if I'll be able to change my mental idea of it, however (or, for that matter, whether I even need to, considering how common my version is).

    *) An actual example from a post I made on another forum about a year ago: "Some posters thought that the OP might, possibly, be vaguely construed as advertising, and tried to play along (in what they thought was good humor)."

    [(myl) I associate construe (and to a lesser extent, construal) with my rather old-fashioned instruction in Latin, 50-odd years ago. In the "recitation" portion of each lesson, a designated victim would be required to read the Latin text, sentence by sentence; translate it into English; and then "construe" it, which meant to specify for each word its lemma, morphological form, and syntactic/semantic function. Since "I don't have a clue" was deprecated as an answer, it's true that the results of this process of construal were often implausible if not preposterous.

    In contemporary usage, as far as I can tell, "construal" is just a somewhat fancier synonym for "interpretation", without any particular implication that the interpretation is implausible, except to the extent that the word draws attention to the fact that an act of interpretation is being performed.

    But in trying to figure out what it might possibly mean for 250% of ebola cases to go unreported, perhaps I was subconsciously reminded of being put on the spot by Mr. Mansur about whether "peris" was the dative plural of pera or the 2nd singular present indicative of pereo or whatever.]

  7. Pflaumbaum said,

    December 16, 2014 @ 4:12 pm

    I'm a BrE speaker and the word 'construal' is completely normal to me.

    Unfortunately statistical thinking isn't… So that, embarrassingly, I was lost about halfway through this post. Keep meaning to learn some stats.

  8. Thomas Lumley said,

    December 16, 2014 @ 5:40 pm

    It looks as though the 70% came from the university press office: it's also in Yale News

    [(myl) Makes sense — most science reporting these days is a re-write of press releases.]

  9. Rubrick said,

    December 16, 2014 @ 6:12 pm

    Accompanying the hijackers here…

    The word "construe" appears in, of all places, The Lord of The Rings, by the quite British Tolkien. Saruman suggests to Gandalf that he has "misconstrued my intentions wilfully". It always struck me as tonally jarring, as though a modern lawyer had been plopped into this medievalesque fantasy world. (Now that I think about it, it may well have been a deliberate part of Tolkien's Saruman==Modernity==Evil narrative theme.)

  10. maidhc said,

    December 16, 2014 @ 6:49 pm

    My association with construe comes from a diet of English public school stories from Tom Brown, Stalky & Co. to Nigel Molesworth, combined with my father's recollections of the many English public school stories he used to read as a lad in Chums. In those, the Latin master would spout out a bit of Cicero or Vergil and then say "Now, Jenkins Minor, construe!"

    I see that myl has experienced something similar.

  11. Brett said,

    December 16, 2014 @ 7:02 pm

    I learned the word "construe[d]" (along with "enumeration") from the Ninth Amendment, which is one of the most important amendments in the Bill of Rights, arguably even more important than the First Amendment. (For those keeping score, the least important amendment in the Bill of Rights is the Third.)

    @Rubrick: While "construe" has, to my American ear, a legal flavor, "misconstrue" does not. I would probably not use "construe" in speech unless I wanted to make a specific or technical point. (There is a slight difference in meaning between the meanings of "construe" and "interpret," I think. Only a conscious mind can "construe" something, while "interpret" is broader. A compiler can "interpret" code but not "construe" it.) However, "misconstrue" is a word I use relatively frequently. (Related to my previous parenthetical, "misconstrue" tends to have an accusatory note. If you "misinterpret" something, that could be accidental, but if I state that you have "misconstrued" what I said, that suggests that you have willfully misconstrued it.)

  12. Old Gobbo said,

    December 16, 2014 @ 7:34 pm

    The OED (2009 cd) shows misconstrue used in the sense of misunderstanding, taking something in a wrong sense, from Chaucer through to the King James Bible, taking in More, Shakespeare and Fulke Greville on the way, as well as later uses. I doubt if Tolkien makes many mistakes of the kind you suggest.

  13. Old Gobbo said,

    December 16, 2014 @ 8:37 pm

    @myl: Leaving aside the use of 'fit' as (apparently – “were fit” seems inescapably so) a past participle, I still find myself puzzled by both of the quotations you cite from the paper. Alas, as a pensioner, I have no 'access' to Oxford bl**ding Journals, and cannot myself explore what they might have been up to, though the title (“Epidemiological and viral genomic … analysis … reveals clustered transmission”) suggests that the overt point of the paper is to tell us what we surely already know. Against that, 'phylodynamic' is a nice new word, to me at least.

    Leaving aside the actual confidence level used (and I imagine that, since this is biology, it was probably 95%), the “confidence interval” itself is unclear. Are we talking about Bayesian 'credible intervals' ? Or are they simply saying that for this sample the true result, for an estimate of the proportion of virus variants sampled, lies somewhere between 20% and 99% ? In that case, calculation of the upper bound for the proportion missed would surely give us 400% ([100-20]*100/20) – or have I, as usual, misunderstood something ?

    Meanwhile I am slightly disturbed by the title of their network model: “Susceptible Exposed Infectious Removed” – what does 'removed' mean in this context ?

  14. Doreen said,

    December 16, 2014 @ 9:18 pm

    Michael Rank asked about the word construal, not construe. IME the corresponding noun used in BrE legal language (at least in England & Wales – I can't say for certain about Scotland) is construction, e.g. in combinations (or constructions!) such as "strict construction", "a question of construction", "rules of construction" etc.

  15. Nathan Myers said,

    December 17, 2014 @ 1:53 am

    Most readers probably have encountered "misconstrued" most frequently in Simon's & Garfunkel's "50 Ways to Leave Your Lover".

    I think of construe:construction :: destroy:destruction. We don't have "instrue" or "instroy" vs. "instruction", but maybe we should.

  16. George said,

    December 17, 2014 @ 3:24 am

    I'm Irish and 'construe' is a perfectly ordinary word for me, 'construal' less so but far from arcane. But then again, I find the arguably skunked sense of 'to beg the question' far from arcane (to the point of actually using it in conversation), so don't take any notice of me…

  17. Michael Rank said,

    December 17, 2014 @ 7:06 am

    Confirm that I was asking about the noun (though some commenters imply it can be a verb, too???!!) construal, not the verb construe which is familiar enough.

  18. Shubert said,

    December 17, 2014 @ 8:46 am

    construct as a noun differs from construction, construal by …
    Please expound further.

  19. Shubert said,

    December 17, 2014 @ 8:58 am

    建設construct/構, 解釋 construe, interpret 傳譯, analysis 分析: A secret is in all of these characters.

  20. Lance said,

    December 17, 2014 @ 2:08 pm

    Leaving aside the question of "construal", the NYT has published a correction:

    An earlier version of this article incorrectly reported some results of the study. The new study estimated that the rate of under-reporting of cases was 17 percent, with a 70 percent maximum, not 70 percent. Also, the previous estimate to which it was compared meant that for every 100 known cases, there were 250 real cases, not the 350 reported based on information supplied by the lead author. (It is not the case that previous estimates assumed that up to 250 percent of cases went unreported.)

  21. J. W. Brewer said,

    December 17, 2014 @ 4:20 pm

    While U.S. lawyers/judges would probably be more likely to use "construction" as the noun related to the verb "construe" when talking specifically about a particular interpretation of a statute, contract, or similar operative text, they certainly do use "construal" as well, perhaps in slightly different contexts. So, e.g., from a Supreme Court dissent by Justice Stevens in the '90's: "Particularly given the implication that McNeil would be given favorable treatment if he told 'his side of the story' as to either or both crimes to the Milwaukee County officers, I find the Court's restricted construal of McNeil's relationship with his appointed attorney at the arraignment on the armed robbery charges to be unsupported." (Although it's still apparently unusual enough that whatever damn-you-autocorrect software is invoked by the interaction of my computer with this comment box has put a squiggly red line underneath it.)

  22. maidhc said,

    December 17, 2014 @ 5:16 pm

    Brett: A compiler can "interpret" code

    A compiler and an interpreter are two related but different things, so it would be confusing to say that a compiler interprets something.


    Compilers parse as part of the process of compilation. But no one says that a compiler would construe something. People do say that compilers understand things, but that's just a convenient phrase, not that we think they're conscious.

  23. Mark Dowson said,

    December 17, 2014 @ 6:16 pm

    Construe (and misconstrue) etc are perfectly familiar to my Brit English ear as slightly fancy terms for "interpret" with a hint of greater formality (in the direction of "parse") – although I wouldn't use them in speech or writing. OED concurs, emphasizing the sense of grammatical analysis for "construe" (less so for "misconstrue" which is more or less synonymous with "misinterpret" or "misunderstand").
