{"id":16476,"date":"2014-12-16T09:35:31","date_gmt":"2014-12-16T14:35:31","guid":{"rendered":"http:\/\/languagelog.ldc.upenn.edu\/nll\/?p=16476"},"modified":"2014-12-16T17:16:49","modified_gmt":"2014-12-16T22:16:49","slug":"x-percent-of-y-are-z","status":"publish","type":"post","link":"https:\/\/languagelog.ldc.upenn.edu\/nll\/?p=16476","title":{"rendered":"\"X percent of Y are Z\""},"content":{"rendered":"<p>It's amazing how troublesome simple percentage-talk can be. Donald McNeil Jr., \"<a href=\"http:\/\/www.nytimes.com\/2014\/12\/16\/science\/fewer-ebola-cases-go-unreported-than-thought-study-finds-.html\" target=\"_blank\">Fewer Ebola Cases Go Unreported Than Thought, Study Finds<\/a>\", NYT 12\/16\/2014<\/p>\n<p style=\"padding-left: 30px;\"><span style=\"color: #000080;\">By looking at virus samples gathered in Sierra Leone and contract-tracing data from Liberia, the scientists working on the new study estimated that about 70 percent of cases in West Africa go unreported. That is far fewer than earlier estimates, which assumed that up to 250 percent did.<\/span><\/p>\n<p><!--more--><\/p>\n<p>As stated, this seemed to me to be\u00a0impossible. It might well be true that 70 out of every 100 Ebola cases go unreported. But on that interpretation, the cited \"earlier estimate\" &#8212; that 250 out of every 100 cases might go unreported &#8212; is\u00a0logically incoherent.<\/p>\n<p>So I guess\u00a0we should interpret this to mean that (it was estimated that) for every 100 cases that are reported, 250 are not reported. This would mean that 250 out of 350 cases (71%) go unreported.\u00a0On this construal, an underreporting rate of 70% would mean that for every 100 cases that are reported, 70 are unreported, so that 70 out of 170 cases (41%) go unreported.<\/p>\n<p>Wondering what the \"new study\" really said, I looked into it a bit further. The study in question is Samuel V. 
Scarpino et al., \"<a href=\"http:\/\/cid.oxfordjournals.org\/content\/early\/2014\/12\/12\/cid.ciu1131.abstract\" target=\"_blank\">Epidemiological and viral genomic sequence analysis of the 2014 Ebola outbreak reveals clustered transmission<\/a>\", Clinical Infectious Diseases, 12\/15\/2014.<\/p>\n<p style=\"padding-left: 30px;\"><span style=\"color: #800000;\">Our analysis of EBOV genome sequences also provided an estimate of the proportion of cases sampled of 58% (20\u201399%). However, over 70% of confirmed patients for the period of late May to mid June in Sierra Leone were sequenced [8]. The discrepancy suggests that underreporting of cases is approximately 17%, with a maximum of 70%. [&#8230;]<\/span><\/p>\n<p style=\"padding-left: 30px;\"><span style=\"color: #800000;\">Although our estimate of underreporting has high uncertainty, our upper bound of 70% is well below the early estimate of 250% [17], suggesting that underreporting could be far less prevalent than previous estimates implied. <\/span><\/p>\n<p>In other words, they sequenced viral genomes from \"over 70%\" of confirmed patients. Their analysis, based on the distribution of genomic variants, suggested that their sample covered 58% of the variants in the overall viral population. \u00a0On this analysis, their sample would be missing about 12 cases for every\u00a070 covered, or 17 for every\u00a0100. That is, for every 100 reported cases, 17 were unreported.<\/p>\n<p>The\u00a0higher (70%) estimate comes from the lower bound of the confidence interval (20-99%), according to which they missed (70-20)\/70 = 71%. (Which turns into 70% because the original sample was a bit \"over 70%\" of confirmed patients&#8230;) I'm not sure why the NYT story goes with this \"upper bound\" of 70%, rather than the central estimate of 17%.<\/p>\n<p>(Of course, if we express the 17% central estimate in terms of the way that I first tried to interpret the newspaper story, it yields a rate of 17\/117 = 14.5%. 
And again, the upper bound of 70% is 70\/170 = 41% in that way of thinking about it. The meaning of a simple statement about percentages is surprisingly unclear&#8230;)<\/p>\n<p>More on what Scarpino et al. did &#8212; see <a href=\"http:\/\/cid.oxfordjournals.org\/content\/early\/2014\/12\/12\/cid.ciu1131.full.pdf+html\" target=\"_blank\">the paper<\/a> and <a href=\"http:\/\/cid.oxfordjournals.org\/content\/suppl\/2014\/12\/12\/ciu1131.DC1\/ciu1131supp.pdf\" target=\"_blank\">the supplementary material<\/a> for further details:<\/p>\n<p style=\"padding-left: 30px;\"><span style=\"color: #800000;\">We fit a transmission-oriented phylodynamic model [7] to 78 EBOV genome sequences collected from over 70% of the confirmed cases arising in June of the current outbreak in Sierra Leone [8]. This model infers a time-based evolutionary reconstruction of the viral dynamics. We then used a Bayesian approach [9] on the same genomic data to reconstruct the transmission chains. We also fit a complementary Susceptible Exposed Infectious Removed (SEIR) network model that estimated clustering based on confirmed EVD cases and deaths [10, 11], inferring parameters for a clustered (\u03c6 &gt; 0) and a nonclustered population (\u03c6 = 0). Parameters of these SEIR models were fit to the cumulative numbers of laboratory-confirmed EBOV cases and laboratory-confirmed EBOV deaths obtained from the WHO Global Alert and Response news from May 27\u2013August 31 2014 (Supplementary Appendix), and the starting date for the SEIR model was sampled over the posterior distribution for the initial case supplied by our phylodynamic analysis.<\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>It's amazing how troublesome simple percentage-talk can be. 
Donald McNeil Jr., \"Fewer Ebola Cases Go Unreported Than Thought, Study Finds\", NYT 12\/16\/2014 By looking at virus samples gathered in Sierra Leone and contract-tracing data from Liberia, the scientists working on the new study estimated that about 70 percent of cases in West Africa go unreported. [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_exactmetrics_skip_tracking":false,"_exactmetrics_sitenote_active":false,"_exactmetrics_sitenote_note":"","_exactmetrics_sitenote_category":0,"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[100],"tags":[],"class_list":["post-16476","post","type-post","status-publish","format-standard","hentry","category-language-of-science"],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=\/wp\/v2\/posts\/16476","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=16476"}],"version-history":[{"count":14,"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=\/wp\/v2\/posts\/16476\/revisions"}],"predecessor-version":[{"id":16497,"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=\/wp\/v2\/posts\/16476\/revisions\/1649
7"}],"wp:attachment":[{"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=16476"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=16476"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=16476"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}