## Steven D. Levitt: pwned by the base rate fallacy?

Statistics is full of terms that fool people, because they seem intuitively to mean something very simple, while in fact they mean something equally simple, but radically different. And in the rich lexicon of statistical misunderstanding, few terms are more misleading than “false positive rate”.

You take a medical test for Condition X and it comes back positive. Bad news — you have Condition X, right? Not so fast — the test is sometimes wrong. How often? Well, there’s a “false positive rate” of 10%. OK, so that means that there’s a 10% chance that your positive test result is false, and therefore a 90% chance that you have Condition X, right?

No. Wrong, wrong, wrong.

In this situation, your chances of having Condition X are probably not 9 out of 10, but more like 1 in 10 — or maybe 1 in 1,000 or 1 in 100,000 or even less. Without some additional information, we can’t tell what the odds are — but they’re almost certainly smaller than 9 in 10, and probably a very great deal smaller. Listen up, and I’ll explain.

If the truth is “no”, and the test says “yes”, that’s a “false positive”. The rate at which this happens (the proportion of the people who don’t have the condition for whom the test comes back positive) is indeed the “false positive rate.” But without a key piece of additional knowledge, this tells you almost nothing about how likely a particular “yes” test result is to be correct. That key piece of missing knowledge is the “base rate”: what proportion of the general population actually has the condition being tested for.

Let’s suppose the base rate is 1 in 100, and the false positive rate is 1 in 10. For simplicity, let’s assume that the false negative rate (how likely the test is to say “no” when the truth is “yes”) is also 1 in 10.

Then if we test 100,000 people, we expect that 1,000 of them have the condition, and 99,000 of them don’t (because the base rate is 1 in 100). Of the 99,000 people without the condition, 9,900 will test positive and 89,100 will test negative (10% false positive rate). Of the 1,000 people with the condition, 900 will test positive, and 100 will test negative (10% false negative rate).

Now let’s arrange these numbers in a table. The part that matters to you, as a testee whose test has come back positive, is the row labeled “(TEST) YES”. How worried should you be?

| | TRUTH: YES | TRUTH: NO | Proportion truly YES |
|---|---|---|---|
| TEST: YES | 900 | 9,900 | 0.083 |
| TEST: NO | 100 | 89,100 | 0.001 |

Well, given a 1 in 10 false positive rate and a 1 in 100 base rate, your probability of having Condition X is not 9 out of 10 — it’s 900 out of (900+9,900) = 900/10,800 = 0.083, or about 1 in 12.
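For readers who would rather check the arithmetic than trust it, here is a minimal Python sketch of the same calculation. The function name `contingency_counts` and its parameter names are my own labels for illustration, not standard terminology; the numbers are the ones assumed above.

```python
def contingency_counts(population, base_rate, false_positive_rate, false_negative_rate):
    """Expected number of people in each (truth, test) cell of the table."""
    affected = population * base_rate        # truth is YES
    unaffected = population - affected       # truth is NO
    return {
        "true_pos": affected * (1 - false_negative_rate),     # truth YES, test YES
        "false_neg": affected * false_negative_rate,          # truth YES, test NO
        "false_pos": unaffected * false_positive_rate,        # truth NO,  test YES
        "true_neg": unaffected * (1 - false_positive_rate),   # truth NO,  test NO
    }


# Assumed values from the text: 100,000 people, base rate 1 in 100,
# false positive and false negative rates both 1 in 10.
counts = contingency_counts(100_000, 0.01, 0.10, 0.10)

# Probability of having the condition, given a positive test.
ppv = counts["true_pos"] / (counts["true_pos"] + counts["false_pos"])
print(round(ppv, 3))  # 0.083, i.e. about 1 in 12
```

The only step that matters is the last division: it is taken over everyone who tested positive, not over everyone who lacks the condition.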

(At least, that’s what the test in itself is telling us. If you have some other signs and symptoms, obviously they need to be taken into consideration as well.)

If the base rate is 1 in 1,000, with the “false positive rate” staying exactly the same at 10%, then the contingency table works out as follows. Out of 100,000 people, we expect 100 to have the condition, and 99,900 not to. Of the 99,900 unaffected people, 9,990 will test positive and 89,910 will test negative. Out of the 100 affected people, 90 will test positive and 10 will test negative.

| | TRUTH: YES | TRUTH: NO | Proportion truly YES |
|---|---|---|---|
| TEST: YES | 90 | 9,990 | 0.009 |
| TEST: NO | 10 | 89,910 | 0.0001 |

So in this case, a positive test result means that your probability of having Condition X is 90 out of (9,990+90) = 90 in 10,080, or about 1 chance in 112.
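To see how strongly the answer depends on the base rate, the same calculation can be written in terms of proportions and repeated for several base rates. This is only a sketch: `positive_predictive_value` is a name I have chosen, and the third base rate (1 in 100,000) is simply an extra illustrative value in the spirit of the ranges mentioned earlier.

```python
def positive_predictive_value(base_rate, false_positive_rate, false_negative_rate):
    """P(truth is YES | test says YES), computed from proportions of the population."""
    true_pos = base_rate * (1 - false_negative_rate)    # share who are affected and test positive
    false_pos = (1 - base_rate) * false_positive_rate   # share who are unaffected and test positive
    return true_pos / (true_pos + false_pos)


# Same 10% false positive and false negative rates throughout; only the base rate changes.
for one_in in (100, 1_000, 100_000):
    ppv = positive_predictive_value(1 / one_in, 0.10, 0.10)
    print(f"base rate 1 in {one_in:,}: about 1 chance in {round(1 / ppv):,}")
```

With the false positive rate held fixed at 10%, the first two lines reproduce the 1 in 12 and 1 in 112 figures worked out by hand above.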

OK, now read the opening of Steven D. Levitt, “Medicine and statistics don’t mix”, Freakonomics Blog, 4/9/2008:

Some friends of mine recently were trying to get pregnant with the help of a fertility treatment. At great financial expense, not to mention pain and inconvenience, six eggs were removed and fertilized. These six embryos were then subjected to Pre-Implantation Genetic Diagnosis (P.G.D.), a process which cost \$5,000 all by itself.

The results that came back from the P.G.D. were disastrous.

Four of the embryos were determined to be completely non-viable. The other two embryos were missing critical genes/D.N.A. sequences which suggested that implantation would lead either to spontaneous abortion or to a baby with terrible birth defects.

The only silver lining on this terrible result is that the latter test had a false positive rate of 10 percent, meaning that there was a one-in-ten chance that one of those two embryos might be viable.

So the lab ran the test again. Once again the results came back that the critical D.N.A. sequences were missing. The lab told my friends that failing the test twice left only a 1 in 100 chance that each of the two embryos were viable.

My friends — either because they are optimists, fools, or perhaps know a lot more about statistics than the people running the tests — decided to go ahead and spend a whole lot more money to have these almost certainly worthless embryos implanted nonetheless.

Nine months later, I am happy to report that they have a beautiful, perfectly healthy set of twins.

If that’s really an accurate description, then Levitt’s friends, their doctors, and Levitt himself are all victims of the “base rate fallacy”. His third commenter (“Justin”) understands this and points it out, but most of the rest of the commenters don’t seem to notice. Instead, they give us a lot of interesting but completely irrelevant stuff about the importance of taking chances in life, the practice of IVF, etc.

The conclusion of Levitt’s piece is:

Anyway, this is just the latest example of why I never trust statistics I get from people in the field of medicine, ever.

Biomedical researchers sometimes misuse statistics, and most doctors hardly understand the concepts at all, but I’m afraid that in this case, a better conclusion for the rest of us would be not to trust any statistical reasoning that we get from Steven D. Levitt.

It’s a bit embarrassing for Levitt to have made this elementary mistake in such a prominent place. I hope that he’ll write a follow-up, explaining what happened. But the important point is how natural this mistake is, given the terminology involved. Levitt, one of the smartest and most interesting economists around, who certainly knows a great deal about the aspects of statistics that matter to his profession, was completely pwned by a traditional but misleading piece of terminology. And 59 out of his 66 commenters (so far) keep the mistake in play.

The interpretation of medical tests is an important problem for everyone in our society today. Isn’t it time that we came up with some better terminology for talking about the probabilities involved — and taught that terminology, and the elementary mathematics involved, to middle-school children? (“Positive predictive value” is a more useful concept, at least for doctors and patients, but it’s not a terrific phrase.)

When it comes to statistical reasoning (as I’ve pointed out before), we have met the Pirahã, and they are us.

[For more discussion of the base rate fallacy in the interpretation of medical tests, see e.g. William Klein et al., “Cancer Risk Elicitation and Communication: Lessons from the Psychology of Risk Perception”, CA Cancer J Clin 57(3): 147-167, 2007, where you’ll find this example:

Consider a cancer detection test with false positive and false negative rates of 2%. Imagine further that the prevalence of this type of cancer in the population is 10 in 1,000. If asked to estimate the chances of having this type of cancer given that one has tested positive (the positive predictive value), many laypersons would offer a response of 98%. However, according to Bayes Theorem, this conditional probability must take account of the low base rate. Given that 2% of the 990 people who do not have this cancer will test positive, as will 98% of the 10 people with cancer, most individuals in this population who test positive will not have cancer, making the chances that someone with a positive test result actually has this cancer only 33%. When considering the finding that people confuse conditional probabilities (such as equating the above probability with the probability of testing positive if one has cancer, which is much higher), it is essential to be careful when discussing the meaning of test results with patients.
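The 33% figure in that passage can be checked in a few lines. This is a sketch using the numbers quoted above (2% error rates, 10 cases per 1,000 people); the variable names are mine.

```python
# Numbers from the quoted example: 1,000 people, 10 of whom have the cancer.
population = 1_000
with_cancer = 10
without_cancer = population - with_cancer   # 990

false_positive_rate = 0.02
false_negative_rate = 0.02

true_positives = with_cancer * (1 - false_negative_rate)   # 98% of the 10
false_positives = without_cancer * false_positive_rate     # 2% of the 990

# Chance that a person with a positive result actually has the cancer.
ppv = true_positives / (true_positives + false_positives)
print(f"{ppv:.0%}")  # 33%
```

About 9.8 true positives are outnumbered by about 19.8 false positives, which is why the answer is roughly one in three rather than 98%.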

You could also read (among many other examinations of this problem) Jacquelyn Burkell and D. Grant Campbell, “‘What does this mean?’ How Web-based consumer health information fails to support information seeking in the pursuit of informed consent for screening test decisions”, J Med Libr Assoc, 93(3): 363-373, 2005.

Although none of this reasoning about test-interpretation involves anything beyond elementary-school mathematics, it probably makes your eyes glaze over, which is why less sophisticated people usually take test results for literal truth, and more sophisticated people often fall back on an intuitive but misleading interpretation of terms like “false positive rate”.

It would take a lot of education (and re-education) to change this. It’s disappointing to see the bully pulpit of Freakonomics being used to reinforce a common and consequential misunderstanding, rather than pointing readers towards the truth. ]

[Update — Will Fitzgerald suggests that the positive predictive value of PGD is actually in the 90-95% range, based on this article. If true, this might mean that Levitt’s analysis of the probabilities is correct, and he (or his friends) just used the wrong terminology. I’m not sure, though, since the relevant aspect of the PGD diagnosis may have been different from the one discussed in the cited article.

And Wally Washington pointed out by email that I didn’t read the comments on Levitt’s post carefully enough:

Your post on the base rate fallacy was great, but more than 65 of the first 66 commenters got it right. Several explicitly acknowledged that the poster #3 Justin got it right, and several others independently cited the base rate fallacy.

OK, I skimmed the comments too quickly (and there are more of them now). Comment #9 (EmilyAnabel) does mention Type I and Type II errors, and Bayes’ Theorem, which means that she must understand what’s going on. But she doesn’t indicate that Levitt got anything wrong. Comment #17 (Cliff) agrees with #3, adding a brief but clear explanation of the issue. Comment #18 is similar, and comment #23 likewise assents. Comment #33, like #9, says sensible things that suggest correct understanding (e.g. mentioning “positive predictive value”), but doesn’t flag any error in the original post. #35 also signals a base-rate problem. #63 also seems to understand the problem, as do #66, #72, #78, #80, #81, #89 (which links to this post), and #91.

So that should have been “59 out of the first 66 commenters” (I’ve corrected the number in the text above), or maybe “57 out of the first 66” if we count #9 and #33. ]

[Sridhar Ramesh writes:

Am I the only one who naively took the term “false positive rate” to mean not P(truth is No | test says Yes) nor P(test says Yes | truth is No) but simply P(truth is No and test says Yes). It took me several tries, embarrassingly, to understand where I was going wrong in your recent post, because it seemed clear to me that your first table had a 9.9% false positive rate (9,900 people with false positives, out of a total population of 100,000) rather than a 10% one (9,900 people with false positives, out of 99,000 people without the condition being tested for). Even upon re-reading the description “If the truth is ‘no’, and the test says ‘yes’, that’s a ‘false positive’. The rate at which this happens is indeed the ‘false positive rate.'”, I could not take it to mean anything other than what I already thought of the false positive rate as meaning.

(Apparently, in the phrase “The rate at which this happens”, the rate was not meant to be unconditional, as I took it, but, rather, conditioned on “the truth is ‘no'”; of course, there’s nothing in that description to prevent it from being conditioned on “the test says ‘yes'”, instead, as with the misconception you were speaking to.) Anyway, it just speaks once more to the ambiguities with which common-sense understanding of statistical discussion is fraught.

Indeed — I was trying to avoid bringing in the language of conditional probability, in order to make things clearer, but instead I muddied the waters. I’ve added a clause to make such confusions less likely.

The problem is that the concepts are pretty simple, and the arithmetic is even simpler, but our language in its natural state doesn’t give us any clear and unambiguous ways to talk about them. That’s why people invent jargon like “false positive” and “positive predictive value” and “specificity”. Unfortunately, some of this jargon makes perfectly good sense as ordinary language, with an interpretation that may be quite different from the technical one. In fact, as Sridhar illustrates, there may be more than one such interpretation.
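The three readings of “false positive rate” at issue here can be put side by side using the counts from the first table (base rate 1 in 100, 100,000 people tested). This is just an illustrative sketch; the labels in the dictionary are my own shorthand.

```python
# Counts from the first table in the post.
true_pos, false_pos = 900, 9_900     # test says YES
false_neg, true_neg = 100, 89_100    # test says NO
total = true_pos + false_pos + false_neg + true_neg   # 100,000

readings = {
    # The technical meaning: conditioned on the truth being NO.
    "P(test YES | truth NO)": false_pos / (false_pos + true_neg),
    # The base rate fallacy's reading: conditioned on the test saying YES.
    "P(truth NO | test YES)": false_pos / (false_pos + true_pos),
    # Sridhar's reading: unconditional, out of the whole population.
    "P(truth NO and test YES)": false_pos / total,
}

for label, value in readings.items():
    print(f"{label}: {value:.3f}")
```

The same 9,900 people appear in all three numerators; only the denominator changes, giving 0.100, 0.917 and 0.099 respectively.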

It seems to me, perhaps naively, that it would help if we tried to teach the necessary terminology, concepts and skills in middle school and high school. But failing that, let’s try to avoid having big-time scientific popularizers feature wrong interpretations of the terms and concepts in widely-read articles.

The other thing which I thought was interesting was how commenter Wally Washington used “more than 65 of the first 66 commentors got it right” to apparently mean “more of the first 66 commentors got it right than would be the case if 65 of them had gotten it wrong”, rather than “All 66 commentors got it right!” or any such thing. It was probably just a composition error on his part, trying to quote the 65 number you had used while at the same time speaking of people who got it right rather than people who got it wrong, but if that sort of phrasing is actually commonly used in this way among any speakers, it seems to me something of a linguistically interesting move.

]