And Rubrick, the interpretation of a 5% p-value given by Eric is correct. There may be many other interpretations that would be more important, in the sense that they could be more useful, but they are incorrect.

]]>I'm not sure that's strictly true either. It's not that we have a 5% chance of knowing no more than we did before the test was run. We know more, but part of what we know is that there is a chance that our conclusion is wrong (and this chance is not necessarily 5%, it could be higher or lower).

We would only know *no more* than we did before the test was run if there really was no correlation whatsoever between reality and the outcome of the test. If we were, for example, to try to "test" a particular hypothesis by tossing a coin and reading heads as "yes" and tails as "no" we would actually get a "correct" result half of the time, but such a test would clearly tell us no more information than we had before the test was run, because a false positive is exactly as probable as a true positive.
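The coin-toss case can be checked with a few lines of arithmetic. This is only a sketch: the prior of 0.3 and the rates for the "informative" test are made-up numbers for illustration, and `posterior_probability` is a hypothetical helper, not anything from the post.

```python
def posterior_probability(prior, p_yes_if_true, p_yes_if_false):
    """Probability the hypothesis is true after the test says "yes".

    Uses the odds form of Bayes' rule:
    posterior odds = prior odds * likelihood ratio.
    """
    prior_odds = prior / (1 - prior)
    likelihood_ratio = p_yes_if_true / p_yes_if_false
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)


prior = 0.3  # assumed prior belief, purely illustrative

# Coin toss: "yes" is equally likely whether or not the hypothesis is true.
after_coin = posterior_probability(prior, 0.5, 0.5)

# A test with some connection to reality (rates are assumptions).
after_real_test = posterior_probability(prior, 0.8, 0.05)

print(f"after coin toss: {after_coin:.3f}")
print(f"after informative test: {after_real_test:.3f}")
```

With a likelihood ratio of exactly 1 the posterior equals the prior, which is the sense in which the coin tells us nothing, even though it is "right" half of the time.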

]]>"How much larger and more expensive would the experiment need to be for it to have much hope of meeting a more stringent p-value?"

I suspect this is a big factor; requiring a more stringent p-value in, say, behavioral psychology would seem to spell the end of "Ask 20 grad students and write it up". I'm not entirely sure this would be a bad thing, given the frequency with which flimsy results are A) trumpeted in the media, and B) built upon without giving enough thought to whether they might be entirely bogus.
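To put a rough number on the quoted question, here is a back-of-envelope sketch. It assumes a two-sided z-test, a fixed true effect size, and a target power of 80%; under those assumptions the required sample size is proportional to the square of the sum of two normal quantiles. The function name and the 80% figure are my own choices, not anything from the post.

```python
from statistics import NormalDist


def relative_sample_size(alpha, power=0.8):
    """Quantity proportional to the n needed by a two-sided z-test.

    For a fixed effect size, n scales as (z_{1-alpha/2} + z_{power})**2.
    Only ratios between two thresholds are meaningful here.
    """
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)
    z_power = standard_normal.inv_cdf(power)
    return (z_alpha + z_power) ** 2


# How much bigger must the study be to move from 5% to 1%?
ratio_01 = relative_sample_size(0.01) / relative_sample_size(0.05)
# And from 5% to 0.1%?
ratio_001 = relative_sample_size(0.001) / relative_sample_size(0.05)

print(f"5% -> 1%:   about {ratio_01:.2f}x the sample")
print(f"5% -> 0.1%: about {ratio_001:.2f}x the sample")
```

Under these assumptions the cost of a stricter threshold is a modest multiple of the sample, not orders of magnitude, though for a study already run on 20 participants even that multiple may be the difference between feasible and not.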

Your point about the meaning of the 5% value is important, but in terms of the weight one should give to the results of a study, I think the important interpretation is that there's a 5% chance that we don't actually know any more than if the study hadn't been run at all.

]]>No, there is no overriding argument against making the required p-value smaller. The required p-value is (in principle) chosen by the experimenters and it may take several factors into account. These may include: How inherently unlikely is the effect tested for generally perceived to be? How much larger and more expensive would the experiment need to be for it to have much hope of meeting a more stringent p-value? How serious would the consequences be of getting a false positive? And, not least, what is the tradition in the field? In most life sciences, 5% is quite usual. In controversial fields like parapsychology, 1% is more usual (reflecting a general perception that the effects tested for are inherently unlikely). In particle physics, the standard is 5 sigma, i.e. a one-sided p of approximately 0.0000003 (about 0.0000006 two-sided), reflecting (amongst other things) the relative ease of meeting such a stringent p-value in that field.
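The conversion from "n sigma" to a p-value is easy to check with the standard library. The sketch below assumes a normally distributed test statistic; the function names are mine. Note that quoting one tail or two changes the figure by a factor of two.

```python
import math


def one_sided_p(n_sigma):
    """Upper-tail probability of a standard normal beyond n_sigma."""
    return 0.5 * math.erfc(n_sigma / math.sqrt(2))


def two_sided_p(n_sigma):
    """Probability of a standard normal falling beyond +/- n_sigma."""
    return math.erfc(n_sigma / math.sqrt(2))


for n in (1.96, 3, 5):
    print(f"{n} sigma: one-sided {one_sided_p(n):.3g}, "
          f"two-sided {two_sided_p(n):.3g}")
```

The familiar 5% threshold corresponds to about 1.96 sigma two-sided, which gives some feel for how far apart the conventions in different fields are.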

Incidentally, passing a statistical test at the 5% level does not mean that there is a 5% probability that the effect is not real. It means that, if the effect is not real, there was a 5% probability of passing the statistical test. This point is often misunderstood but it is important.
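A small worked example may make the distinction concrete. All the inputs are assumptions chosen for illustration: the share of tested effects that are real, and the power of the test when the effect is real. Neither number comes from any study discussed here.

```python
def prob_not_real_given_pass(prior_real, alpha=0.05, power=0.8):
    """P(effect is not real | test passed), by Bayes' rule.

    alpha is P(pass | effect not real); power is P(pass | effect real).
    """
    false_passes = (1 - prior_real) * alpha
    true_passes = prior_real * power
    return false_passes / (false_passes + true_passes)


# If only 1 in 10 tested effects is real (assumed), a pass at the 5% level
# still leaves a large chance that the effect is not real.
long_shot = prob_not_real_given_pass(prior_real=0.1)

# If 9 in 10 tested effects are real (assumed), the same pass is far safer.
safe_bet = prob_not_real_given_pass(prior_real=0.9)

print(f"prior 10% real: P(not real | pass) = {long_shot:.3f}")
print(f"prior 90% real: P(not real | pass) = {safe_bet:.4f}")
```

In both cases the probability of passing when the effect is not real is exactly 5%, yet the probability that a passed effect is not real comes out far above 5% in one case and far below it in the other.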

]]>The real problem is how to bring forward research with more substantial results, not merely crossing the threshold of statistical significance (and maybe not crossing it at all), but something that has the potential to really improve our understanding of things.

A propos, Mr. Edgeworth apparently lost a √π in his equation.

P.S. My old pen name, D.O., has apparently been consigned to the spam bin.

]]>The researchers say, “We found no significant effect of Medicaid coverage…”, and “We observed no significant effect on…”. In both those cases the full (untruncated) statement is true, with the reasonable interpretation of “significant” as “statistically significant”. But the researchers go on to say, “This randomized, controlled study showed that Medicaid coverage generated no significant improvements…” Kevin Drum remarks, “It's fine for the authors of the study to describe it that way”, but it's not fine at all. The full (untruncated) statement is false, and no amount of hiding behind statistical language makes it true. When an experiment is done and the result is not statistically significant, it does not *show* anything.

Some discussion of this and pointers to the literature can be found at the post "Fetishizing p-Values".

[(myl) I didn't leave it out — I referred you to a list of previous posts where it's discussed at length, e.g. here ("statistical significance without a loss function"), here ("my objection was never about statistical significance, but rather about effect sizes and practical significance"), here (difference between statistical and clinical significance in drug studies), or here ("There's a special place in purgatory reserved for scientists who make bold claims based on tiny [though statistically significant] effects of uncertain origin").

But in this case, the critical issue is neither the p values nor the loss functions nor the effect sizes, but the details of the experiment.]

]]>