Bamboozled by Bonferroni

This abstract has open access

Abstract

When many statistical hypotheses are tested simultaneously (e.g., when searching for genes associated with a disease), some statisticians recommend “correcting” classical hypothesis tests to avoid inflation of the false positive rate. I defend three theses. First, such “corrections” have no plausible evidential interpretation. Second, examples motivating the use of correction factors often encourage readers to conflate (a) conditional independence of the data given the hypotheses/parameters with (b) unconditional independence of the hypotheses/parameters. Finally, correction factors are better construed as decision-theoretic devices that reflect the experimenter's (or the discipline's) value judgments concerning the conditions under which, after a round of testing, a hypothesis should be pursued/researched further.

The standard argument that one should correct for multiple tests goes as follows. When many hypotheses are tested at a fixed significance level (e.g., 5%), there is a high chance that at least one hypothesis will be rejected, even if all hypotheses are true. Thus, a single significant result is not evidence that at least one of the hypotheses is false. Nor is the rejection of a specific hypothesis H *evidence* against H; instead, we should lower the significance level to reduce the chance of false positives.

That argument, I claim, requires one to abandon at least one of two axioms about evidence:

Axiom 1: If one has evidence for a hypothesis H and one deduces a trivial logical consequence H' from H, then one has evidence for H'.

Axiom 2 (No evidential loss on ancillary information): If one has evidence for H, then one's evidence for H cannot be weakened by observing data whose distribution would be the same whether or not H is true.

To illustrate the first axiom, suppose Philip Morris's CEO has evidence that smoking causes lung cancer and deduces that smoking causes *some* harm. Then the CEO comes to have evidence that smoking causes some harm. To illustrate the second, if one has evidence that one's oven is currently at 350°F, then one cannot lose that evidence by learning that corn prices dropped in 1972: past corn prices do not vary with one's current oven temperature.

The standard argument requires one to abandon one of those two axioms, for the probabilistic calculations underlying it do not depend on whether (i) the many hypotheses being tested are evidentially related or (ii) the tests are conducted at the same or at distinct times.

Giving up either axiom would require us to radically revise the importance we attribute to statistical evidence in scientific and legal settings. Giving up Axiom 1 would entail that Philip Morris could possess evidence that smoking causes lung cancer without having evidence that smoking causes harm; we would need separate criminal statutes for every type of malady that might be caused by drugs. Giving up Axiom 2 would entail that Philip Morris could weaken its evidence for the hypothesis that smoking causes lung cancer by conducting a sufficiently large number of other, irrelevant statistical tests.
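
The arithmetic behind the standard argument summarized above can be made concrete with a minimal sketch. It is an illustration, not part of the abstract's own analysis: it assumes that the m tests are statistically independent and that every tested hypothesis is true, and the function names `familywise_error_rate` and `bonferroni_level` are hypothetical. The Bonferroni correction itself, dividing the significance level by the number of tests, rests on a union bound and does not need the independence assumption; independence is used here only to get an exact figure.

```python
def familywise_error_rate(alpha: float, m: int) -> float:
    """Chance of at least one rejection among m tests at level alpha.

    Assumes (for illustration) that all m hypotheses are true and that
    the tests are independent, so each test rejects with probability alpha.
    """
    return 1.0 - (1.0 - alpha) ** m


def bonferroni_level(alpha: float, m: int) -> float:
    """Per-test significance level after a Bonferroni correction."""
    return alpha / m


if __name__ == "__main__":
    alpha = 0.05  # the fixed significance level used as the example above
    for m in (1, 10, 100):
        uncorrected = familywise_error_rate(alpha, m)
        corrected = familywise_error_rate(bonferroni_level(alpha, m), m)
        print(f"m={m:>3}  uncorrected={uncorrected:.3f}  corrected={corrected:.3f}")
```

Under these assumptions, the uncorrected chance of at least one rejection grows quickly with m, while the corrected chance stays at or below the nominal 5%. Note that the calculation takes only the significance level and the number of tests as inputs; nothing in it refers to what the hypotheses say or when the tests are run.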

Submission ID

PSA202299

Submission Type

Symposium

Topic 1

Probability and Statistics