Error Control and Severity

This submission has open access
Submission Summary
I put forward a general principle for evidence: an error-prone claim C is warranted to the extent it has been subjected to, and passes, an analysis that very probably would have found evidence of flaws in C just if they are present. This probability is the severity with which C has passed the test. When a test’s error probabilities quantify the capacity of tests to probe errors in C, I argue, they can be used to assess what has been learned from the data about C. A claim can be probable or even known to be true, yet poorly probed by the data and model at hand. The severe testing account leads to a reformulation of statistical significance tests: Moving away from a binary interpretation, we test several discrepancies from any reference hypothesis and report those well or poorly warranted. A probative test will generally involve combining several subsidiary tests, deliberately designed to unearth different flaws. The approach relates to confidence interval estimation, but, like confidence distributions (CD) (Thornton), a series of different confidence levels is considered. A 95% confidence interval method, say using the mean M of a random sample to estimate the population mean μ of a Normal distribution, will cover the true, but unknown, value of μ 95% of the time in a hypothetical series of applications. However, we cannot take .95 as the probability that a particular interval estimate (a ≤ μ ≤ b) is correct—at least not without a prior probability to μ. In the severity interpretation I propose, we can nevertheless give an inferential construal post-data, while still regarding μ as fixed. For example, there is good evidence μ ≥ a (the lower estimation limit) because if μ < a, then with high probability .95 (or .975 if viewed as one-sided) we would have observed a smaller value of M than we did. Likewise for inferring μ ≤ b. To understand a method’s capability to probe flaws in the case at hand, we cannot just consider the observed data, unlike in strict Bayesian accounts. We need to consider what the method would have inferred if other data had been observed. For each point μ’ in the interval, we assess how severely the claim μ > μ’ has been probed. I apply the severity account to the problems discussed by earlier speakers in our session. The problem with multiple testing (and selective reporting) when attempting to distinguish genuine effects from noise, is not merely that it would, if regularly applied, lead to inferences that were often wrong. Rather, it renders the method incapable, or practically so, of probing the relevant mistaken inference in the case at hand. In other cases, by contrast, (e.g., DNA matching) the searching can increase the test’s probative capacity. In this way the severe testing account can explain competing intuitions about multiplicity and data-dredging, while blocking inferences based on problematic data-dredging.
Submission ID :
PSA2022102
Submission Type
Submission Topic
speaker
,
Virginia Tech

Similar Abstracts by Type

Submission ID
Submission Title
Submission Topic
Submission Type
Primary Author
PSA2022227
Philosophy of Climate Science
Symposium
Prof. Michael Weisberg
PSA2022211
Philosophy of Physics - space and time
Symposium
Helen Meskhidze
PSA2022165
Philosophy of Physics - general / other
Symposium
Prof. Jill North
PSA2022218
Philosophy of Social Science
Symposium
Dr. Mikio Akagi
PSA2022263
Values in Science
Symposium
Dr. Kevin Elliott
PSA202234
Philosophy of Biology - general / other
Symposium
Mr. Charles Beasley
PSA20226
Philosophy of Psychology
Symposium
Ms. Sophia Crüwell
PSA2022216
Measurement
Symposium
Zee Perry
126 visits