[UPDATE 3/16/2013: I’ve revised this post multiple times over the past ~12 hours and will likely continue to do so over the next few days.]
There’s a lively discussion over at Andrew Gelman’s blog following his post, Misunderstanding the p-value. (No, it has nothing to do with urine tests.) The Wikipedia definition is actually spot on: “In statistical hypothesis testing the p-value is the probability of obtaining a test statistic at least as extreme as the one that was actually observed, assuming that the null hypothesis is true.” The way to describe it formally (and a poster on Gelman’s blog did) is $P(t \geq T \mid H_0)$, where $t$ is the calculated test statistic, $T$ is the user-specified critical value of the test statistic, and $\mid H_0$ means conditional on the null hypothesis, $H_0$, being true. (If you fed your detector a continuous stream of data where $H_0$ was true then the p-value would be your Type I error rate – or vice versa, depending upon how you prefer to look at it.)
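To make that last point concrete, here’s a minimal simulation sketch in Python (the one-sample t-test, sample sizes, and seed are illustrative choices of mine, not anything from Gelman’s post): feed the detector nothing but data for which $H_0$ is true, and the fraction of rejections at a given p-value cutoff settles at that cutoff.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
p_crit = 0.05
n_trials, n = 100_000, 20

# Every sample is drawn with H0 (zero mean) true.
data = rng.normal(loc=0.0, scale=1.0, size=(n_trials, n))

# One-sample t-test of H0: mean = 0 on each row.
p_values = stats.ttest_1samp(data, popmean=0.0, axis=1).pvalue

# The false-alarm fraction should land near p_crit = 0.05.
print("empirical Type I error rate:", np.mean(p_values < p_crit))
```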
What has made the discussion on Gelman’s blog lively is that not everyone concurs with that definition. Specifically, some object to calling the p-value a conditional probability. See this post on Larry Wasserman’s blog for details on the objection. He writes:
When I teach Bayes, I do write the model as $p(x \mid \theta)$. When I teach frequentist statistics, I write this either as $p(x; \theta)$ or $p_\theta(x)$. There is no conditioning going on. To condition on $\theta$ would require a joint distribution for $(x, \theta)$. There is no such joint distribution in frequentist-land.
While I see his point in terms of formal definitions, the argument doesn’t strike me as constructive. In practice, is that anything more than a semantic distinction? He continues:
I understand that people often say “conditional on $\theta$” to mean “treating $\theta$ as fixed.”
The last statement strikes me as a non sequitur. “Conditional on $H_0$” means “conditional on $H_0$ being true”. I don’t see how defining it to mean “treating $H_0$ as fixed” gets you anywhere useful.
There are Bayesian methods. There are Frequentist methods. The point is to answer questions and solve problems, so use the tools available to you. Being correct and consistent with terminology when you apply those tools is important because you want to avoid confusing people who are trying to understand what you’re doing. But getting so hung up on terminology and formal definitions that you spend the bulk of your time arguing over those issues rather than solving problems is counterproductive.
Anyhow, one of the other people who objected to calling p-values conditional probabilities was philosopher of science Deborah Mayo. Her blog, Error Statistics Philosophy, is a good read if you’re interested in the philosophy of statistics and/or estimation theory. I take her views very seriously. In this case, however, I disagree strongly with her view. Skipping over the substance and cutting to the conclusion, she wrote:
Thanks to Larry [Wasserman] and Konrad (maybe others, didn’t read them all) on p-values not being conditional probabilities.
To which I responded – and I’m posting my response here because I think it was pretty good:
How is a p-value anything but a conditional probability? Perhaps it is not a “Conditional Probability” but it is most certainly a conditional probability. I make that statement without a deep understanding of the underlying theory of statistics, but with a lot of experience making decisions based on fitting models to data and being able to forecast error rates accurately based on analysis of fit results.
Here’s a typical example: I have two signal hypotheses, H0 (null) and H1 (signal of interest present). For each measurement I must decide H1 or ~H1. (~H1 is probably H0 but it could be something else. The details of ~H1 don’t matter.) Each hypothesis has an associated signal model. I fit each model to each n-dimensional observation. I calculate the sum-of-squared-residuals for each fit, RSS0 and RSS1, respectively. I calculate an F-value based on the number of regressors (fit parameters) in each model and the dimensionality of the data. With the F-value and RSS1 in hand I can make the decision.
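[For concreteness, here is a minimal Python sketch of that F-value calculation, assuming nested least-squares fits with k0 < k1 regressors; the function name and notation are mine, and the statistics in the matched subspace detector literature differ in their details.]

```python
def f_value(rss0: float, rss1: float, k0: int, k1: int, n: int) -> float:
    """F-statistic comparing the H0 fit (k0 regressors, residual sum of
    squares rss0) against the H1 fit (k1 regressors, rss1) for a single
    n-dimensional observation, assuming the models are nested."""
    return ((rss0 - rss1) / (k1 - k0)) / (rss1 / (n - k1))
```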
Deciding H1 or ~H1 is a two-step process. Step 1 = reject or not reject H0 based on the F-value. (If I reject H0 then I move on to Step 2; if not, then I decide ~H1 and I’m done.) How do I make that decision? I determine the F-value corresponding to p_crit; p_crit could be 0.05 or higher or lower. Where does that F-value, F_crit, come from? Two options: Option 1 is that I calculate F_crit from first principles based on the presumption that H0 is true, i.e., I take it from an F-distribution with the appropriate numbers of degrees of freedom and a specified value of p_crit. Option 2 is that I look at the actual distribution of F-values for samples where I know that H0 is true and I determine F_crit empirically. Either way, I’m determining the threshold F-value for rejecting H0 from distributions where H0 is presumed to be true.
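[A sketch of both options in Python; the regressor counts, dimensionality, and the simulated stand-in for “samples where I know H0 is true” are illustrative assumptions.]

```python
import numpy as np
from scipy import stats

p_crit = 0.05
k0, k1, n = 2, 4, 50          # illustrative regressor counts and dimension
dfn, dfd = k1 - k0, n - k1    # F-test degrees of freedom

# Option 1: F_crit from first principles, i.e., the upper p_crit
# quantile of the corresponding F-distribution, presuming H0 true.
f_crit_theory = stats.f.ppf(1.0 - p_crit, dfn, dfd)

# Option 2: F_crit from the empirical distribution of F-values on
# samples known to satisfy H0 (simulated here in place of real data).
rng = np.random.default_rng(1)
f_values_h0 = stats.f.rvs(dfn, dfd, size=100_000, random_state=rng)
f_crit_empirical = np.quantile(f_values_h0, 1.0 - p_crit)

print(f_crit_theory, f_crit_empirical)  # the two should nearly agree
```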
Moving on… Suppose the F-value for a particular measurement exceeds F_crit. If so then I reject H0, but I don’t necessarily accept H1. Step 2: Having ruled out H0 based on the F-value, I then decide H1 or ~H1 based on RSS1. For normally-distributed measurement noise, RSS1 will be chi-squared-distributed. (If the noise isn’t normally-distributed then I either come up with a different model pdf for RSS1 based on a more appropriate noise model or I determine the pdf empirically from prior observations.) I decide H1 or ~H1 based on the threshold RSS value which follows from my chosen p_crit and the number of degrees of freedom of the chi-squared distribution. Again, I’m choosing a threshold value which is conditional on a hypothesis being true. In practice, all the thresholds I set are determined from pdfs conditional on particular signal hypotheses being true.
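[Again for concreteness, a minimal sketch of the Step 2 threshold, assuming the noise variance sigma^2 is known so that RSS1/sigma^2 is chi-squared with n - k1 degrees of freedom when H1 is true; the numbers are illustrative.]

```python
from scipy import stats

p_crit = 0.05
k1, n = 4, 50          # illustrative values, as above
sigma2 = 1.0           # assumed (or separately estimated) noise variance

# Upper p_crit quantile of RSS1's pdf, conditional on H1 being true.
rss_crit = sigma2 * stats.chi2.ppf(1.0 - p_crit, df=n - k1)

# Decide H1 if RSS1 <= rss_crit (residuals consistent with the H1
# signal model); otherwise decide ~H1.
```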
So I repeat my original question: How is a p-value anything but a conditional probability?
PS For what it’s worth, the decision approach above is well-established – see, e.g., Louis Scharf’s work on Matched Subspace Detectors and Adaptive Subspace Detectors.
[End of my response to D. Mayo.]
I’ll update this post at a later date to provide a more thorough explanation as to why I believe p-values are appropriately regarded as conditional probabilities.
There is nothing at all wrong with calling it a conditional probability. You won’t run into any difficulty, either practical or theoretical, by thinking of it as a conditional probability.
Frequentists don’t object to Bayes’ Theorem in general, since it is an elementary consequence of the product rule of probability. They insist, though, that it only sometimes applies and that you can only sometimes invert P(x|h) to get P(h|x). There is nothing in the mathematics which requires this restriction; it’s purely a philosophical consequence of the Frequentist interpretation of probabilities. Because of this they want to avoid calling P(x|h) a conditional probability, as a way of stressing that in their view it’s not legitimate to invert it and obtain P(h|x).
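[A toy numeric illustration of that inversion, with invented values; the Frequentist sticking point is that the prior P(h) may not exist as a frequency.]

```python
p_h = 0.01                 # prior probability of hypothesis h (invented)
p_x_given_h = 0.95         # likelihood of data x under h (invented)
p_x_given_not_h = 0.10     # likelihood of x under ~h (invented)

# Bayes' Theorem: P(h|x) = P(x|h) P(h) / P(x), with P(x) from the
# law of total probability over h and ~h.
p_x = p_x_given_h * p_h + p_x_given_not_h * (1 - p_h)
p_h_given_x = p_x_given_h * p_h / p_x
print(p_h_given_x)         # about 0.088
```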
Also, Dr. Mayo is not a mathematician or statistician, as she will readily admit. She’s a philosopher of science who specializes in the philosophy of statistics.
That’s an excellent explanation. Thanks also for the clarification on Prof. Mayo.