Understanding the Oregon Medicaid experiment: Part 3

After thinking about it a bit more and reading the Supplemental Appendix, I realize that this is a logistic regression problem as much as it is an exercise in Bayesian analysis. (The Supplemental Appendix suggests that they’re using linear regression instead of logistic regression, which puzzles me.) That said, if your analysis involves only a small number of binary independent variables (i.e., lottery winner/loser, on Medicaid/off Medicaid) and no continuous independent variable, then it seems like it would be easy to recast the logistic regression problem as a linear regression problem. I need to think more about that, though.
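To make that intuition concrete, here is a minimal sketch for the simplest case: one binary regressor and a binary outcome. The counts are made up for illustration (they are not from the Oregon data), and `fit_ols` and `fit_logit` are hypothetical helper names. With a single binary regressor the model is saturated, so both fits should simply reproduce the two group proportions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_ols(x, y):
    """Closed-form simple linear regression; returns (intercept, slope)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope

def fit_logit(x, y, iters=25):
    """Newton-Raphson for the model P(y = 1) = sigmoid(a + b * x)."""
    a = b = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = sigmoid(a + b * xi)
            w = p * (1.0 - p)
            g0 += yi - p              # gradient w.r.t. intercept
            g1 += (yi - p) * xi       # gradient w.r.t. slope
            h00 += w                  # entries of X' W X
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b

# Hypothetical counts: 300 of 1000 controls and 450 of 1000 treated
# have the outcome. These numbers are invented for illustration.
x = [0] * 1000 + [1] * 1000
y = [1] * 300 + [0] * 700 + [1] * 450 + [0] * 550

a_ols, b_ols = fit_ols(x, y)
a_log, b_log = fit_logit(x, y)

# Fitted probability of the outcome in each group under each model.
p0_ols, p1_ols = a_ols, a_ols + b_ols
p0_logit, p1_logit = sigmoid(a_log), sigmoid(a_log + b_log)

print("linear:  ", p0_ols, p1_ols, p1_ols - p0_ols)
print("logistic:", p0_logit, p1_logit, p1_logit - p0_logit)
```

The two models disagree about the coefficients (one is on the probability scale, the other on the log-odds scale), but in this saturated case they agree on the fitted probabilities and therefore on the treatment-control difference.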

In terms of analyzing the data, I’d combine logistic regression with bootstrapping to get uncertainties in the estimated probabilities of an outcome with and without treatment, and in the difference between those probabilities. From there you should be able to get the ‘overlap of pdfs’ approach I described in my previous post. Although the details aren’t clear to me yet, I think this (if I follow through on it) will turn out to be complementary to Steve Pizer and Austin Frakt’s “Loss of Precision with IV” calculation.
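A rough sketch of what that could look like, under several assumptions of mine: the data are made-up counts (not the Oregon data), there is a single binary treatment indicator, the bootstrap resamples individuals with replacement, and the interval is a simple percentile interval. The helper names (`fit_logit`, `bootstrap_probs`, `percentile_interval`) are hypothetical.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logit(x, y, iters=8):
    """Newton-Raphson for the model P(y = 1) = sigmoid(a + b * x)."""
    a = b = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = sigmoid(a + b * xi)
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b

def bootstrap_probs(x, y, n_boot=200, seed=0):
    """Refit the logit on resampled data; return (p0, p1, p1 - p0) draws."""
    rng = random.Random(seed)
    n = len(x)
    draws = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]   # sample with replacement
        a, b = fit_logit([x[i] for i in idx], [y[i] for i in idx])
        p0, p1 = sigmoid(a), sigmoid(a + b)
        draws.append((p0, p1, p1 - p0))
    return draws

def percentile_interval(values, lo=0.025, hi=0.975):
    """Crude percentile interval from sorted bootstrap draws."""
    s = sorted(values)
    return s[int(lo * (len(s) - 1))], s[int(hi * (len(s) - 1))]

# Hypothetical counts: 60 of 200 controls and 90 of 200 treated
# have the outcome. Invented for illustration only.
x = [0] * 200 + [1] * 200
y = [1] * 60 + [0] * 140 + [1] * 90 + [0] * 110

draws = bootstrap_probs(x, y)
diffs = [d[2] for d in draws]
lo, hi = percentile_interval(diffs)
print("95% percentile interval for p1 - p0:", lo, hi)
```

The columns of `draws` approximate the sampling distributions of the two probabilities, which is the raw material an overlap comparison would need. A real analysis would use far more than 200 resamples and would have to respect the study's actual design.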

UPDATE:

In the Supplemental Appendix the authors address their use of linear regression as opposed to logistic regression:

Because we are interested in the difference in conditional means for the treatments and controls, linear probability models would pose no concerns in the absence of covariates or in fully saturated models (Angrist 2001, Angrist and Pischke 2009 [Links added.]). Our models are not fully saturated, however, so it is possible that results could be affected by this functional form choice, especially for outcomes with very low or very high mean probability. We therefore explore the sensitivity of our results to an alternate specification using logistic regression and calculating average marginal effects for all binary outcomes, and are reassured that the results look very similar (see Table S15a-d below).

That seems reasonable in principle, and the tables indicate essentially no difference in estimated effects between the two models. That would seem to favor using linear regression, as linear models are far easier to implement and analyze (IMO) than nonlinear ones.
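The kind of sensitivity check described in the quoted passage can be sketched on simulated data. Everything here is an assumption of mine rather than the authors' specification: the data-generating process, the single continuous covariate `z` (which makes the model non-saturated), and the hand-rolled helpers `solve`, `fit_ols`, and `fit_logit`. The sketch compares the linear probability model's treatment coefficient with the logit's average marginal effect of treatment.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def weighted_gram(X, weights):
    """Return X' diag(weights) X as a list of lists."""
    k = len(X[0])
    return [[sum(w * row[i] * row[j] for row, w in zip(X, weights))
             for j in range(k)] for i in range(k)]

def fit_ols(X, y):
    """Ordinary least squares via the normal equations."""
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(len(X[0]))]
    return solve(weighted_gram(X, [1.0] * len(X)), Xty)

def fit_logit(X, y, iters=15):
    """Newton-Raphson for a logit with an arbitrary design matrix."""
    beta = [0.0] * len(X[0])
    for _ in range(iters):
        p = [sigmoid(sum(b * v for b, v in zip(beta, row))) for row in X]
        grad = [sum((yi - pi) * row[i] for row, yi, pi in zip(X, y, p))
                for i in range(len(beta))]
        H = weighted_gram(X, [pi * (1.0 - pi) for pi in p])
        beta = [b + s for b, s in zip(beta, solve(H, grad))]
    return beta

# Simulated data (an assumption, not the study's data): randomized binary
# treatment d, continuous covariate z, outcome drawn from a true logit.
rng = random.Random(1)
X, y = [], []
for _ in range(2000):
    d = 1.0 if rng.random() < 0.5 else 0.0
    z = rng.uniform(-1.0, 1.0)
    X.append([1.0, d, z])
    y.append(1.0 if rng.random() < sigmoid(-1.0 + 0.6 * d + 0.8 * z) else 0.0)

# Linear probability model: the coefficient on d is the estimated effect.
lpm_effect = fit_ols(X, y)[1]

# Logit: average, over the sample, of the change in predicted probability
# when d is switched from 0 to 1 with z held at its observed value.
b0, b1, b2 = fit_logit(X, y)
ame = sum(sigmoid(b0 + b1 + b2 * row[2]) - sigmoid(b0 + b2 * row[2])
          for row in X) / len(X)

print("LPM coefficient on treatment:", lpm_effect)
print("Logit average marginal effect:", ame)
```

In a setup like this, with randomized treatment and outcome probabilities well away from 0 and 1, I would expect the two numbers to be close. The authors' caveat applies to outcomes with very low or very high mean probability, where the functional form could matter more.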

