UPDATED 5/25/2013
Part 4 – in which I believe I make progress in understanding the details but still end up wrapped around the axle.
First off, but having nothing directly to do with the NEJM paper, I understand significantly more about logistic regression than I did a week ago. I’m used to thinking about continuous variables. Logistic regression facilitates modeling of binary outcomes, i.e., enables you to predict the probability of an outcome being true (or false) given a particular set of conditions which affect the outcome.
Under the logit model, the probability of an outcome being true given a set of conditions defined by the vector $\mathbf{x}$ is:

$$p(\mathbf{x}) = \frac{1}{1 + e^{-\mathbf{x}^T \boldsymbol{\beta}}} \qquad (1)$$
The vector $\mathbf{x}$ consists of m elements whose values are the independent variables which affect the outcome. For example, in the Oregon Medicaid experiment, p could be the probability of an elevated GH level in an individual and the elements of $\mathbf{x}$ could be ones or zeros depending upon whether the individual had Medicaid, was a member of a one-person household, a two-person household, liked dogs, stated that pancakes were their favorite breakfast food, etc. (The last two are intentionally silly – I made them up – but you get the idea: $\mathbf{x}$ is a vector which describes the “state” of the individual.) The vector $\boldsymbol{\beta}$ consists of the sensitivities of the probability to the independent variables in $\mathbf{x}$. Our goal is to determine the ‘best’ values of the fit coefficients $\boldsymbol{\beta}$ and the accuracy of the logistic regression model.
The maximum likelihood (ML) values of the fit coefficients are defined as those which maximize the log-likelihood function:

$$\ell(\boldsymbol{\beta}) = \sum_{i=1}^{n}\left[\,y_i \ln p_i + (1 - y_i)\ln(1 - p_i)\,\right] \qquad (2)$$

where n is the number of observations and $y_i$ is the outcome of the i-th observation, $y_i = 1$ when the event occurs and $y_i = 0$ when it doesn’t. More specifically, we determine the ML values of the fit coefficients by solving the system of equations which results from

$$\frac{\partial \ell}{\partial \beta_k} = 0, \qquad k = 1, \ldots, m \qquad (3)$$
Finding the ML values is an iterative process. Standard algorithms include Gauss-Newton and Levenberg-Marquardt. The solution turns out to be very straightforward. Define $\mathbf{X}$ to be an n x m matrix whose rows are the individual $\mathbf{x}$ vectors defined above; then

$$\boldsymbol{\beta}^{(j+1)} = \boldsymbol{\beta}^{(j)} + \left(\mathbf{X}^T \mathbf{W}^{(j)} \mathbf{X}\right)^{-1} \mathbf{X}^T \left(\mathbf{y} - \mathbf{p}^{(j)}\right) \qquad (4)$$

where j denotes the iteration number, $\mathbf{W}^{(j)}$ is a diagonal matrix of weights which is updated with each iteration, and $\mathbf{p}^{(j)}$ is the vector of probabilities calculated by exercising the logit model for $\boldsymbol{\beta}^{(j)}$. The ii-th element of $\mathbf{W}^{(j)}$ is:

$$W_{ii} = p_i\,(1 - p_i) \qquad (5)$$
(Think about the values of the weighting elements for a moment. The model parameters, $\boldsymbol{\beta}$, tell you about the transition from low probability to high. Eq. (5) tells you that when the model predicts $p_i \approx 0$ or $p_i \approx 1$ then that point doesn’t provide much information about the transition and has a correspondingly low impact on the estimated model parameter values. The fit coefficients are most sensitive to points associated with mid-range probability, i.e., $p_i \approx 0.5$.)
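The iteration in Eq. (4) with the weights of Eq. (5) can be sketched in a few lines of numpy. This is a minimal illustration, not production code; the function name and convergence tolerance are my own choices.

```python
import numpy as np

def fit_logistic_irls(X, y, n_iter=25, tol=1e-8):
    """Fit logistic regression by the Newton/IRLS update of Eq. (4).

    X : (n, m) design matrix (include a column of ones for an intercept).
    y : (n,) binary outcomes (0 or 1).
    Returns the ML coefficient vector beta, shape (m,).
    """
    n, m = X.shape
    beta = np.zeros(m)                            # start at beta = 0, i.e. p = 0.5
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))     # Eq. (1): model probabilities
        W = p * (1.0 - p)                         # Eq. (5): diagonal weights
        # Eq. (4): solve (X^T W X) step = X^T (y - p) rather than forming the inverse
        step = np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (y - p))
        beta += step
        if np.max(np.abs(step)) < tol:            # stop when the update stalls
            break
    return beta
```

Note that the code solves the linear system directly instead of explicitly inverting $\mathbf{X}^T \mathbf{W} \mathbf{X}$, which is both cheaper and better behaved numerically.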
For a modest number of parameters, such as the case in the NEJM paper, Eq. (4) is very simple to evaluate. The most challenging aspect of the calculation is inverting an m x m matrix (m = no. of fit parameters). In the (seemingly to me) unlikely event that $\mathbf{X}^T \mathbf{W} \mathbf{X}$ is ill-conditioned you’d need to implement a workaround, but overall it looks to me that solving for the ML fit coefficients in the logit is pretty simple. The covariance of the model parameter values may be calculated from the Fisher information matrix. (You’re on your own with that, but see here for the process if you’re not satisfied with what Google turns up.)
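For what it's worth, the asymptotic covariance falls out of quantities already computed in the IRLS iteration: the Fisher information for the logit model is $\mathbf{X}^T \mathbf{W} \mathbf{X}$ evaluated at the ML estimate, and its inverse is the covariance. A small numpy sketch (function name mine):

```python
import numpy as np

def logit_covariance(X, beta):
    """Asymptotic covariance of the ML logit coefficients:
    the inverse of the Fisher information X^T W X, with W from Eq. (5),
    evaluated at the fitted beta."""
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    W = p * (1.0 - p)
    return np.linalg.inv(X.T @ (W[:, None] * X))
```

The square roots of the diagonal elements are the usual standard errors reported for logistic regression coefficients.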
All that said, ML coefficient values are much simpler to determine when the model is linear, i.e.,

$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon} \qquad (6)$$

The maximum likelihood estimate of the fit coefficients is:

$$\hat{\boldsymbol{\beta}} = \left(\mathbf{X}^T \mathbf{W} \mathbf{X}\right)^{-1} \mathbf{X}^T \mathbf{W}\,\mathbf{y} \qquad (7)$$

The covariance of the regression coefficients is:

$$\mathrm{cov}\left(\hat{\boldsymbol{\beta}}\right) = \left(\mathbf{X}^T \mathbf{W} \mathbf{X}\right)^{-1} \qquad (8)$$
Actually, for the weight function given by Eq. (5) you’d need to solve Eq. (7) iteratively, i.e., use iteratively reweighted least squares, and then evaluate Eq. (8). Anyhow, the math is pretty straightforward. Where I get wrapped around the axle is that I can’t get my head around “instrumental variables,” and doing so is central to understanding Baicker and colleagues’ analysis of the data itself – as well as Pizer and Frakt’s statistical power analysis.
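For comparison with the logit case, here is the weighted linear solve of Eqs. (7)-(8) in numpy. Again a sketch with names of my own choosing; the covariance here is the plain $(\mathbf{X}^T \mathbf{W} \mathbf{X})^{-1}$, i.e., it assumes the weights already encode the inverse noise variances.

```python
import numpy as np

def weighted_ls(X, y, w):
    """Weighted least squares: Eq. (7) for the coefficients and
    Eq. (8) for their covariance.

    X : (n, m) design matrix
    y : (n,) observations
    w : (n,) diagonal of the weight matrix W
    """
    XtW = X.T * w                        # X^T W, with W = diag(w)
    A = XtW @ X                          # X^T W X
    beta = np.linalg.solve(A, XtW @ y)   # Eq. (7), via a linear solve
    cov = np.linalg.inv(A)               # Eq. (8)
    return beta, cov
```

Plugging the logit weights of Eq. (5) into this routine and looping until the coefficients stop changing is exactly the iteratively reweighted least squares procedure mentioned above.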
There are a couple other things to note:
- Implementing a model order selection algorithm to assess which covariates are statistically significant would be instructive
- With model order selection in mind, the Oregon data seems like it’s well-suited to analysis using LASSO.
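To make the LASSO point concrete: the L1 penalty drives the coefficients of uninformative covariates exactly to zero, so the surviving nonzero coefficients perform model order selection automatically. Below is a minimal proximal-gradient (ISTA) implementation for the linear-regression form of the problem; the logistic version follows the same pattern with the logit gradient. Function names and the penalty value are illustrative choices of mine.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of the L1 norm: shrink toward zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(X, y, lam, n_iter=2000):
    """LASSO by proximal gradient descent (ISTA).

    Minimizes 0.5 * ||y - X b||^2 + lam * ||b||_1.
    """
    L = np.linalg.norm(X, 2) ** 2        # Lipschitz constant: largest singular value squared
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y)      # gradient of the least-squares term
        beta = soft_threshold(beta - grad / L, lam / L)
    return beta
```

Run on data where only one covariate matters, the fit returns a coefficient vector in which the irrelevant entries are exactly zero, which is precisely the model-selection behavior that would be interesting to try on the Oregon covariates.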
More to come? Not sure. I may just try to educate myself on instrumental variables and then just call it a day.
UPDATE: There was a bigger point to my logistic regression overview above. First off, what I’d like to do is to create a logistic regression model for the probability of Type 2 diabetes as a function of body chemistry, obesity, etc. (I offer that last sentence with the caveat that I have no idea whether that’s been tried before and, if so, how successful it was. I’d be surprised if it hasn’t been done – or at least attempted – before.) Back in my initial post, I went on about how what you really want to see is a correlation plot of quantities such as glycated hemoglobin (GH) pre-treatment and post-treatment. Now suppose you could do that for all the variables which went into your logistic regression equation for Type 2 diabetes. It’s important that the model incorporate endogenous variables because (presumably) outcomes depend upon starting point. Treatment could significantly reduce the amount of Chemical A (associated with a bad outcome) in the patient’s bloodstream, but the reduction might not result in a dramatic reduction in the likelihood of a bad outcome. (With that in mind, we’d also need to consider the context of the after-treatment measurement. Would the patient continue to improve with additional treatment? Did their condition improve as much as it would ever improve?) For example, if a patient starts out with a ridiculously high GH level and the treatment they receive reduces it to a moderately high GH level, is that a meaningful improvement from a public health standpoint? If the patient had a 98% chance of developing Type 2 diabetes pre-treatment and treatment lowered it to 92%, would you want to count that as a significant benefit? Would it count as a significant benefit if continued treatment would drop the probability by 1% per month for the foreseeable future? What if the initial treatment lowered the probability from 11% to 5%? Continue treatment? Declare victory and go home? The details matter.
Back to logistic regression: If you had patient outcomes as well as variable values pre- and post-treatment then, in principle, you could compute the effect that offering Medicaid would have on the occurrence of Type 2 diabetes in the population-at-large. In the absence of before-and-after measurements, predictions of effectiveness are problematic.
So, in conclusion, it seems to me that Austin Frakt and Aaron Carroll got things right from the get-go:
From our full reading of the paper, let us add the following to the conversation:
1) Improvements in mental health are still improvements in health outcomes…
2) Non-statistical significance does not mean failure…
3) What is reasonable to expect? How much does private insurance affect these values? Do we know? No…
4) Financial hardship matters. Here Medicaid shined…
5) Preventive care matters…
6) Health insurance is necessary, but not sufficient to improve health. It’s just the first step…
7) Most of these measures are still process measures. A1C is a marker. So is cholesterol. Did real outcomes change? Patient centered ones, like health related quality of life, did. Did mortality? Did morbidity? We still don’t know. That would take more time to see.
So chill, people. This is another piece of evidence. It shows that some things improved for people who got Medicaid. For others, changes weren’t statistically significant, which isn’t the same thing as certainty of no effect. For still others, the jury is still out. But it didn’t show that Medicaid harms people, or that the ACA is a failure, or that anything supporters of Medicaid have said is a lie. Moreover, it certainly didn’t show that private insurance or Medicare succeeds in ways that Medicaid fails.
People claiming otherwise need to go read the study and rebut these points.