Evaluation Review, Ahead of Print.
Modern policies are commonly evaluated not with randomized experiments but with repeated measures designs such as difference-in-differences (DID) and the comparative interrupted time series (CITS). The key benefit of these designs is that they control for unobserved confounders that are fixed over time. However, DID and CITS designs yield unbiased impact estimates only when the model assumptions are consistent with the data at hand. In this paper, we empirically test whether the assumptions of repeated measures designs are met in field settings. Using a within-study comparison design, we compare experimental estimates of the impact of patient-directed care on medical expenditures to non-experimental DID and CITS estimates for the same target population and outcome. Our data come from a multi-site experiment that includes participants receiving Medicaid in Arkansas, Florida, and New Jersey. We present summary measures of repeated measures bias across three states, four comparison groups, two model specifications, and two outcomes. We find that, on average, the bias resulting from repeated measures designs is very close to zero (less than 0.01 standard deviations, or SDs). Further, we find that comparison groups whose pre-treatment trends are visibly parallel to those of the treatment group result in less bias than those with visibly divergent trends. However, CITS models that control for baseline trends produce slightly more bias and are less precise than DID models that control only for baseline means. Overall, we offer optimistic evidence in favor of repeated measures designs when randomization is not feasible.
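To make the comparison concrete, the following is a minimal sketch of how a DID estimate, a CITS estimate, and a standardized within-study bias could be computed from group-level outcome series. It is an illustration under simplifying assumptions, not the specification used in the paper: the function names (`did_estimate`, `cits_estimate`, `standardized_bias`), the linear pre-period trend, and all numbers are hypothetical and synthetic.

```python
import numpy as np


def did_estimate(y_treat, y_comp, n_pre):
    """DID: pre-to-post change in the treated group mean minus the
    pre-to-post change in the comparison group mean."""
    y_treat = np.asarray(y_treat, dtype=float)
    y_comp = np.asarray(y_comp, dtype=float)
    treat_change = y_treat[n_pre:].mean() - y_treat[:n_pre].mean()
    comp_change = y_comp[n_pre:].mean() - y_comp[:n_pre].mean()
    return treat_change - comp_change


def cits_estimate(y_treat, y_comp, n_pre):
    """CITS (simplified): mean post-period deviation from each group's
    extrapolated linear pre-period trend, treated minus comparison."""
    t = np.arange(len(y_treat), dtype=float)

    def mean_deviation(y):
        y = np.asarray(y, dtype=float)
        # Fit a straight line to the pre-period observations only.
        slope, intercept = np.polyfit(t[:n_pre], y[:n_pre], 1)
        projected = intercept + slope * t[n_pre:]
        return (y[n_pre:] - projected).mean()

    return mean_deviation(y_treat) - mean_deviation(y_comp)


def standardized_bias(nonexp_estimate, exp_estimate, outcome_sd):
    """Within-study bias: non-experimental estimate minus the experimental
    benchmark, expressed in standard deviations of the outcome."""
    return (nonexp_estimate - exp_estimate) / outcome_sd


# Synthetic series: four pre-period and two post-period time points.
# Both groups share a slope of 1; the treated group shifts up by 3 afterward.
comparison = [10, 11, 12, 13, 14, 15]
treated = [12, 13, 14, 15, 19, 20]

did = did_estimate(treated, comparison, n_pre=4)
cits = cits_estimate(treated, comparison, n_pre=4)

# Hypothetical experimental benchmark (2.9) and outcome SD (5.0).
bias_did = standardized_bias(did, exp_estimate=2.9, outcome_sd=5.0)
print(did, cits, bias_did)
```

With parallel pre-treatment trends, as in this synthetic example, both estimators recover the same shift. The two can diverge when pre-period trends differ, which is the situation the comparison-group analysis in the paper examines.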