Public administrators are committed to improving public service delivery, as evidenced by decades of accountability efforts at all levels of government. This movement is especially salient in the public education system, where student standardized test scores are increasingly used as the key performance metric for evaluating schools, teachers, and, most recently, teacher preparation programs (TPPs). Evaluating TPPs using a single quantitative performance metric measured at the student level is a complicated endeavor. This paper illustrates a key challenge for this type of accountability system that has not yet been examined in the literature: graduates of individual TPPs tend to cluster in a very small number of districts. We present a case study to show how this geographic stratification inhibits the ability of statistical models to disentangle the effects of district and school from the effect of the TPP on student achievement, particularly in rural states.