School-level achievement results drive high-stakes decisions but often reflect the students a school serves rather than the quality of the school itself. In this report, we assess whether publicly available data on school test scores and student characteristics can be used to generate high-quality measures of schools’ effects on student achievement. We find that adjusting for student demographics makes test score data a better indicator of school quality than raw scores alone, but considerable bias remains.
Adjusting Performance Data
To assess methods for adjusting outcomes data, we first establish a “north star” measure of school quality. We use student-level administrative data from North Carolina to construct benchmark school value-added measures, drawing on best practices from the applied econometrics literature. These data contain the universe of students attending public schools in North Carolina and include information on annual test performance and demographics, including race or ethnicity, gender, special education status, socioeconomic status, and English language learner status. We then test the value-added measures to see how well they predict student success.
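As a rough illustration of what a value-added measure captures, the sketch below regresses current scores on prior scores across all students and then averages the residuals within each school. This is a deliberately simplified stand-in, not the specification used in this report: the function name `simple_value_added` and the toy data are hypothetical, and fuller models in the literature typically add demographic controls, multiple years of data, and adjustments for noise.

```python
import numpy as np

def simple_value_added(score, prior_score, school_id):
    """Average-residual value-added sketch (illustrative assumption only).

    Step 1: fit score = a + b * prior_score by ordinary least squares.
    Step 2: average each school's residuals; a positive value means the
            school's students scored above what their prior scores predict.
    """
    score = np.asarray(score, dtype=float)
    prior = np.asarray(prior_score, dtype=float)
    school_id = np.asarray(school_id)

    # Design matrix with an intercept column and the prior score.
    X = np.column_stack([np.ones(len(score)), prior])
    beta, *_ = np.linalg.lstsq(X, score, rcond=None)
    residuals = score - X @ beta

    # Mean residual per school.
    return {
        s.item(): float(residuals[school_id == s].mean())
        for s in np.unique(school_id)
    }

# Hypothetical toy data: two schools with identical prior scores,
# but students in school "A" score one point higher this year.
va = simple_value_added(
    score=[1.0, 2.0, 0.0, 1.0],
    prior_score=[0.0, 1.0, 0.0, 1.0],
    school_id=["A", "A", "B", "B"],
)
```

In this toy data both schools enroll students with the same prior scores, so the half-point gap above and below the prediction is attributed entirely to the schools.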
Because student-level longitudinal data systems are not publicly available, we propose an innovative method of adjusting publicly available data. Using the North Carolina data, we mimic the usual constraints of publicly available data and compare the resulting adjusted measures against our north-star estimates.
Next, we use publicly available EDFacts data to apply our adjustment to schools nationwide. Our results show that demographic adjustments to public data are an improvement over unadjusted proficiency rates and are a marginal improvement over traditional regression methods, but considerable bias and noise remain.
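The traditional regression methods used as a comparison are not spelled out in this summary. One common form, sketched here as an assumption, regresses school proficiency rates on school demographic shares and treats the residual as the adjusted score. The function name `regression_adjusted_scores` and the four-school data set are hypothetical, and this is not the adjustment proposed in this report.

```python
import numpy as np

def regression_adjusted_scores(proficiency, demographics):
    """Residualize school proficiency rates on demographic shares.

    proficiency:  shape (n_schools,), share of students proficient.
    demographics: shape (n_schools,) or (n_schools, k), composition shares.
    Returns actual minus demographically predicted proficiency per school.
    """
    y = np.asarray(proficiency, dtype=float)
    X = np.asarray(demographics, dtype=float)
    if X.ndim == 1:
        X = X[:, None]

    # Add an intercept and fit by ordinary least squares.
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

# Hypothetical toy data for four schools.
share_low_income = np.array([0.2, 0.4, 0.6, 0.8])
proficiency = np.array([0.80, 0.62, 0.58, 0.40])

adjusted = regression_adjusted_scores(proficiency, share_low_income)
# adjusted is approximately [0.014, -0.042, 0.042, -0.014]
```

In this toy data the third school ranks below the second on raw proficiency but above it after adjustment, because it outperforms what its student composition predicts. The residual still absorbs anything the demographic shares fail to capture, which is one reason bias and noise can remain.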
The code for this adjustment is available here.
Using Caution with School Quality Measures
Accurately measuring school quality is difficult. In many cases, schools are not comparable because of pervasive and enduring segregation based on race, ethnicity, and socioeconomic status. The results of this study imply that one must use caution when relying on these measures to inform real-world decisions, such as families’ choices of, or policymakers’ actions toward, individual schools. We conclude that school quality measures based on aggregate data should not be used to make high-stakes decisions about individual schools, such as evaluating a principal’s performance. But they can still be useful for understanding performance levels or trends in groups of schools.