Disaggregation of data by race and ethnicity is a critical tool for shining a light on racialized systems of privilege and oppression. Despite strong ethical and practical reasons for disaggregating data along these lines, many high-value datasets lack sufficient information on race and ethnicity. In response, data scientists and researchers have developed creative statistical and analytic methods for imputing race and ethnicity, that is, appending them onto datasets that lack those data, allowing policymakers to disaggregate those data and track racial disparities to inform policymaking. In this landscape scan, we explore the literature and consider the perspectives of people working in statistical methods and data science, racial and data equity, and equitable policymaking to better understand how ethics and empathy are considered in these approaches to disaggregation. By ethical risks, we mean the ways in which imputation could harm people or increase their risk of harm. We also explore potential violations of empathy, which occur when analysts do not adequately consider the personhood or the expressed concerns and needs of people and communities reflected in data. The key takeaways from our scan include the following insights:
Imputation is a useful but imperfect tool.
Ethical best practices are underdeveloped amid a focus on technical application.
Imprecision produces disparate benefit and risk across subgroups.
Empathy is a critical but often missing element.
Based on these findings, we argue that robust consideration of ethics and empathy in imputing race and ethnicity to better understand disparities not only reduces harm, but also makes the resulting analysis more accurate and more representative of people's lived experiences.