Abstract
Infectious disease surveillance frequently lacks complete information on race and ethnicity, making it difficult to identify health inequities. The COVID-19 pandemic heightened awareness of this issue: inequities in cases, hospitalizations, and deaths were reported, but with evidence of substantial missing demographic details. Although the problem of missing race and ethnicity data in COVID-19 cases has been well documented, neither its spatiotemporal variation nor its particular drivers have been characterized. Using individual-level data on confirmed COVID-19 cases in Massachusetts from March 2020 to February 2021, we show that missing race and ethnicity data (1) varied over time, appearing to increase sharply during two distinct periods of rapid case growth; (2) differed substantially between towns, indicating a nonrandom distribution; and (3) was significantly associated with several individual- and town-level characteristics in a mixed-effects regression model, suggesting a combination of personal and infrastructural drivers of missing data that persisted despite state and federal data-collection mandates. We discuss how a variety of factors may contribute to persistent missing data and how they could potentially be mitigated in future contexts.