Abstract
Background and Aims
A substantial share of fatal drug overdoses is missing information on specific drug involvement, leading to under‐reporting of opioid‐related death rates and a misrepresentation of the extent of the opioid epidemic. We aimed to compare methodological approaches to predicting opioid involvement in unclassified drug overdoses in US death records and to estimate the number of fatal opioid overdoses from 1999 to 2016 using the best‐performing method.
Design
This was a secondary data analysis of the universe of drug overdoses in 1999–2016 obtained from the National Center for Health Statistics Detailed Multiple Cause of Death records.
Setting
United States.
Cases
A total of 632 331 drug overdose decedents. Drug overdoses with known drug classification comprised 78.2% of the cases (n = 494 316) and unclassified drug overdoses (ICD‐10 T50.9) comprised 21.8% (n = 138 015).
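The reported shares can be reproduced from the counts above. The snippet below is only an arithmetic check; the variable names are illustrative and not taken from the study.

```python
# Counts as stated in the Cases paragraph.
total = 632_331
classified = 494_316
unclassified = 138_015

# The two groups should partition the full set of overdose decedents.
assert classified + unclassified == total

# Percentages rounded to one decimal place, as reported in the text.
classified_share = round(100 * classified / total, 1)
unclassified_share = round(100 * unclassified / total, 1)
print(classified_share, unclassified_share)
```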
Measurements
Known opioid involvement was defined using ICD‐10 codes T40.0–T40.4 and T40.6, recorded in the set of contributing causes. Opioid involvement in unclassified drug overdoses was predicted using approaches that varied along three dimensions: model type (logistic regression versus machine learning techniques), inclusion or exclusion of contributing causes of death, and inclusion or exclusion of county‐level characteristics. Having selected the model with the highest predictive ability, we calculated corrected estimates of opioid‐related mortality.
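As a minimal sketch of the case definitions, the function below labels a single death record from its contributing-cause codes. The opioid codes and T50.9 come from the text. The function names, the label strings, and the rule that a record counts as unclassified only when T50.9 is its sole drug-poisoning code (ICD‐10 block T36–T50) are assumptions made for illustration and may differ from the study's actual coding rules.

```python
# ICD-10 codes named in the Measurements and Cases paragraphs.
OPIOID_CODES = {"T40.0", "T40.1", "T40.2", "T40.3", "T40.4", "T40.6"}
UNCLASSIFIED_CODE = "T50.9"


def is_drug_poisoning_code(code):
    """True for codes in ICD-10 block T36-T50 (poisoning by drugs)."""
    return (
        len(code) >= 3
        and code[0] == "T"
        and code[1:3].isdigit()
        and 36 <= int(code[1:3]) <= 50
    )


def classify_overdose(contributing_causes):
    """Label one record as 'opioid', 'unclassified', or 'other'.

    Assumption (not from the source): a record is 'unclassified' only
    when T50.9 is the sole drug-poisoning code among its causes.
    """
    codes = {c.strip().upper() for c in contributing_causes}
    if codes & OPIOID_CODES:
        return "opioid"
    drug_codes = {c for c in codes if is_drug_poisoning_code(c)}
    if drug_codes == {UNCLASSIFIED_CODE}:
        return "unclassified"
    return "other"


# Hypothetical records, for illustration only.
print(classify_overdose(["T40.1", "T50.9"]))  # opioid code present
print(classify_overdose(["T50.9", "X44"]))    # only the unspecified-drug code
print(classify_overdose(["T40.5", "T50.9"]))  # a specific non-opioid drug code
```

Records labeled "unclassified" are the ones whose opioid involvement a prediction model would then need to estimate.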
Findings
Logistic regression and random forest models performed similarly. Including contributing causes substantially improved predictive accuracy, while including county characteristics did not. Using the best‐performing model, we found that 71.8% of unclassified drug overdoses in 1999–2016 involved opioids, translating into 99 160 additional opioid‐related deaths, or approximately 28% more than reported. Importantly, there was striking geographic variation in the undercounting of opioid overdoses.
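One way to turn record-level predictions into a corrected count is sketched below. It assumes that additional opioid deaths are obtained by summing predicted probabilities over unclassified records; the abstract does not state whether the authors summed probabilities or applied a classification threshold, so this is an illustrative choice. The function name and toy inputs are hypothetical. The last lines check that the reported 99 160 additional deaths are consistent with the reported 71.8% share of 138 015 unclassified overdoses.

```python
def corrected_opioid_deaths(known_opioid_deaths, predicted_probs):
    """Return (corrected_total, additional_deaths).

    Assumption: expected additional opioid deaths equal the sum of the
    predicted probabilities of opioid involvement across unclassified
    records.
    """
    additional = sum(predicted_probs)
    return known_opioid_deaths + additional, additional


# Toy example with made-up numbers: 10 known opioid deaths and
# four unclassified records with predicted probabilities.
toy_total, toy_additional = corrected_opioid_deaths(10, [0.75, 0.5, 0.25, 1.0])
print(toy_total, toy_additional)

# Consistency check on the figures reported in the Findings paragraph.
unclassified = 138_015
additional_reported = 99_160
predicted_share = round(100 * additional_reported / unclassified, 1)
print(predicted_share)
```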
Conclusions
In modeling opioid involvement in unclassified drug overdoses, the highest predictive accuracy is achieved using a statistical model (either logistic regression or a random forest ensemble) with decedent characteristics and contributing causes of death as predictors.