Abstract
Missing data often occur in cross-sectional surveys and longitudinal and experimental studies. The purpose of this study was
to compare the prediction of self-rated health (SRH), a robust predictor of morbidity and mortality among diverse populations,
before and after imputation of the missing variable “yearly household income.” We reviewed data from 4,162 participants of
Mexican origin who were recruited from July 1, 2002, through December 31, 2005, and enrolled in a population-based cohort
study. Missing yearly income data were imputed using three different single imputation methods and one multiple imputation
method under a Bayesian approach. Of 4,162 participants, 3,121 were randomly assigned to a training set (to derive the yearly income
imputation methods and develop the health-outcome prediction models) and 1,041 to a testing set (to compare the areas under
the curve (AUC) of the receiver-operating characteristic of the resulting health-outcome prediction models). The discriminatory
powers of the SRH prediction models were good (range, 69–72%). Compared with the prediction model obtained without imputation
of missing yearly income, all imputation methods improved the prediction of SRH (P < 0.05 for all comparisons), with the model after multiple imputation having the highest AUC (AUC = 0.731). Furthermore,
when yearly income was imputed using multiple imputation, the odds of reporting SRH as good or better increased by 11% for each
$5,000 increment in yearly income. This study showed that although imputation of missing data for a key predictor variable
can improve a health-outcome risk prediction model, further work is needed to illuminate the risk factors associated with
SRH.
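
The two quantities reported above, the AUC and the odds increase per $5,000, can be illustrated with a minimal Python sketch. This is not the authors' code. The predicted probabilities below are invented toy values, the function names `auc` and `odds_multiplier` are hypothetical, and the compounding of the 11% figure assumes a logistic-type model in which odds ratios multiply across income increments.

```python
from itertools import product


def auc(pos_scores, neg_scores):
    """Area under the ROC curve, computed as the probability that a randomly
    chosen positive case receives a higher score than a randomly chosen
    negative case (ties count as one half)."""
    pairs = list(product(pos_scores, neg_scores))
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in pairs)
    return wins / len(pairs)


def odds_multiplier(income_difference, odds_ratio_per_step=1.11, step=5000):
    """Multiplicative change in the odds for a given income difference,
    assuming the per-step odds ratio compounds (logistic-model assumption)."""
    return odds_ratio_per_step ** (income_difference / step)


# Toy predicted probabilities of "good or better" SRH (illustrative only).
good_srh = [0.9, 0.8, 0.55, 0.4]   # participants who reported good or better SRH
poor_srh = [0.7, 0.5, 0.3, 0.2]    # participants who reported fair or poor SRH

print(auc(good_srh, poor_srh))           # 0.8125
print(round(odds_multiplier(20000), 3))  # 1.518
```

Under these assumptions, a $20,000 difference in yearly income corresponds to 1.11 raised to the fourth power, or roughly a 52% increase in the odds, rather than four times 11%.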
- Content Type Journal Article
- DOI 10.1007/s10903-010-9415-8
- Authors
- Anthony B. Ryder, Department of Epidemiology, UT MD Anderson Cancer Center, Houston, TX 77030, USA
- Anna V. Wilkinson, Department of Epidemiology, UT MD Anderson Cancer Center, Houston, TX 77030, USA
- Michelle K. McHugh, Department of Epidemiology, UT MD Anderson Cancer Center, Houston, TX 77030, USA
- Katherine Saunders, Department of Epidemiology, UT MD Anderson Cancer Center, Houston, TX 77030, USA
- Sumesh Kachroo, Department of Epidemiology, UT MD Anderson Cancer Center, Houston, TX 77030, USA
- Anthony D’Amelio, Department of Epidemiology, UT MD Anderson Cancer Center, Houston, TX 77030, USA
- Melissa Bondy, Department of Epidemiology, UT MD Anderson Cancer Center, Houston, TX 77030, USA
- Carol J. Etzel, Department of Epidemiology, UT MD Anderson Cancer Center, Houston, TX 77030, USA
- Journal Journal of Immigrant and Minority Health
- Online ISSN 1557-1920
- Print ISSN 1557-1912