Recruiting a representative sample of participants is becoming increasingly difficult in large-scale health surveys. Multilevel regression and poststratification (MRP) has been shown to be effective in estimating population descriptive quantities from non-representative samples. To assess MRP performance, we repeated a simulation study previously applied to an Australian population, this time using a US population.
Data were extracted from the 2017 Current Population Survey to represent a population of US adult males aged 18–55 years. Simulated datasets of non-representative samples were generated from this population. State-level prevalence estimates for a dichotomous outcome obtained using MRP were compared with estimates obtained using sampling weights (with and without raking adjustment). We also investigated how MRP performance was affected by sample size, model misspecification, interactions and the addition of a geographic-level covariate.
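As an illustration of the poststratification step only, the following minimal Python sketch combines cell-level prevalence estimates into state-level estimates by weighting each cell by its population count. It is not the code used in this study: the function name `poststratify`, the state labels, the age cells and all numbers are hypothetical, and the cell estimates are assumed to come from an already fitted multilevel model.

```python
from collections import defaultdict


def poststratify(cell_estimates, cell_counts):
    """Return the population-weighted mean of cell estimates for each state.

    cell_estimates: {(state, cell): model-based prevalence estimate}
    cell_counts:    {(state, cell): population count for that cell}
    Both arguments are hypothetical inputs for illustration.
    """
    weighted_sum = defaultdict(float)
    total = defaultdict(float)
    for (state, cell), estimate in cell_estimates.items():
        n = cell_counts[(state, cell)]
        weighted_sum[state] += n * estimate
        total[state] += n
    return {state: weighted_sum[state] / total[state] for state in total}


# Hypothetical cell-level estimates (assumed output of a fitted multilevel model).
cell_estimates = {
    ("A", "18-34"): 0.20,
    ("A", "35-55"): 0.40,
    ("B", "18-34"): 0.10,
    ("B", "35-55"): 0.30,
}
# Hypothetical population counts for the same cells.
cell_counts = {
    ("A", "18-34"): 3000,
    ("A", "35-55"): 1000,
    ("B", "18-34"): 1000,
    ("B", "35-55"): 1000,
}

state_prevalence = poststratify(cell_estimates, cell_counts)
```

In this sketch, state A's estimate is pulled towards its larger 18-34 cell, which shows how poststratification corrects for a sample whose composition differs from that of the population.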
MRP generally achieved superior performance, with large gains in precision that far outweighed the increased accuracy observed for sampling weights with raking adjustment. MRP estimates were generally robust to model misspecification. MRP tended to over-pool between-state variation in the outcome, particularly for the least populous states and for small sample sizes. The inclusion of a state-level covariate appeared to mitigate this over-pooling and further improved MRP performance.
MRP has now been shown to be effective in estimating population descriptive quantities in two different populations. This provides promising evidence for the general applicability of MRP to populations with different geographic structures. MRP appears to be a valuable analytic strategy for addressing potential participation bias in large-scale health surveys.