Applied Psychological Measurement, Ahead of Print.
Survey scores are often the basis for understanding how individuals grow psychologically and socio-emotionally. A known problem with many surveys is that the items are all “easy”: individuals tend to use only the top one or two response categories on the Likert scale. Such an issue could be especially problematic, and lead to ceiling effects, when the same survey is administered repeatedly over time. In this study, we conduct simulation and empirical studies to (a) quantify the impact of these ceiling effects on growth estimates when using typical scoring approaches such as sum scores and unidimensional item response theory (IRT) models and (b) examine whether approaches to survey design and scoring, including various longitudinal multidimensional IRT (MIRT) models, can mitigate any bias in growth estimates. We show that bias is substantial when using typical scoring approaches and that, while lengthening the survey helps somewhat, using a longitudinal MIRT model with plausible values scoring all but eliminates the issue. The results have implications for scoring surveys in future growth studies, as well as for understanding how Likert item ceiling effects may be contributing to replication failures.
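
To illustrate the ceiling-effect mechanism described above, the following minimal Python sketch simulates two waves of a Likert survey under a graded response model and compares the mean sum-score gain for "easy" items with the gain for well-targeted items. It is not the study's simulation design: the function names, thresholds, sample size, number of items, and true growth of 0.5 latent units are all assumptions chosen only for illustration.

```python
import math
import random


def sample_item(theta, thresholds, rng):
    """Draw one 0..K Likert response from a graded response model.

    P(X >= k) is logistic in (theta - b_k); one uniform draw is compared
    against each cumulative probability (thresholds must be increasing).
    """
    u = rng.random()
    return sum(1 for b in thresholds
               if u < 1.0 / (1.0 + math.exp(-(theta - b))))


def mean_sum_score_gain(thresholds, n_people=2000, n_items=10,
                        true_growth=0.5, seed=0):
    """Mean wave-2 minus wave-1 sum score for a hypothetical sample.

    All defaults are illustrative assumptions, not values from the study.
    """
    rng = random.Random(seed)
    total_gain = 0.0
    for _ in range(n_people):
        theta1 = rng.gauss(0.0, 1.0)      # latent trait at wave 1
        theta2 = theta1 + true_growth     # everyone grows by the same amount
        wave1 = sum(sample_item(theta1, thresholds, rng) for _ in range(n_items))
        wave2 = sum(sample_item(theta2, thresholds, rng) for _ in range(n_items))
        total_gain += wave2 - wave1
    return total_gain / n_people


# Assumed thresholds: "easy" items sit far below the average respondent,
# so most responses already fall in the top one or two categories.
EASY = (-4.0, -3.0, -2.5, -1.5)
TARGETED = (-1.5, -0.5, 0.5, 1.5)

easy_gain = mean_sum_score_gain(EASY)
targeted_gain = mean_sum_score_gain(TARGETED)
print(f"mean sum-score gain, easy items:     {easy_gain:.2f}")
print(f"mean sum-score gain, targeted items: {targeted_gain:.2f}")
```

Under these assumptions, the same latent growth yields a smaller observed sum-score gain for the easy items, because respondents near the top categories have little room to move. The sketch covers only the sum-score case; it does not implement the unidimensional IRT, longitudinal MIRT, or plausible values scoring examined in the study.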