Applied Psychological Measurement, Ahead of Print.
In standalone performance assessments, researchers have explored the influence of different rating designs on the sensitivity of latent trait model indicators to different rater effects as well as the impacts of different rating designs on student achievement estimates. However, the literature provides little guidance on the degree to which different rating designs might affect rater classification accuracy (severe/lenient) and rater measurement precision in both standalone performance assessments and mixed-format assessments. Using results from an analysis of National Assessment of Educational Progress (NAEP) data, we conducted simulation studies to systematically explore the impacts of different rating designs on rater measurement precision and rater classification accuracy (severe/lenient) in mixed-format assessments. The results suggest that the complete rating design produced the highest rater classification accuracy and greatest rater measurement precision, followed by the multiple-choice (MC) + spiral link design and the MC link design. Considering that complete rating designs are not practical in most testing situations, the MC + spiral link design may be a useful choice because it balances cost and performance. We consider the implications of our findings for research and practice.