Assessment, Ahead of Print.
The hypothesis implicit in the design of a rating scale is that its categories reflect increasing levels of the latent variable. Rasch models for ordered polytomous items include parameters, called thresholds, that allow this hypothesis to be tested empirically. Failure of the thresholds to advance monotonically with the categories (a condition known as “threshold disordering”) provides evidence that the rating scale is not functioning as intended. This work focuses on scales with relatively large numbers of categories, whose use is often recommended in the literature. Threshold disordering is observed both in an extended 8-point scale specially developed for the Patient Health Questionnaire-9 and in the original 10-point scale of the Behavioral Religiosity Scale. These results urge practitioners not to take the functioning of a rating scale for granted, but to verify it empirically.
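As an illustration of what checking for threshold disordering can look like in practice, the following minimal sketch tests whether a set of estimated thresholds advances monotonically with the categories. The function name `disordered_thresholds` and the threshold values are hypothetical and chosen only for illustration; they are not estimates from the scales analyzed in this work, and the check itself is a simple comparison of point estimates rather than the authors' procedure.

```python
from typing import List, Tuple


def disordered_thresholds(thresholds: List[float]) -> List[Tuple[int, int]]:
    """Return 1-based pairs (k, k+1) of adjacent thresholds where
    threshold k+1 fails to exceed threshold k, i.e. where the
    thresholds do not advance monotonically with the categories."""
    return [
        (k + 1, k + 2)
        for k in range(len(thresholds) - 1)
        if thresholds[k + 1] <= thresholds[k]
    ]


# Hypothetical threshold estimates (in logits) for an item with
# 5 response categories, hence 4 thresholds. Illustrative values only.
ordered = [-1.8, -0.4, 0.5, 1.9]
disordered = [-1.2, 0.6, 0.1, 1.7]

print(disordered_thresholds(ordered))     # []
print(disordered_thresholds(disordered))  # [(2, 3)]
```

An empty result means the point estimates are ordered as the rating scale design assumes; a non-empty result flags the adjacent thresholds that are reversed. A check of this kind ignores estimation uncertainty, so it is best read as a first screening step.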