ABSTRACT
For the SIPP-118, a widely used instrument for measuring the severity of personality disorders in 16 facets and five domains, T-scores and percentile rank scores were established. Various approaches for establishing T-scores, based on classical test theory and item response theory (IRT) and assumed to achieve increasing levels of accuracy, were compared. Three approaches were evaluated: (1) a simple linear conversion of raw scores to T-scores, (2) a normalizing conversion (Rankit), and (3) an approach based on IRT. We compared the T-scores resulting from these approaches with IRT-based factor scores. The findings show that the linear approach produced distorted T-scores for many facets of the SIPP-118, especially in the lower, more pathological range of scores. The Rankit and IRT-based approaches yielded almost identical T-scores in practice, and both corresponded quite well with factor scores based on an IRT model for these facet or domain scores. Implications for the practice of establishing T-scores are discussed. IRT provided the most accurate trait estimates, but it requires a complex calculation that takes into account the item parameters and the individual’s response pattern. Regression-based IRT-score approximations and Rankit-based T-scores yielded adequate estimates as well.
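
To make the first two approaches concrete, the following is a minimal sketch, not the procedure used in this study. It assumes the conventional T-score metric (mean 50, SD 10), the sample standard deviation for the linear conversion, and the common Rankit proportion (r - 0.5) / n with average ranks for ties. The function names and the example raw scores are hypothetical and are not SIPP-118 data.

```python
from statistics import NormalDist, mean, stdev


def linear_t_scores(raw):
    """Linear conversion: T = 50 + 10 * z, with z from the sample mean and SD."""
    m = mean(raw)
    s = stdev(raw)  # assumption: sample SD (n - 1 in the denominator)
    return [50 + 10 * (x - m) / s for x in raw]


def average_ranks(raw):
    """1-based ranks in ascending order; tied values share their average rank."""
    order = sorted(range(len(raw)), key=lambda i: raw[i])
    ranks = [0.0] * len(raw)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with raw[order[i]].
        while j + 1 < len(order) and raw[order[j + 1]] == raw[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rankit_t_scores(raw):
    """Normalizing conversion: rank -> proportion (r - 0.5) / n -> normal z -> T."""
    n = len(raw)
    std_normal = NormalDist()  # standard normal, mean 0 and SD 1
    return [
        50 + 10 * std_normal.inv_cdf((r - 0.5) / n)
        for r in average_ranks(raw)
    ]


if __name__ == "__main__":
    # Hypothetical, negatively skewed raw scores (a few low, pathological values).
    raw = [1.2, 2.0, 2.9, 3.1, 3.3, 3.4, 3.5, 3.6, 3.7, 3.8]
    for x, t_lin, t_rank in zip(raw, linear_t_scores(raw), rankit_t_scores(raw)):
        print(f"raw={x:.1f}  linear T={t_lin:5.1f}  Rankit T={t_rank:5.1f}")
```

The linear conversion preserves the shape of the raw-score distribution, so any skew carries over into the T-scores. The Rankit conversion depends only on rank order and forces the T-scores toward a normal shape, which is why the two can diverge in the tail of a skewed scale.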