The article compares three methods for estimating the effects of task characteristics and for using these estimates in model-based proficiency scaling: prediction of item difficulties estimated with the Rasch model, the linear logistic test model (LLTM), and an LLTM including random item effects (LLTM + e). The methods are applied to empirical data from a German large-scale study of reading comprehension in English as a foreign language (N = 10,543). Task characteristics defined a priori were used as predictors of item difficulty; the estimated effects were used to define thresholds between proficiency levels. The comparison of results indicates that the LLTM is too restrictive, whereas the Rasch model and the LLTM + e yield similar results in terms of their implications for scale anchoring.
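
For orientation, the three approaches can be written in a common notation. This is a sketch under assumed notation, none of which appears in the abstract itself: \theta_p is the proficiency of person p, \beta_i the difficulty of item i, q_{ik} the coded value of task characteristic k for item i, \eta_k the effect of that characteristic, and \varepsilon_i an item-specific residual. The sign convention, the intercept \eta_0, and the normal distribution for \varepsilon_i are assumptions that follow common practice. The second step of the Rasch-based approach is assumed here to be a linear regression.

```latex
% Shared measurement model (assumed sign convention: higher beta = harder item)
\[
  \operatorname{logit} P(X_{pi} = 1 \mid \theta_p) = \theta_p - \beta_i
\]

% (1) Rasch model, two steps: estimate each \beta_i freely, then predict the
%     estimates from the task characteristics (assumed: linear regression)
\[
  \hat{\beta}_i = \eta_0 + \sum_{k=1}^{K} q_{ik}\,\eta_k + e_i
\]

% (2) LLTM: item difficulty is fully determined by the task characteristics
\[
  \beta_i = \eta_0 + \sum_{k=1}^{K} q_{ik}\,\eta_k
\]

% (3) LLTM + e: as (2), plus a random item effect for unexplained difficulty
\[
  \beta_i = \eta_0 + \sum_{k=1}^{K} q_{ik}\,\eta_k + \varepsilon_i,
  \qquad \varepsilon_i \sim N(0, \sigma_\varepsilon^2)
\]
```

Written this way, the LLTM is the special case of the LLTM + e with \sigma_\varepsilon^2 = 0, which is one way to read the finding that the LLTM is too restrictive: it leaves no room for item difficulty that the task characteristics do not explain.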