Psychometric modeling has become a frequently used statistical tool in research on scientific reasoning. We review psychometric modeling practices in this field, including model choice, model testing, and the inferences researchers draw from their psychometric practices. A review of 11 empirical research studies reveals that the predominant psychometric approach is Rasch modeling with a focus on item fit statistics, applied in a way that closely resembles practices in national and international large-scale educational assessment programs. This approach is common in the educational assessment community and rooted in subtle philosophical views on measurement. However, we find that researchers using this approach tend to draw interpretations that lie outside its inferential domain and that are not in accordance with the related practices and inferential purposes. In some of the reviewed articles, researchers emphasize item infit statistics for dimensionality assessment. Item infit statistics, however, cannot be regarded as a valid indicator of the dimensionality of scientific reasoning. Using simulations as illustration, we argue that this practice is limited in delivering psychological insights; in fact, various recent inferences about the structure, cognitive basis, and correlates of scientific reasoning might be unwarranted. To harness the full potential of psychometric modeling, we suggest adjusting modeling practices to the psychological and educational questions at hand.