Abstract
In most STEM instruction, students interact with visual representations, which can be presented in a physical mode, in a virtual mode, or in a blended form that combines both. While much research has compared the effects of physical and virtual representations on students’ learning, the field is far from being able to predict when and why one representation mode is more effective than the other. One reason such predictions are particularly difficult is that multiple theories have been used to explain differences between representation modes. The goal of this article is twofold. First, it presents a survey of the literature to examine which theoretical perspectives have been used to motivate comparisons of representation modes and what predictions they make about the effectiveness of these modes. A review of 54 articles reveals five theoretical perspectives: physical engagement, cognitive load, haptic encoding, embodied action schemas, and conceptual salience. While the first two make general predictions about the effectiveness of representation modes, the last three make concept-specific predictions. Second, this article compares these predictions to examine how they conflict and align. This comparison identifies several conflicts, where theories predict opposite effects, as well as several alignments, where theories make the same predictions based on different mechanisms. Further, the comparison reveals common confounds in the experimental designs of the reviewed studies. The article concludes with recommendations for research to address the identified conflicts and with recommendations to help instructors and designers of blended technologies choose appropriate representation modes.