The need to collaborate virtually across distributed locations has increased drastically. Developments such as the COVID-19 pandemic and new IT platforms such as the metaverse have spurred a host of new immersive social applications that are accessed through head-mounted displays. This is expected to stimulate a surge in research on extended reality–supported collaborative learning (XRCL), which refers to distributed collaboration situations in which immersive technologies such as head-mounted displays are used as a medium for collaborative learning. The primary aim of this article is to critically examine the potential pedagogical benefits and limitations of XRCL, with the objective of developing a theoretical framework that describes the fundamental factors that make immersive collaborative learning unique: the theory of immersive collaborative learning (TICOL). In TICOL, we propose that technological features, social affordances, and pedagogical techniques can foster four psychological factors that we define as fundamentally different in XRCL compared with collaboration that occurs through traditional systems (e.g., laptops): social presence, physical presence, body ownership, and agency. We hypothesize that these central factors can transform the processes and contexts of collaboration through their influence on the quality of cognitive and socio-emotional social interaction, the social space, and ultimately learning outcomes. Since XRCL research is in its infancy, we hope that TICOL can provide a theoretical basis for developing the field by motivating researchers to empirically challenge and build on our hypotheses and ultimately develop a deeper understanding of whether and how immersive media influence collaborative learning.