ABSTRACT
This pilot study compares offender risk assessments conducted by human experts and large language models (LLMs) within the HCR-20V3 framework. Both groups evaluated a series of synthetic forensic case vignettes designed to simulate realistic clinical conditions. Quantitative results indicate that the AI models consistently assigned higher overall risk scores and demonstrated greater inter-rater reliability than the human assessors. Qualitative analysis revealed distinct reasoning patterns: the AI systems emphasized historical and static risk factors and often recommended more intensive management strategies, whereas the human experts focused on recent behavioral improvements, dynamic change, and rehabilitation potential. These contrasts highlight fundamental differences between algorithmic pattern recognition and human clinical judgment. The findings suggest that integrating AI-generated analyses with professional expertise can enhance the consistency and transparency of risk evaluations while preserving the ethical, contextual, and human-centered insights essential to forensic and clinical decision-making.