Background and Aims
Participating in online gambling is associated with an increased risk of experiencing gambling-related harms, driving calls for more effective, personalized harm prevention initiatives. Such initiatives depend on the development of models capable of detecting at-risk online gamblers. We aimed to determine whether machine learning algorithms can use site data to retrospectively detect at-risk online gamblers, as indicated by the Problem Gambling Severity Index (PGSI).
Design
Exploratory comparison of six prominent supervised machine learning methods (decision trees, random forests, k-nearest neighbours, logistic regression, artificial neural networks and support vector machines) for predicting problem gambling risk levels reported on the PGSI.
Setting
Lotoquebec.com (formerly espacejeux.com), an online gambling platform operated by Loto-Québec (a provincial Crown Corporation) in Quebec, Canada.
Participants
N = 9145 adults (18+) who completed the survey measure and placed at least one bet using real money on the site.
Measurements
Participants completed the PGSI, a self-report questionnaire with validated cut-offs denoting moderate-to-high risk (PGSI 5+) or high risk (PGSI 8+) of experiencing past-year gambling-related problems. Participants agreed to release additional data about the preceding 12 months from their user accounts. The 144 predictor variables were derived from users’ transactions, apparent betting behaviours, listed demographics and use of responsible gambling tools on the platform.
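As an illustration of how the two cut-offs above translate into binary classification targets, the following minimal sketch maps a PGSI total score to the PGSI 5+ and PGSI 8+ outcomes. The function and key names (`pgsi_outcomes`, `pgsi_5_plus`, `pgsi_8_plus`) are hypothetical and not taken from the study; the 0 to 27 score range assumes the standard nine-item PGSI scored 0 to 3 per item.

```python
from typing import Dict

# Cut-offs as stated in the text: PGSI 5+ (moderate-to-high risk)
# and PGSI 8+ (high risk).
MODERATE_TO_HIGH_CUTOFF = 5
HIGH_CUTOFF = 8

# Assumption: the standard nine-item PGSI, each item scored 0-3.
PGSI_MAX_TOTAL = 27


def pgsi_outcomes(pgsi_total: int) -> Dict[str, int]:
    """Return the two binary outcome labels for one PGSI total score.

    Hypothetical helper for illustration only.
    """
    if not 0 <= pgsi_total <= PGSI_MAX_TOTAL:
        raise ValueError("PGSI total must be between 0 and 27")
    return {
        "pgsi_5_plus": int(pgsi_total >= MODERATE_TO_HIGH_CUTOFF),
        "pgsi_8_plus": int(pgsi_total >= HIGH_CUTOFF),
    }
```

Note that the two outcomes are nested: every participant who meets the PGSI 8+ threshold also meets the PGSI 5+ threshold.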
Findings
Our best classification models (random forests) for the PGSI 5+ and 8+ outcome variables achieved areas under the receiver operating characteristic curve (AUC) of 84.33% (95% CI = 82.24–86.41) and 82.52% (95% CI = 79.96–85.08), respectively. The most important factors in these models included the frequency and variability of participants’ betting behaviour and their repeat engagement on the site.
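For readers unfamiliar with the metric, the area under the receiver operating characteristic curve equals the probability that a randomly chosen at-risk participant receives a higher model score than a randomly chosen participant who is not at risk, with ties counted as one half. The sketch below computes it directly from that definition. It is a generic illustration on toy data, not the study's evaluation code, and it does not reproduce the reported confidence intervals; the function name `roc_auc` is hypothetical.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Counts the fraction of (positive, negative) pairs in which the
    positive case has the higher score; ties count as 0.5.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example: three of the four positive/negative pairs are ordered correctly.
toy_auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])  # 0.75
```

The pairwise loop is quadratic in the number of cases, which is acceptable for illustration; a rank-based implementation would be preferable for a sample of several thousand participants.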
Conclusions
Machine learning algorithms appear to be able to classify at-risk online gamblers using data generated from their use of online gambling platforms. They may enable personalized harm prevention initiatives, but are constrained by trade-offs between their sensitivity and precision.
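The trade-off between sensitivity and precision can be made concrete with a small sketch. Lowering the score threshold at which a gambler is flagged catches more truly at-risk individuals (higher sensitivity) but also flags more individuals who are not at risk (lower precision). The data, thresholds and function name below are invented purely for illustration and do not come from the study.

```python
from typing import Sequence, Tuple


def sensitivity_and_precision(
    labels: Sequence[int], scores: Sequence[float], threshold: float
) -> Tuple[float, float]:
    """Flag cases with score >= threshold and return (sensitivity, precision)."""
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return sensitivity, precision


# Hypothetical toy data: 3 at-risk (1) and 5 not-at-risk (0) gamblers.
toy_labels = [1, 1, 1, 0, 0, 0, 0, 0]
toy_scores = [0.9, 0.6, 0.3, 0.5, 0.4, 0.2, 0.2, 0.1]

# A low threshold flags every at-risk gambler but also two who are not at risk.
low = sensitivity_and_precision(toy_labels, toy_scores, 0.25)   # (1.0, 0.6)
# A high threshold flags only at-risk gamblers but misses one of them.
high = sensitivity_and_precision(toy_labels, toy_scores, 0.55)
```

In a harm prevention setting, the choice of threshold would depend on the relative costs of missing an at-risk gambler versus intervening with someone who is not at risk.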