Abstract
Clinical evidence based on real-world data (RWD) is accumulating rapidly, providing larger sample sizes that demand novel methods to cope with the increased heterogeneity of the data. Here, we used RWD to assess the prediction of cognitive decline in a large, heterogeneous sample of participants enrolled in cognitive stimulation, a problem of great interest to clinicians but one riddled with difficulties and limitations. More precisely, drawing on a multitude of neuropsychological Training Materials (TMs), we asked whether it was possible to accurately predict an individual's cognitive decline one year after testing. In particular, we performed longitudinal modelling of the scores obtained from 215 different tests, grouped into 29 cognitive domains, totalling 124,610 instances from 7902 participants (40% male, 46% female, 14% not indicated), each performing an average of 16 tests. Employing a machine learning approach based on ROC analysis and cross-validation techniques to mitigate overfitting, we show that TMs belonging to several cognitive domains can accurately predict cognitive decline, while others perform poorly, suggesting that the ability to predict decline one year later is not specific to any particular domain but is instead widely distributed across domains. Moreover, when addressing the same problem among individuals sharing a diagnostic label, we found that some domains yielded more accurate classification for conditions such as Parkinson's disease and Down syndrome, whereas they were less accurate for Alzheimer's disease or multiple sclerosis. Future research should combine approaches similar to ours with standard neuropsychological measurements to enhance interpretability and the possibility of generalizing across different cohorts.