Abstract
Objectives
Few interactions between risk factors for schizophrenia have been replicated, but fitting all such interactions is difficult due to high‐dimensionality. Our aims are to identify significant main and interaction effects for schizophrenia and to evaluate the performance of our approach using simulated data.
Methods
We apply the machine learning technique elastic net to a high‐dimensional logistic regression model to produce a sparse set of predictors, and then assess the significance of odds ratios (OR) with Bonferroni‐corrected p‐values and confidence intervals (CI). We introduce a simulation model that resembles a Finnish nested case–control study of schizophrenia which uses national registers to identify cases (n = 1,468) and controls (n = 2,975). The predictors include nine sociodemographic factors and all interactions (31 predictors).
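The significance step described above can be illustrated with a minimal sketch. It assumes a Wald test on a single log-odds coefficient and a standard Bonferroni adjustment. The function name `odds_ratio_summary`, its parameters, and the example inputs are hypothetical and are not taken from the study. How the number of tests is counted (all candidate predictors or only those retained by the elastic net) is not stated here, so `n_tests` is left as an input.

```python
import math
from statistics import NormalDist


def odds_ratio_summary(beta, se, n_tests, alpha=0.05):
    """Summarise one logistic-regression coefficient (illustrative only).

    beta    : estimated log-odds coefficient
    se      : its standard error
    n_tests : number of tests used for the Bonferroni correction (assumed)
    alpha   : family-wise significance level
    """
    norm = NormalDist()  # standard normal distribution

    # Two-sided Wald p-value, then Bonferroni: multiply by the number of tests, cap at 1.
    z = beta / se
    p_raw = 2.0 * (1.0 - norm.cdf(abs(z)))
    p_adj = min(1.0, p_raw * n_tests)

    # Bonferroni-corrected CI: use the critical value at alpha / n_tests.
    z_crit = norm.inv_cdf(1.0 - alpha / (2.0 * n_tests))
    ci = (math.exp(beta - z_crit * se), math.exp(beta + z_crit * se))

    return {"or": math.exp(beta), "ci": ci, "p_adj": p_adj}


# Hypothetical inputs, not values from the study.
summary = odds_ratio_summary(beta=math.log(2.0), se=0.2, n_tests=5)
print(summary)
```

With this convention, a predictor is significant at the family-wise level when its adjusted p-value is below `alpha`, which coincides with the corrected CI excluding an OR of 1.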
Results
In the simulation, interactions with OR = 3 and prevalence = 4% were identified with <5% false positive rate and ≥80% power. None of the studied interactions were significantly associated with schizophrenia, but main effects of parental psychosis (OR = 5.2, CI 2.9–9.7; p < .001), urbanicity (1.3, 1.1–1.7; p = .001), and paternal age ≥35 (1.3, 1.004–1.6; p = .04) were significant.
Conclusions
We have provided an analytic pipeline for data‐driven identification of main and interaction effects in case–control data. We identified highly replicated main effects for schizophrenia, but no interactions.