Administrative data have several advantages over questionnaire and interview data for identifying cases of depression: they are usually inexpensive, are available over long periods of time, and are less subject to recall bias and differential classification errors. However, the validity of administrative data for correctly identifying depression has not yet been studied in general populations. The present study aimed to 1) evaluate the sensitivity and specificity of administrative cases of depression using the validated Composite International Diagnostic Interview – Short Form (CIDI-SF) as the reference standard and 2) compare the known-groups validity of administrative and CIDI-SF cases of depression.
The 5487 participants seen at the last wave (2015–2018) of the PROQ cohort had CIDI-SF questionnaire data linked to hospitalization and medical reimbursement data provided by the provincial universal healthcare provider and coded using the International Classification of Diseases. We analyzed the sensitivity and specificity of several case definitions of depression derived from these administrative data. The association of each case definition with known predictors of depression was estimated using robust Poisson regression models.
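As a minimal sketch of how such accuracy measures are typically computed, the snippet below derives sensitivity, specificity, and Cohen's kappa from a 2x2 cross-classification of an index definition (here, an administrative case definition) against a reference standard (here, the CIDI-SF). The function name `diagnostic_accuracy` and the counts are hypothetical and chosen only for illustration; they are not the study's data or code.

```python
def diagnostic_accuracy(tp, fp, fn, tn):
    """Return (sensitivity, specificity, kappa) from 2x2 counts.

    tp: index positive, reference positive
    fp: index positive, reference negative
    fn: index negative, reference positive
    tn: index negative, reference negative
    """
    n = tp + fp + fn + tn
    # Proportion of reference-standard cases detected by the index definition.
    sensitivity = tp / (tp + fn)
    # Proportion of reference-standard non-cases correctly left unflagged.
    specificity = tn / (tn + fp)
    # Observed agreement and the agreement expected by chance alone.
    observed = (tp + tn) / n
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (observed - expected) / (1 - expected)
    return sensitivity, specificity, kappa


# Hypothetical counts for illustration only (not taken from the study).
sens, spec, kappa = diagnostic_accuracy(tp=25, fp=30, fn=75, tn=870)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} kappa={kappa:.2f}")
```

With these made-up counts, the index definition misses most reference cases while rarely flagging non-cases, so specificity is high while sensitivity and kappa are low.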
Administrative cases of depression showed high specificity (≥ 96%), low sensitivity (19–32%), and rather low agreement (Cohen’s kappa of 0.21–0.25) compared with the CIDI-SF. These results were consistent across strata of sex, age and education level and across the case definitions examined. In the known-groups analysis, associations with known predictors of depression were comparable for administrative and CIDI-SF cases (RR for sex: 1.80 vs 2.03, respectively; age: 1.53 vs 1.40; education: 1.52 vs 1.28; psychological distress: 2.21 vs 2.65).
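The logic of the known-groups comparison can be illustrated with a crude risk ratio computed for the same predictor under two case definitions. This is a simplified sketch: the study estimated its ratios with robust Poisson regression, whereas the helper below (a hypothetical `risk_ratio` function with made-up counts) applies no covariate adjustment and gives no confidence interval.

```python
def risk_ratio(cases_exposed, n_exposed, cases_unexposed, n_unexposed):
    """Crude risk ratio: risk in the exposed group over risk in the unexposed group."""
    risk_exposed = cases_exposed / n_exposed
    risk_unexposed = cases_unexposed / n_unexposed
    return risk_exposed / risk_unexposed


# Hypothetical counts for illustration only (not taken from the study).
# Same predictor and same groups, but cases defined by two different sources.
rr_admin = risk_ratio(cases_exposed=60, n_exposed=1000,
                      cases_unexposed=30, n_unexposed=1000)
rr_cidi = risk_ratio(cases_exposed=90, n_exposed=1000,
                     cases_unexposed=45, n_unexposed=1000)
print(f"RR (administrative)={rr_admin:.2f}  RR (CIDI-SF)={rr_cidi:.2f}")
```

In this made-up example the two sources identify different numbers of cases yet yield the same relative association with the predictor, which is the pattern a known-groups comparison is designed to detect.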
The results obtained in this large general-population sample suggest that the dimensions of depression captured by administrative data and by the CIDI-SF are partially distinct. However, the known-groups validity of administrative cases in relation to risk factors for depression was similar to that of CIDI-SF cases. We suggest that neither data source is superior to the other in the context of large epidemiological studies aiming to identify and quantify risk factors for depression.