Abstract
Researchers studying posttraumatic stress disorder (PTSD) often use diagnostic codes within electronic medical records (EMRs) to identify individuals with the disorder. This study evaluated the performance of algorithms for defining PTSD based on International Classification of Diseases (ICD) code use within EMR data. We used data from a registry of U.S. veterans for whom both structured interview data and Veterans Health Administration EMR data were available. Using interview-diagnosed PTSD as the reference criterion, we calculated diagnostic accuracy statistics for algorithms that required the presence of at least one and up to seven encounters in which a PTSD diagnosis was present in EMR data within any clinical source, mental health clinic, or specialty PTSD clinic. We evaluated algorithm accuracy in the total sample (N = 1,343; 64.1% with PTSD), within a subsample constrained to lower PTSD prevalence (n = 712; 32.3% with PTSD), and as a function of demographic characteristics. Algorithm accuracy was influenced by PTSD prevalence. Results indicated that higher thresholds for the operationalization of PTSD may be justified among samples in which PTSD prevalence is lower. Requiring three PTSD diagnoses from a mental health clinic or four diagnoses from any clinical source may be a suitable minimum standard for identifying individuals with PTSD in EMRs; however, accuracy may be optimized by requiring additional diagnoses. The performance of many algorithms differed as a function of educational attainment and age, suggesting that samples of individuals with PTSD developed based on EMR ICD codes may skew toward including older, less-educated veterans.
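The encounter-count algorithms and accuracy statistics described above can be illustrated with a minimal sketch. It assumes each person has a count of EMR encounters carrying a PTSD diagnosis code and an interview-based reference diagnosis. All names (`classify`, `accuracy`, `ppv_at_prevalence`) and the toy data are hypothetical illustrations, not the study's code or results. The last function applies Bayes' rule to show why positive predictive value shifts with prevalence even when sensitivity and specificity are held fixed.

```python
from dataclasses import dataclass


@dataclass
class Accuracy:
    sensitivity: float
    specificity: float
    ppv: float
    npv: float


def _ratio(num, den):
    # Return NaN rather than raising when a cell of the 2x2 table is empty.
    return num / den if den else float("nan")


def classify(encounter_counts, threshold):
    """Flag a person as PTSD-positive if they have at least `threshold`
    encounters with a PTSD diagnosis code (hypothetical algorithm form)."""
    return [count >= threshold for count in encounter_counts]


def accuracy(predicted, reference):
    """Compare algorithm output with the interview-based reference diagnosis."""
    pairs = list(zip(predicted, reference))
    tp = sum(1 for p, r in pairs if p and r)
    fp = sum(1 for p, r in pairs if p and not r)
    fn = sum(1 for p, r in pairs if not p and r)
    tn = sum(1 for p, r in pairs if not p and not r)
    return Accuracy(
        sensitivity=_ratio(tp, tp + fn),
        specificity=_ratio(tn, tn + fp),
        ppv=_ratio(tp, tp + fp),
        npv=_ratio(tn, tn + fn),
    )


def ppv_at_prevalence(sensitivity, specificity, prevalence):
    """Positive predictive value implied by Bayes' rule at a given prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return _ratio(true_pos, true_pos + false_pos)


# Toy data (invented for illustration): PTSD-coded encounter counts per person
# and the corresponding interview diagnosis.
toy_counts = [0, 1, 2, 3, 4, 5, 0, 1]
toy_reference = [False, False, True, True, True, True, False, True]

for k in (1, 2, 3):
    stats = accuracy(classify(toy_counts, k), toy_reference)
    print(f"threshold >= {k}: {stats}")
```

In this sketch, raising the threshold trades sensitivity for specificity, and `ppv_at_prevalence` makes the prevalence dependence explicit: the same hypothetical sensitivity and specificity give a lower positive predictive value at the subsample prevalence (32.3%) than at the full-sample prevalence (64.1%).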