Assessing Different Approaches to Leveraging Historical Smoking Exposure Data to Better Select Lung Cancer Screening Candidates: A Retrospective Validation Study

Abstract

Background

There is mounting interest in the use of risk prediction models to guide lung cancer screening. Electronic health records (EHRs) could facilitate such an approach, but smoking exposure documentation is notoriously inaccurate. While inaccurate EHR data have been shown to undermine screening practices that rely on dichotomized age- and smoking exposure-based criteria, less is known about their impact on the performance of model-based screening.

Methods

Data were collected from a cohort of 37,422 ever-smokers aged 55 to 74 years who were seen at an academic safety-net healthcare system between 1999 and 2018. The National Lung Screening Trial (NLST) eligibility criteria and the PLCO_M2012 and LCRAT lung cancer risk prediction models were validated against time to lung cancer diagnosis. Discrimination (area under the receiver operating characteristic curve [AUC]) and calibration were assessed. We also determined the effect of substituting the most recently documented smoking variables with differentially retrieved “history conscious” measures.
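For readers less familiar with the NLST criteria, the sketch below shows how a dichotomized rule of this kind can be applied to structured smoking fields. It assumes the commonly cited NLST thresholds (age 55 to 74, at least 30 pack-years, and either current smoking or having quit within 15 years) and the standard pack-year definition (packs per day, at 20 cigarettes per pack, multiplied by years smoked). The `SmokingHistory` record and field names are hypothetical and do not reflect the study's actual EHR extraction.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SmokingHistory:
    # Hypothetical structured EHR fields; not the study's actual schema.
    age: int
    cigarettes_per_day: float
    years_smoked: float
    current_smoker: bool
    years_since_quit: Optional[float] = None  # None for current smokers


def pack_years(h: SmokingHistory) -> float:
    # One pack = 20 cigarettes; pack-years = packs per day x years smoked.
    return h.cigarettes_per_day / 20.0 * h.years_smoked


def nlst_eligible(h: SmokingHistory) -> bool:
    # Assumed NLST-style rule: age 55-74, >= 30 pack-years,
    # and either a current smoker or quit within the past 15 years.
    if not (55 <= h.age <= 74):
        return False
    if pack_years(h) < 30:
        return False
    if h.current_smoker:
        return True
    return h.years_since_quit is not None and h.years_since_quit <= 15


if __name__ == "__main__":
    documented = SmokingHistory(age=60, cigarettes_per_day=20, years_smoked=30, current_smoker=True)
    misdocumented = SmokingHistory(age=60, cigarettes_per_day=20, years_smoked=29, current_smoker=True)
    print(nlst_eligible(documented), nlst_eligible(misdocumented))
```

Because the rule is a hard threshold, a one-year discrepancy in documented smoking duration is enough to flip eligibility in this example, which illustrates one way inaccurate smoking documentation can affect dichotomized criteria more than a continuous risk score.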

Results

The PLCO_M2012 and LCRAT models had AUCs of 0.71 (95% CI, 0.69–0.73) and 0.72 (95% CI, 0.70–0.74), respectively. Compared with the NLST criteria, PLCO_M2012 had significantly greater time-dependent sensitivity (69.9% vs. 64.5%, p<0.01) and specificity (58.3% vs. 56.4%, p<0.001). Unlike that of the NLST criteria, the performance of the PLCO_M2012 and LCRAT models was not sensitive to historical variability in smoking exposure documentation.
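The metrics reported above can be illustrated with a deliberately simplified sketch. It computes sensitivity and specificity for a binary selection rule (such as NLST eligibility or a model risk score above a threshold) and a rank-based AUC for a continuous risk score. Two assumptions keep it short and distinguish it from the study's analysis: outcomes are treated as a fixed binary label (diagnosed by some horizon or not), so censoring is ignored rather than handled by a proper time-dependent estimator, and the 0.5 threshold in the usage lines is hypothetical, since the abstract does not state the threshold used.

```python
from typing import Sequence, Tuple


def sens_spec(selected: Sequence[bool], is_case: Sequence[bool]) -> Tuple[float, float]:
    # Simplified, censoring-naive sensitivity and specificity of a
    # binary selection rule against a fixed-horizon case label.
    tp = sum(s and c for s, c in zip(selected, is_case))
    fn = sum((not s) and c for s, c in zip(selected, is_case))
    tn = sum((not s) and (not c) for s, c in zip(selected, is_case))
    fp = sum(s and (not c) for s, c in zip(selected, is_case))
    return tp / (tp + fn), tn / (tn + fp)


def rank_auc(scores: Sequence[float], is_case: Sequence[bool]) -> float:
    # AUC as the probability that a randomly chosen case scores higher
    # than a randomly chosen non-case (ties count as one half).
    pos = [s for s, c in zip(scores, is_case) if c]
    neg = [s for s, c in zip(scores, is_case) if not c]
    total = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                total += 1.0
            elif p == n:
                total += 0.5
    return total / (len(pos) * len(neg))


if __name__ == "__main__":
    risk = [0.9, 0.8, 0.3, 0.1]          # hypothetical model risk scores
    cases = [True, False, True, False]   # hypothetical fixed-horizon outcomes
    selected = [r >= 0.5 for r in risk]  # hypothetical threshold
    print(sens_spec(selected, cases), rank_auc(risk, cases))
```

In this toy example the rank-based AUC is 0.75 and the thresholded rule has sensitivity and specificity of 0.5 each. A real validation against time to diagnosis would instead use a time-dependent estimator that accounts for participants censored before the horizon.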

Conclusions

Despite the inaccuracies of EHR-documented smoking histories, model-based lung cancer risk estimation may be a reasonable strategy for selecting screening candidates, and it offers greater value than applying the NLST criteria in the same setting.

Implications

Electronic health records (EHRs) are potentially well suited to aid in the risk-based selection of lung cancer screening candidates, but healthcare providers and systems may elect not to leverage EHR data because prior work has shown limitations in the quality of structured smoking exposure data. Our findings suggest that, despite potential inaccuracies in the underlying EHR data, screening approaches that use multivariable models may perform significantly better than approaches that rely on simpler age- and exposure-based criteria. These results should encourage providers to consider using pre-existing smoking exposure data with a model-based approach to guide lung cancer screening practices.
