Development and Validation of an Explainable Machine Learning-Based Model for Predicting the Interval Growth of Pulmonary Subsolid Nodules: A Prospective Multicenter Cohort Study

Published: 29 Jun 2024, Last Modified: 03 Jul 2024
Venue: KDD-AIDSH 2024 Poster
Readers: Everyone
License: CC BY 4.0
Keywords: Subsolid nodules, Natural course, Lung adenocarcinoma, Radiomics, Machine learning
TL;DR: An interpretable machine learning model, developed from multicenter longitudinal follow-up data on subsolid nodules (SSNs), accurately predicts changes in SSNs.
Abstract:
Objectives: In this multicenter study, we aimed to develop and validate a model that predicts the growth of pulmonary subsolid nodules (SSNs) over different time intervals, using machine learning (ML) applied to CT radiomics. The model is intended to guide personalized follow-up strategies in clinical practice.

Methods: A total of 642 patients with 717 SSNs who underwent long-term follow-up were retrospectively collected from three medical centers. Patients were categorized into growth and non-growth groups based on the growth status of the presented SSNs within 2 or 5 years, and were randomly divided into training and internal testing sets at an 8:2 ratio. Clinical, radiomics, and clinical-radiomics fusion models were each developed with the optimal ML algorithm to assess the risk of SSN growth over the different timeframes. An independent external test set was established by including another 95 patients with 105 SSNs from a health examination center. Multiple assessment indices, including the area under the receiver-operating-characteristic curve (AUC), were used to assess and compare predictive performance. The SHapley Additive exPlanation (SHAP) method was used to rank feature importance and to explain the rationale behind the final model.

Results: The extreme gradient boosting (XGBoost) and light gradient boosting machine (LightGBM) models showed the best discriminative ability among the 8 ML models. For within-2-year growth prediction, the clinical, radiomics, and clinical-radiomics fusion models achieved AUCs of 0.823 (95% CI: 0.745-0.906), 0.889 (95% CI: 0.823-0.943), and 0.911 (95% CI: 0.858-0.955) on the internal testing set, and AUCs of 0.712 (95% CI: 0.610-0.815), 0.734 (95% CI: 0.616-0.830), and 0.734 (95% CI: 0.623-0.835) on the external testing set. For 5-year growth prediction, the three models achieved AUCs of 0.796 (95% CI: 0.708-0.884), 0.838 (95% CI: 0.759-0.905), and 0.849 (95% CI: 0.772-0.913) on the internal testing set, and AUCs of 0.672 (95% CI: 0.550-0.795), 0.773 (95% CI: 0.657-0.880), and 0.776 (95% CI: 0.652-0.882) on the external testing set. These findings were translated into a streamlined clinical management framework intended to make the model easier to apply in clinical settings.

Conclusions: The interpretable ML model, developed from multicenter longitudinal follow-up data on SSNs, accurately predicted changes in SSNs over 2 years and was used for the first time to guide 5-year long-term follow-up.
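The results above are reported as an AUC with a 95% confidence interval. The abstract does not say how the intervals were computed, so the following is only a minimal sketch of one common approach, a percentile bootstrap over test cases. The function names `auc_score` and `bootstrap_auc_ci` and the synthetic scores are illustrative assumptions, not the authors' code or data.

```python
import random


def auc_score(labels, scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Point-estimate AUC plus a percentile-bootstrap confidence interval.

    Assumption: cases are resampled with replacement; resamples that
    contain only one class are skipped because AUC is undefined for them.
    """
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue
        stats.append(auc_score(ys, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return auc_score(labels, scores), lo, hi


# Synthetic example (hypothetical data): 40 "growth" and 80 "non-growth"
# nodules with model scores drawn from two overlapping distributions.
demo_rng = random.Random(42)
demo_labels = [1] * 40 + [0] * 80
demo_scores = ([demo_rng.gauss(1.0, 1.0) for _ in range(40)]
               + [demo_rng.gauss(0.0, 1.0) for _ in range(80)])
auc, ci_low, ci_high = bootstrap_auc_ci(demo_labels, demo_scores)
print(f"AUC {auc:.3f} (95% CI: {ci_low:.3f}-{ci_high:.3f})")
```

With a small external test set such as the 105 SSNs described above, an interval of this kind would be expected to be wide, which is consistent with the wider external-set intervals reported.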
Submission Number: 2