Protective predictors of cardiovascular disease: an explainable AI approach

Minh H. N. Le, Hien Quang Kha, Han H. Huynh, Phat K. Huynh, Phat Ky Nguyen, Dang Nguyen, Trang D. T. Le, Nghi V. Tran, Quoc Bui, Hoang Tran Pham, Hoai H. Le, Thomas Duong, Nhi H. H. Le, Loc Vu, Vien Truong, Thach Nguyen, Chi N. Duong, Nguyen Quoc Khanh Le

Published: 01 Jan 2026, Last Modified: 30 Nov 2025. Venue: Public Health. License: CC BY-SA 4.0

Abstract:
Objectives: To develop interpretable machine learning (ML) models using nationally representative survey data to identify protective factors against cardiovascular disease (CVD), addressing gaps in traditional clinical risk scores across diverse populations.
Study design: Cross-sectional analysis of the 2021 BRFSS.
Methods: We analyzed 116,608 adult records after data cleaning. Three ML models (XGBoost, convolutional neural network, random forest) were trained on 11 demographic and behavioral features (age, sex, race/ethnicity, income, smoking, alcohol use, depression, diabetes, insurance status, and fruit and vegetable intake). Performance was assessed using precision, recall, F1-score, AUROC, and AUPRC. SHapley Additive exPlanations (SHAP) were used for interpretability.
Results: XGBoost outperformed other models, achieving precision 0.90, recall 0.82, F1-score 0.86, AUROC 0.76, and AUPRC 0.95. SHAP indicated younger age, higher income, insurance coverage, and absence of diabetes or depression as strong protective predictors.
Conclusions: An explainable XGBoost model predicts cardiovascular resilience by emphasizing absence of diabetes, mental health stability, socioeconomic advantage, and younger age, supporting proactive and equitable prevention and more efficient resource allocation in CVD care.
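
As a quick consistency check on the reported metrics, the F1-score is the harmonic mean of precision and recall, so it can be recomputed from the two figures given in the Results. The sketch below is illustrative only and is not the authors' evaluation code; the function name `f1_from_precision_recall` and the variable names are assumptions introduced here.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (hypothetical helper)."""
    if precision + recall == 0:
        # Both metrics are zero, so F1 is defined as zero to avoid dividing by zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values as reported for XGBoost in the abstract.
reported_precision = 0.90
reported_recall = 0.82

f1 = f1_from_precision_recall(reported_precision, reported_recall)
print(round(f1, 2))  # prints 0.86
```

The recomputed value rounds to 0.86, which agrees with the F1-score stated in the abstract.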

External IDs: doi:10.1016/j.puhe.2025.106050