On the Interpretability of AI Cardiovascular Risk Models

Jean C. Polo, Marcela Iregui, Alexander Cerón, Wilson J. Sarmiento, Angel Cruz-Roa, Eduardo Romero, R. E. Gutiérrez-Carvajal

Published: 2024, Last Modified: 16 Apr 2025, Venue: SIPAIM 2024, License: CC BY-SA 4.0

Abstract: Cardiovascular diseases (CVD) remain the leading cause of death worldwide despite many efforts to design evidence-based risk models. Although these models are data-driven, they capture only first-order relationships among complex and heterogeneous covariates. Recently, many Machine Learning (ML) models have been introduced to improve on these linear approximations, but they are limited by their lack of interpretability. This study proposes a strategy to enhance the interpretability of ML models when predicting CVD risk. Using longitudinal data from the Framingham Heart Study, the ELI5, SHAP, and LIME techniques are applied separately for men and women to models such as Logistic Regression (LR) and Extreme Gradient Boosting (XGBoost). The framework combines global and local interpretability methods to explain the importance of various risk factors, providing clinicians with a robust tool to make informed decisions and improve CVD risk management. The findings highlight systolic blood pressure as the most important CVD risk factor, while metabolic factors such as BMI show notable sex-specific differences. Interestingly, risk factors such as smoking and diabetes show variable importance, suggesting that they affect individuals differently and may require segmented analysis. This research underscores the critical role of model explainability in clinical settings and offers a comprehensive approach to integrating ML models into healthcare practice.
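To make the distinction between global and local explanations concrete, the following is a minimal sketch and not the authors' pipeline. It does not use the Framingham data or the ELI5, SHAP, or LIME libraries. Instead it fits a small logistic regression on synthetic data and computes per-feature contributions to the log-odds as weight times deviation from the cohort mean, which, for a linear model with features assumed independent, matches what SHAP assigns on the log-odds scale. The feature names (`sbp_z`, `bmi_z`, `smoker`), the simulation weights, and all helper functions are hypothetical and chosen only for illustration.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.5, epochs=300):
    """Fit logistic regression with plain batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * gj / n for wj, gj in zip(w, gw)]
        b -= lr * gb / n
    return w, b

def local_contributions(w, x, means):
    """Local explanation: each feature's shift of the log-odds
    relative to a patient with average feature values."""
    return [wj * (xj - mj) for wj, xj, mj in zip(w, x, means)]

def global_importance(w, X, means):
    """Global explanation: mean absolute local contribution per feature."""
    totals = [0.0] * len(w)
    for x in X:
        for j, c in enumerate(local_contributions(w, x, means)):
            totals[j] += abs(c)
    return [t / len(X) for t in totals]

# Synthetic cohort (assumption: NOT Framingham data).
random.seed(0)
FEATURES = ["sbp_z", "bmi_z", "smoker"]   # hypothetical feature names
TRUE_W, TRUE_B = [1.5, 0.5, 0.3], -1.0    # assumed simulation weights

X, y = [], []
for _ in range(400):
    x = [random.gauss(0, 1), random.gauss(0, 1), float(random.random() < 0.3)]
    p = sigmoid(sum(wj * xj for wj, xj in zip(TRUE_W, x)) + TRUE_B)
    X.append(x)
    y.append(1.0 if random.random() < p else 0.0)

w, b = fit_logistic(X, y)
means = [sum(col) / len(X) for col in zip(*X)]

patient = X[0]
contribs = local_contributions(w, patient, means)
importance = global_importance(w, X, means)

for name, c, g in zip(FEATURES, contribs, importance):
    print(f"{name}: local={c:+.3f}  global={g:.3f}")
```

The local values explain one patient's predicted risk, while the global values summarize the cohort. Running the same computation separately on male and female subsets would mirror the sex-specific comparison described in the abstract, and a feature whose local contributions vary widely across patients would be a candidate for the segmented analysis the authors suggest.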