Exploring the Impacts of Features in Diabetes Prediction Models Using Machine Learning Algorithms Through Explainable Artificial Intelligence (XAI) Approach

IJCAI 2024 Workshop DocIU Submission1 Authors

26 Jun 2024 (modified: 10 Jul 2024), IJCAI 2024 Workshop DocIU Desk Rejected Submission, CC BY 4.0
Keywords: Diabetes, Impacts, predictive model, Explainable, Interpretable, XAI
Abstract: Background: Diabetes is a chronic condition that leads to a variety of complications. Several factors contribute to it, including age, lack of exercise, sedentary lifestyle, family history, high blood pressure, depression and stress, and diet. This study aimed to explore the impact of individual features in diabetes prediction models using machine learning algorithms and an explainable artificial intelligence (XAI) approach. Methodology: Data were extracted from the CDC and preprocessed for predictive modeling. The class imbalance problem was addressed using the SMOTE + Tomek method. Several algorithms were employed: Decision Trees (DT), Random Forests (RF), CatBoost, XGBoost, and LightGBM (LGBM). Ten experiments were conducted on a total of 227,804 records with 21 features. The data were split into training and testing sets in an 80:20 ratio using a stratified shuffle split. The impact of features was explored using removal-based and LOCO (Leave-One-Covariate-Out) methods. The predictive model was explained using the SHAP and LIME XAI techniques to enhance trust in the results. Model performance was evaluated using accuracy, precision, F1-score, and the ROC curve. Results: Of all the predictive models developed, the LGBM classifier achieved the highest accuracy (83.33%) and precision (78.56%) on the imbalanced dataset. Key contributing factors included BMI, age, high blood pressure, cholesterol checkup, high cholesterol, education, general health, any healthcare issues, heart disease or attack, and smoking. Relevant rules for addressing diabetes were generated using feature explanation techniques and the best-fitted model, enhancing trust in the predictive model's results. Conclusions: Among the algorithms compared, LGBM was the best-performing choice for diabetes prediction. Using the LGBM model, we identified crucial factors and formulated pertinent rules. Feature impacts were examined using LOCO and removal-based techniques. To facilitate user interaction, we designed a GUI with HTML for the front end and Flask for the back end, connected to the LGBM model. Additionally, the rules generated by LGBM and the feature relevance explanation techniques offer useful insights for policymakers.
Submission Number: 1
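
The LOCO (Leave-One-Covariate-Out) procedure named in the abstract can be illustrated with a minimal sketch. This is not the authors' code: it assumes scikit-learn is available, swaps in a Random Forest (one of the algorithms listed in the abstract) in place of LGBM, uses synthetic imbalanced data rather than the CDC records, and omits the SMOTE + Tomek resampling step. The function name `loco_importance` and the `feature_i` column names are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def loco_importance(X, y, feature_names, seed=0):
    """Return (baseline accuracy, {feature: accuracy drop when it is left out})."""
    # Stratified, shuffled 80:20 split, mirroring the ratio stated in the abstract.
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, stratify=y, shuffle=True, random_state=seed
    )

    def fit_and_score(cols):
        # Retrain from scratch on the selected columns only (assumed model choice).
        model = RandomForestClassifier(n_estimators=50, random_state=seed)
        model.fit(X_tr[:, cols], y_tr)
        return accuracy_score(y_te, model.predict(X_te[:, cols]))

    all_cols = list(range(X.shape[1]))
    baseline = fit_and_score(all_cols)

    drops = {}
    for j, name in enumerate(feature_names):
        kept = [c for c in all_cols if c != j]  # leave covariate j out
        drops[name] = baseline - fit_and_score(kept)
    return baseline, drops


# Synthetic, imbalanced stand-in data (not the CDC dataset).
X, y = make_classification(
    n_samples=1000, n_features=6, n_informative=3, n_redundant=0,
    weights=[0.85, 0.15], random_state=0,
)
names = [f"feature_{i}" for i in range(X.shape[1])]
baseline, drops = loco_importance(X, y, names)

print(f"baseline accuracy: {baseline:.4f}")
for name, drop in sorted(drops.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {drop:+.4f}")
```

A larger positive drop suggests the model relied more on that feature; values near zero or negative suggest the feature is redundant or uninformative for this particular model and split.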