Predictive Modeling of Diabetes using EMR Data

Published: 01 Jan 2022, Last Modified: 11 Feb 2025, Venue: HEALTHINF 2022, License: CC BY-SA 4.0
Abstract: As the prevalence of diabetes continues to increase globally, an efficient diabetes prediction model based on Electronic Medical Records (EMR) is critical to ensuring the well-being of patients and reducing the burden on the healthcare system. Predicting diabetes at an early stage and analyzing its risk factors can enable primary and secondary prevention of the disease. The objective of this study is to explore various classification models for identifying diabetes using EMR data. We extracted patient information, disease, health condition, billing, and medication data from the EMR. Six machine learning algorithms were used, three ensemble classifiers (XGBoost, Random Forest, and AdaBoost) and three non-ensemble classifiers (Logistic Regression, Naive Bayes, and K-Nearest Neighbor (KNN)). We trained the models on both imbalanced data with the original class distribution and artificially balanced data. Our results indicate that the Random Forest model overall