A decision support approach to infectious disease classification using tweets and routine blood tests
Abstract: Recent advances in artificial intelligence have produced a range of methods for detecting infectious diseases. Machine learning models in particular have shown strong capabilities in predicting and classifying communicable diseases. Once trained, however, these models typically rely solely on their input features and do not incorporate external contextual information that could make their predictions more reliable, especially given the dynamic nature of disease transmission. This study proposes a novel classification framework that integrates heterogeneous data sources, specifically routine blood test results and sentiment data derived from social media, to improve patient diagnosis. Sentiment information was extracted from Twitter using two approaches: K-Means clustering enhanced with AutoEncoder features, and a pre-trained DistilBERT-based sentiment analysis model. Standard classifiers, including Logistic Regression, Random Forest, and Gradient Boosting, were used for the clinical classification task. In the final prediction step, the model-generated patient classification probabilities were fused with the sentiment-derived outputs through a weighted combination, producing a new probability distribution used to reclassify each patient. Experimental results demonstrate that integrating sentiment analysis significantly improves model performance. After integration of the K-Means and DistilBERT sentiment analysis, Logistic Regression and Random Forest showed gains of over 6% and 9%, respectively, across accuracy, precision, recall, and F1-score. Notably, XGBoost achieved the best ROC AUC, 95.10%, when using sentiment probabilities derived from K-Means clustering. These findings underscore the value of incorporating social media sentiment as a contextual feature alongside clinical biomarkers to enhance infectious disease classification systems.
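The weighted fusion step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formula, so everything here is an assumption made for illustration: the convex-combination form, the function names `fuse_probabilities` and `reclassify`, the default weight of 0.7, and the use of a single population-level sentiment distribution shared by all patients.

```python
import numpy as np

def fuse_probabilities(clinical_probs, sentiment_probs, weight=0.7):
    """Blend classifier probabilities with sentiment-derived probabilities.

    Assumed form (not taken from the paper): a convex combination
    weight * clinical + (1 - weight) * sentiment, renormalized per row.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    clinical = np.asarray(clinical_probs, dtype=float)
    sentiment = np.asarray(sentiment_probs, dtype=float)
    # A 1-D sentiment vector is broadcast across every patient row.
    fused = weight * clinical + (1.0 - weight) * sentiment
    # Renormalize so each row is again a probability distribution.
    return fused / fused.sum(axis=-1, keepdims=True)

def reclassify(fused_probs):
    """Assign each patient the class with the highest fused probability."""
    return np.asarray(fused_probs).argmax(axis=-1)

# Hypothetical numbers: two patients, classes (negative, positive).
clinical = [[0.60, 0.40], [0.45, 0.55]]
sentiment = [0.20, 0.80]  # assumed population-level sentiment signal
fused = fuse_probabilities(clinical, sentiment, weight=0.7)
labels = reclassify(fused)
```

In this toy case the first patient, whom the clinical model alone would label negative, is relabeled positive once the sentiment signal is blended in. In practice the weight would be tuned on validation data rather than fixed in advance.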
External IDs: dblp:journals/netmahib/MelchaneEKB25