Complete Blood Count and Monocyte Distribution Width–Based Machine Learning Algorithms for Sepsis Detection: Multicentric Development and External Validation Study (Preprint)

Andrea Campagner, Luisa Agnello, Anna Carobene, Andrea Padoan, Fabio Del Ben, Massimo Locatelli, Mario Plebani, Agostino Ognibene, Maria Lorubbio, Elena De Vecchi, Andrea Cortegiani, Elisa Piva, Donatella Poz, Francesco Curcio, Federico Cabitza, Marcello Ciaccio

Published: 15 Dec 2023, Last Modified: 13 Feb 2026, Crossref, CC BY-SA 4.0

Abstract:
Background: Sepsis is an organ dysfunction caused by a dysregulated host response to infection. Early detection is fundamental to improving patient outcomes. Laboratory medicine can play a crucial role by providing biomarkers whose alterations can be detected before the onset of clinical signs and symptoms. In particular, monocyte distribution width (MDW) has emerged as a relevant sepsis biomarker over the previous decade. Despite encouraging results, however, MDW has poor sensitivity and positive predictive value compared with other biomarkers.
Objective: Machine learning (ML) techniques promise to overcome these limitations by combining different parameters, thereby improving sepsis detection performance. Deploying ML models in clinical practice may be problematic, however, because their performance may degrade in contexts other than the research environment: even widely used, commercially available models have been shown to generalize poorly in out-of-distribution scenarios. The aim of this multicentric study was to develop and externally validate ML models intended for the early detection and screening of sepsis on the basis of MDW and other complete blood count (CBC) parameters.
Methods: Five patient cohorts (5344 patients in total) collected at five different Italian hospitals were used to train and externally validate six ML models. To improve generalizability and robustness to different types of data distribution shift, the developed ML models combine traditional ML methodologies with advanced techniques inspired by controllable AI, namely cautious classification, which gives the ML models the ability to abstain from making predictions, and explainable AI, which provides clinicians and health operators with useful information about the models' functioning.
Results: The developed models achieved good diagnostic performance on internal validation (AUC between 0.91 and 0.98) and consistent generalization performance across the external validation datasets (AUC between 0.75 and 0.95), outperforming baseline biomarkers and state-of-the-art ML models for sepsis detection. Controllable AI techniques further improved performance and were used to derive a simple, interpretable set of diagnostic rules.
Conclusions: Our findings demonstrate how controllable AI approaches based on CBC and MDW may be used for the early detection of sepsis, and how the proposed methodology can be used to develop ML models that are more robust to different types of data distribution shift.
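
The cautious classification idea mentioned in the Methods (a model that may abstain rather than predict) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the probability band of 0.3 to 0.7, and the toy scores below are all hypothetical, chosen only to show how abstention trades coverage for accuracy on the cases that are answered.

```python
from typing import List, Optional, Tuple


def cautious_predict(prob: float, low: float = 0.3, high: float = 0.7) -> Optional[int]:
    """Return 1 (sepsis), 0 (no sepsis), or None (abstain).

    `low` and `high` are hypothetical thresholds, not values from the study.
    """
    if prob >= high:
        return 1
    if prob <= low:
        return 0
    return None  # score too uncertain: defer to the clinician


def coverage_and_accuracy(
    probs: List[float], labels: List[int], low: float = 0.3, high: float = 0.7
) -> Tuple[float, Optional[float]]:
    """Fraction of cases answered, and accuracy on the answered cases only."""
    decisions = [cautious_predict(p, low, high) for p in probs]
    answered = [(d, y) for d, y in zip(decisions, labels) if d is not None]
    coverage = len(answered) / len(probs) if probs else 0.0
    if not answered:
        return coverage, None
    accuracy = sum(d == y for d, y in answered) / len(answered)
    return coverage, accuracy


if __name__ == "__main__":
    # Toy predicted probabilities and true labels (made up for illustration).
    probs = [0.05, 0.45, 0.92, 0.60, 0.20, 0.81]
    labels = [0, 1, 1, 0, 0, 1]
    print(coverage_and_accuracy(probs, labels))
```

In this toy example the two scores inside the band are left unanswered, so the sketch answers four of six cases. Widening the band would lower coverage further while typically raising accuracy on the remaining cases.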

External IDs: doi:10.2196/preprints.55492