Data-Driven Risk Stratification for Anastomotic Leak: A Proof-of-Concept Study Using Electronic Health Records
Abstract: Anastomotic leak (AL) is a serious and potentially fatal complication of gastrointestinal surgery, occurring in 2.8% to 8.4% of colorectal procedures. Early identification of high-risk patients could allow targeted perioperative interventions, yet no reliable, EHR-ready prediction tool exists in routine clinical practice. We developed and evaluated machine learning models for AL prediction using the MIMIC-IV database, drawing on 3,331 admissions from 2,847 patients, of whom 7.6% experienced an AL event. To situate this work, we also reviewed approximately 22 prior studies on ML-based AL prediction, which report AUC values ranging from 0.65 to 0.89, with few validated on external datasets. Our best model, a gradient boosting classifier built entirely with the Python standard library for reproducibility, achieved an AUC of 0.772. The strongest predictors were post-operative white blood cell count, ICU admission, post-operative lactate, surgery type, and pre-operative albumin level. These results show that structured EHR data alone can support meaningful AL risk stratification, without requiring proprietary software or specialist infrastructure. Key obstacles remain: no large external validation has been done, bias across demographic subgroups has not been tested, and the path to real clinical workflows is undefined. We frame each of these as concrete next steps for the field.
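For readers unfamiliar with the headline metric, the following is a minimal sketch of how an AUC can be computed with only the Python standard library, in keeping with the dependency-free approach described above. It is not the authors' implementation: the function name `roc_auc` is hypothetical, and the labels and scores are invented toy values, not study data. The sketch uses the rank-based (Mann-Whitney U) formulation, assigning tied scores their average rank.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney U statistic.

    labels: iterable of 0/1 outcomes (1 = event, e.g. a leak).
    scores: predicted risk scores, higher meaning higher risk.
    Hypothetical helper for illustration only.
    """
    labels = list(labels)
    scores = list(scores)
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")

    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC is undefined unless both classes are present")

    # Sort indices by score, then assign 1-based ranks,
    # giving every member of a tied group the group's average rank.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    # U statistic for the positive class, normalised to [0, 1].
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


if __name__ == "__main__":
    # Toy values only, chosen for illustration.
    toy_labels = [0, 0, 1, 1]
    toy_scores = [0.1, 0.4, 0.35, 0.8]
    print(roc_auc(toy_labels, toy_scores))
```

Under this formulation, the AUC is the probability that a randomly chosen patient with the event receives a higher score than a randomly chosen patient without it, with ties counted as one half.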