Keywords: Health AI, Deep Learning Clinical Prediction Models, Prediction Stability
Abstract: In clinical settings, prediction models are used daily to support decisions about patient outcomes. Recent advancements in machine learning and, in particular, deep learning have significantly expanded the number of available models, yet many may lack the individual prediction stability required for clinical use. Their outputs can vary substantially when trained on different samples of the same population, undermining trust and reproducibility. Ensemble approaches such as bagging can mitigate this instability, but at the cost of increased computation and reduced interpretability.
We propose a bootstrapping-based regularisation that embeds stability directly into the training of deep neural networks. By penalising divergence between predictions from the original training data and those from bootstrapped datasets, our method achieves the stability benefits of ensembles within a single model.
We evaluated our stable model against a standard model using three clinical datasets: GUSTO-I, Framingham, and SUPPORT. The stable model achieved markedly lower prediction instability (MAD 0.019 vs. 0.059 in GUSTO-I; 0.057 vs. 0.088 in Framingham; 0.071 vs. 0.092 in SUPPORT), with far fewer significantly deviating predictions (13.9% vs. 87.1%, 21.4% vs. 55.0%, and 40.2% vs. 57.7%, respectively). SHAP analyses showed that stability improvements did not compromise feature attribution, with strong per-participant correlations (0.894 in GUSTO-I, 0.965 in Framingham, 0.529 in SUPPORT).
In conclusion, by regularising predictions to align with bootstrapped distributions, the stable model achieved greater robustness and prediction stability while preserving discrimination and feature attribution. By varying the regularisation strength, our approach spans a continuum from a standard model to a bagging-like model, allowing users to balance performance and stability within a single interpretable model. This work directly addresses a key barrier to deploying deep learning in healthcare by producing risk estimates that are accurate, reproducible, and trustworthy.
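The core idea of the abstract can be illustrated with a minimal sketch. The code below trains a toy logistic model whose loss combines standard cross-entropy with a bootstrap-consistency penalty: the mean squared divergence between predictions on the original rows and predictions on bootstrap-resampled rows. This is one plausible reading of the regulariser described above, not the paper's exact formulation; all names, the dataset, and the penalty form are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 200 samples, 5 features, binary outcome (a hypothetical
# stand-in for a clinical dataset such as GUSTO-I).
n, d = 200, 5
X = rng.normal(size=(n, d))
true_w = rng.normal(size=d)
y = (1 / (1 + np.exp(-(X @ true_w))) > rng.uniform(size=n)).astype(float)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def stable_loss_grad(w, X, y, boot_idx, lam):
    """Gradient of BCE on the original data plus a bootstrap penalty.

    The penalty is the mean squared difference between predictions on
    the original rows and on bootstrap-resampled rows -- an assumed,
    simplified version of the divergence penalty, not the paper's own.
    """
    p = sigmoid(X @ w)                    # predictions on original data
    grad = X.T @ (p - y) / len(y)         # standard BCE gradient
    for idx in boot_idx:                  # average over B bootstrap sets
        Xb = X[idx]
        pb = sigmoid(Xb @ w)              # predictions on bootstrap rows
        diff = p - pb
        # d/dw of mean((p - pb)^2): chain rule through both sigmoids
        grad += lam / len(boot_idx) * (
            X.T @ (2 * diff * p * (1 - p))
            - Xb.T @ (2 * diff * pb * (1 - pb))
        ) / len(y)
    return grad

# Plain gradient descent; lam = 0 recovers the standard model, while
# larger lam pushes predictions toward bagging-like stability.
B, lam, w = 5, 1.0, np.zeros(d)
boot_idx = [rng.integers(0, n, size=n) for _ in range(B)]
for _ in range(500):
    w -= 0.5 * stable_loss_grad(w, X, y, boot_idx, lam)

acc = np.mean((sigmoid(X @ w) > 0.5) == y)
print(f"training accuracy: {acc:.2f}")
```

Varying `lam` here mirrors the continuum described in the abstract: at zero the model is trained as usual, and increasing it trades some in-sample fit for predictions that agree more closely across bootstrap resamples.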
Submission Number: 424