Can You Trust Your Model? Constructing Uncertainty Approximations Guaranteeing Validity of Glioma Segmentation Explanations

29 Nov 2025 (modified: 15 Dec 2025) · MIDL 2026 Conference Submission · CC BY 4.0
Keywords: Explainable AI, Uncertainty, Deep Learning, MRI
Abstract: Deep learning models have been successfully applied to glioma segmentation from multi-contrast MRI, yet model reasoning is difficult to validate clinically. Prior work used contrast-level Shapley values to explain how individual MRI sequences contribute to segmentation performance, and showed that alignment between these explanations and protocol-derived contrast rankings is associated with improved model performance. However, a single trained model may not reflect the optimal population-level model, and naive Monte Carlo uncertainty estimates provide no guarantee that the true optimal explanation lies within their intervals. In this work, we construct statistically valid uncertainty intervals for contrast-level Shapley values in glioma segmentation. Using a U-Net trained on the BraTS 2024 GoAT dataset, we compute Shapley values for each MRI contrast and tumor sub-region, form naive uncertainty estimates from cross-validation, and then apply a frequentist framework based on uniform convergence to define a confidence set of plausibly optimal models. By optimizing mixed objectives that trade off empirical loss and Shapley value, we approximate the Pareto frontier and obtain lower and upper bounds on the optimal explanation. We compare these intervals with clinically derived consensus and protocol rankings. Our results demonstrate that naive uncertainty estimates can lead to inconclusive or misleading conclusions about clinical alignment, whereas frequentist intervals provide principled coverage guarantees for the optimal explanation and show moderate correlation with annotator consensus, enabling more reliable validation of model explanations against established clinical reasoning.
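For readers unfamiliar with contrast-level Shapley values, the sketch below shows how they can be computed exactly for the four BraTS MRI sequences. It is an illustration only, not the authors' implementation: the value function `evaluate_dice(subset)` is a hypothetical stand-in for re-evaluating the trained U-Net with only the contrasts in `subset` available (remaining input channels masked out), and all names here are assumptions of this sketch.

```python
# Illustrative sketch only -- not the paper's code.
# Exact contrast-level Shapley values for a 4-contrast segmentation model.
from itertools import combinations
from math import factorial

CONTRASTS = ("T1", "T1ce", "T2", "FLAIR")  # standard BraTS sequences


def evaluate_dice(subset):
    """Hypothetical value function (an assumption of this sketch):
    mean Dice of the trained model when only the contrasts in `subset`
    are provided and the remaining input channels are masked out."""
    raise NotImplementedError("plug in your own model evaluation here")


def contrast_shapley(value=evaluate_dice, players=CONTRASTS):
    """Exact Shapley values by enumerating all 2^n coalitions (n = 4 here)."""
    n = len(players)
    phi = {p: 0.0 for p in players}
    for p in players:
        others = [q for q in players if q != p]
        for k in range(n):  # coalition sizes 0 .. n-1 (excluding p)
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                s = frozenset(coalition)
                # marginal contribution of contrast p to this coalition
                phi[p] += weight * (value(s | {p}) - value(s))
    return phi
```

For a quick smoke test, a toy value such as `lambda s: len(s) / 4.0` can replace `evaluate_dice`. A naive uncertainty estimate, as described in the abstract, would repeat this computation across cross-validation folds; the frequentist bounds instead come from optimizing loss/Shapley trade-offs over a confidence set of plausibly optimal models, which this sketch does not cover.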
Primary Subject Area: Uncertainty Estimation
Secondary Subject Area: Interpretability and Explainable AI
Registration Requirement: Yes
Visa & Travel: Yes
Read CFP & Author Instructions: Yes
Originality Policy: Yes
Single-blind & Not Under Review Elsewhere: Yes
LLM Policy: Yes
Submission Number: 113