The Energy to Say No: Pre-Generation Abstention for Safety-Critical Medical RAG

Published: 12 Oct 2025, Last Modified: 13 Oct 2025
Venue: GenAI4Health 2025 Poster
License: CC BY 4.0
Keywords: Retrieval-Augmented Generation (RAG), Abstention, Out-of-Distribution Detection, Energy-Based Models, Women’s Health, GenAI4Health, AI for Healthcare, Trustworthy AI
TL;DR: We introduce MS-EBM, a pre-generation energy-based abstention layer that reliably rejects unsafe queries in medical RAG, reducing error rates on hard out-of-distribution cases while remaining fast, model-agnostic, and scalable.
Abstract: Retrieval-augmented generation (RAG) systems require reliable abstention mechanisms to avoid generating harmful responses, particularly in safety-critical domains such as women's health where incorrect answers can lead to serious consequences. However, existing confidence estimation approaches often fail to provide adequate safety guarantees for pre-generation decision making. We introduce the Margin-Structured Energy-Based Model (MS-EBM), a framework that learns smooth energy landscapes over dense semantic representations of guideline-derived questions, enabling systems to make principled abstention decisions before generation occurs. Using identical in-batch negatives for training and validation, we evaluate MS-EBM against softmax-based confidence estimation and non-parametric baselines including k-NN, ODIN, and Mahalanobis distance across three out-of-distribution scenarios: Hard, Easy, and Mixed splits. Results demonstrate substantial improvements in abstention quality, with MS-EBM achieving AUROC scores of 0.946, 0.977, and 0.961 on Hard, Easy, and Mixed splits respectively, compared to 0.895, 0.937, and 0.916 for softmax baselines. The model also significantly reduces false positive rates, achieving FPR@95TPR of 41.3% versus 69.4% on Hard splits. Comprehensive ablation studies reveal that heterogeneous negative sampling, combining both hard and easy negatives, proves essential for robust out-of-distribution generalisation, while curriculum design shows minimal impact once diverse negatives are included. Analysis through risk-coverage curves and energy-gap distributions confirms that MS-EBM's scoring provides more reliable confidence signals than probability-based approaches, offering a scalable and interpretable foundation for building safer RAG systems.
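Illustration: the abstract reports AUROC and FPR@95TPR for an energy-based abstention score. The sketch below shows how those two metrics, and a pre-generation abstention threshold, could be computed from per-query energies. It is a minimal sketch under stated assumptions, not the authors' evaluation code: it assumes that lower energy means "more in-distribution", that in-distribution queries are the positive class, and that the function names (`auroc`, `fpr_at_tpr`, `should_abstain`) are hypothetical helpers introduced only for this example.

```python
import numpy as np

# Assumption: confidence score = -energy, so higher score means "safer to answer".

def auroc(id_scores, ood_scores):
    """Probability that a random in-distribution (ID) query scores higher
    than a random out-of-distribution (OOD) query; ties count as 0.5."""
    id_scores = np.asarray(id_scores, dtype=float)
    ood_scores = np.asarray(ood_scores, dtype=float)
    diff = id_scores[:, None] - ood_scores[None, :]
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())

def threshold_at_tpr(id_scores, tpr=0.95):
    """Largest score threshold that still accepts at least `tpr` of ID queries."""
    ordered = np.sort(np.asarray(id_scores, dtype=float))
    n = ordered.size
    # Accepting scores >= ordered[k] keeps n - k ID queries, so k <= n * (1 - tpr).
    k = min(int(np.floor(n * (1.0 - tpr))), n - 1)
    return float(ordered[k])

def fpr_at_tpr(id_scores, ood_scores, tpr=0.95):
    """Fraction of OOD queries wrongly accepted at the threshold above."""
    thr = threshold_at_tpr(id_scores, tpr)
    return float((np.asarray(ood_scores, dtype=float) >= thr).mean())

def should_abstain(energy, score_threshold):
    """Pre-generation decision: abstain when the score (-energy) is below threshold."""
    return (-energy) < score_threshold
```

In use, the threshold would be fitted on held-out in-distribution queries and then applied to each incoming query's energy before any retrieval-conditioned generation is attempted; queries for which `should_abstain` returns true would be refused rather than answered.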
Submission Number: 143