Conformal Reliability: A New Evaluation Metric for Conditional Generation

ICLR 2026 Conference Submission 17600 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: uncertainty evaluation; conformal prediction; reliability; conditional generative models
TL;DR: We propose a new reliability metric for evaluating generative models, along with a conformal-prediction-based framework for computing it.
Abstract: Conditional generative models have recently achieved remarkable success in various applications. However, a suitable metric for evaluating the reliability of these models, one that accounts for their inherent uncertainty, is still lacking. Existing metrics typically assess a single output and may therefore fail to capture the variability or potential risks in generation. In this paper, we propose a novel evaluation metric, the \emph{reliability score}, based on conformal prediction: it measures the worst-case performance within the prediction set at a pre-specified confidence level. Computing this score is challenging due to the high-dimensional output space and the nonconvexity of both the metric function and the prediction set. To compute it efficiently, we introduce Conformal ReLiability (CReL), a framework that can \textbf{(i)} construct a prediction set with the desired coverage and \textbf{(ii)} accurately optimize the reliability score. We provide theoretical coverage guarantees and show empirically that our method produces more informative prediction sets than existing approaches. Experiments on synthetic data and an image-to-text task further demonstrate the interpretability of the new metric and the validity and effectiveness of our computational framework.
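To make the metric concrete, the following is a minimal sketch of the idea the abstract describes: a split-conformal threshold gives a prediction set with roughly the desired coverage, and the reliability score is approximated as the worst-case (minimum) metric value over model samples that fall inside that set. All names here (`nonconformity`, `metric_fn`, `candidates`) are illustrative assumptions, not the authors' actual CReL implementation, which uses a dedicated optimization procedure rather than naive sampling.

```python
import numpy as np

def conformal_threshold(cal_scores: np.ndarray, alpha: float) -> float:
    """Split-conformal quantile of calibration nonconformity scores.

    Guarantees (marginal) coverage of at least 1 - alpha for exchangeable data.
    """
    n = len(cal_scores)
    # Finite-sample-corrected quantile level, clipped to 1.0 for small n.
    q = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    return float(np.quantile(cal_scores, q, method="higher"))

def reliability_score(candidates, nonconformity, metric_fn, threshold):
    """Worst-case metric value over sampled candidates in the prediction set.

    candidates: outputs sampled from the conditional generative model;
    sampling only approximates the true minimum, since the prediction set
    and the metric are nonconvex and the output space is high-dimensional.
    """
    in_set = [y for y in candidates if nonconformity(y) <= threshold]
    if not in_set:
        return float("nan")  # sampling missed the prediction set entirely
    return min(metric_fn(y) for y in in_set)
```

A higher score under this definition means that even the worst output the model is plausibly willing to produce (at confidence level 1 - alpha) still performs acceptably, which is the reliability notion the paper targets.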
Primary Area: probabilistic methods (Bayesian methods, variational inference, sampling, UQ, etc.)
Submission Number: 17600