Conformal Mirror Statistics for Model Alignment: Uncertainty Quantification with FDR Control

17 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · Readers: Everyone · CC BY 4.0
Keywords: Uncertainty Quantification; Model Alignment; Conformal Inference; False Discovery Rate Control
TL;DR: We propose Conformal Mirror Statistics, a model alignment method that provides FDR-controlled uncertainty quantification without constraints on labeled dataset size.
Abstract: Foundation models are increasingly adopted across diverse domains, but their safe deployment requires outputs that align with human interpretation, especially in high-stakes applications. This motivates rigorous uncertainty quantification (UQ) methods for assessing alignment reliability. Most existing methods rely on large labeled datasets, which limits their applicability in real-world settings where labeled data are scarce or expensive. In this paper, we introduce Conformal Mirror Statistics (CMS), a novel framework for UQ in model alignment that selects aligned outputs for unlabeled data while controlling the false discovery rate (FDR). Unlike conventional conformal methods based on $p$-value calibration, CMS generalizes to broader settings without restrictive requirements on calibration set size. We further establish theoretical guarantees by proving FDR control under weaker data assumptions than existing methods. Empirical results on simulations and a large sepsis cohort from MIMIC-III demonstrate that CMS consistently outperforms conventional methods while reliably identifying aligned outputs.
Primary Area: probabilistic methods (Bayesian methods, variational inference, sampling, UQ, etc.)
Submission Number: 9517