Taming Variability: Randomized and Bootstrapped Conformal Risk Control for LLMs

ICLR 2026 Conference Submission 21626 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Conformal Risk Control, Batched Bootstrap, Uncertainty Quantification, Calibration, LLM Hallucination Mitigation, LLM-as-Judge, Gram Matrix, Randomized Smoothing
TL;DR: A compute-aware, API-level actuator using Conformal Risk Control with batched bootstrap and randomized weighting that reliably reduces hallucinations and calibrates LLM-as-Judge, powered by a label-free Gram-geometry signal.
Abstract: We turn the randomness of LLMs into precise assurances via an actuator at the API interface that enforces a user-defined risk constraint in finite samples using Conformal Risk Control (CRC). The actuator is label-free and model-agnostic: it selects ship/abstain/regenerate/escalate actions based solely on a scalar score computed from opaque outputs. We improve CRC's computational efficiency and robustness with Batched Bootstrap CRC (BB‑CRC) and Randomized Batched Weighted‑Average CRC (RBWA‑CRC), which reduce calibration calls and stabilize thresholds while preserving statistical validity. We further introduce a semantic uncertainty measure grounded in Gram-matrix geometry, yielding an interpretable signal and principled metric design. Together, these components deliver principled randomness control for LLM hallucination mitigation and LLM-as-Judge reliability. We evaluate the framework on four datasets, demonstrating gains in factual accuracy and in measuring LLM-as-Judge performance, and obtaining a simple, computationally efficient control layer that converts variability into statistical validity.
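The two core ingredients of the abstract can be sketched in a few lines: a label-free dispersion score from the Gram matrix of several sampled responses, and the standard CRC rule that picks the smallest threshold whose finite-sample-corrected empirical risk stays within the user's budget. This is a minimal illustration under assumptions, not the authors' implementation: the spectral-entropy functional, the abstain-above-threshold loss, and all names here are illustrative choices, and the batched-bootstrap and randomized-weighting refinements (BB‑CRC, RBWA‑CRC) are not shown.

```python
import numpy as np

def gram_dispersion_score(embeddings):
    """Label-free uncertainty signal from the Gram geometry of k sampled
    responses: if the samples agree semantically, the Gram matrix is close
    to rank one and the spectral entropy is low.  The specific functional
    (spectral entropy of normalized eigenvalues) is an illustrative
    assumption, not necessarily the paper's signal."""
    X = np.asarray(embeddings, dtype=float)
    X /= np.linalg.norm(X, axis=1, keepdims=True)      # unit-normalize rows
    G = X @ X.T                                        # k x k Gram matrix
    eig = np.clip(np.linalg.eigvalsh(G), 1e-12, None)  # PSD eigenvalues
    p = eig / eig.sum()
    return float(-(p * np.log(p)).sum())               # spectral entropy

def crc_threshold(cal_scores, loss_fn, alpha, lambdas, B=1.0):
    """Vanilla CRC calibration: return the smallest lambda such that
    (n/(n+1)) * empirical_risk(lambda) + B/(n+1) <= alpha,
    assuming loss_fn(s, lambda) is bounded by B and nonincreasing in
    lambda.  BB-CRC / RBWA-CRC would replace this plain scan with
    batched-bootstrap / randomized-weighted variants."""
    n = len(cal_scores)
    for lam in sorted(lambdas):
        risk = np.mean([loss_fn(s, lam) for s in cal_scores])
        if (n / (n + 1)) * risk + B / (n + 1) <= alpha:
            return float(lam)
    return float("inf")  # no threshold meets the risk budget
```

At deployment, the actuator would compute `gram_dispersion_score` for each query's sampled responses and ship only when the score falls below the calibrated `lam`, abstaining or regenerating otherwise; the CRC guarantee then bounds the expected loss of shipped outputs by `alpha` in finite samples.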
Primary Area: probabilistic methods (Bayesian methods, variational inference, sampling, UQ, etc.)
Submission Number: 21626