Online Conformal Prediction with Adversarial Semi-bandit Feedback via Regret Minimization

Published: 02 Mar 2026, Last Modified: 24 Mar 2026 · Agentic AI in the Wild: From Hallucinations to Reliable Autonomy · Poster · CC BY 4.0
Keywords: Online Conformal Prediction, Multi-armed Adversarial Bandit, Semi-bandit, Uncertainty Quantification, Trustworthy Machine Learning
TL;DR: We provide a novel algorithm for online conformal prediction with adversarial semi-bandit feedback that enjoys both a theoretical long-run coverage guarantee and strong empirical performance in terms of coverage and set-size efficiency.
Abstract: Uncertainty quantification is crucial in safety-critical systems, where decisions must be made under uncertainty. Online conformal prediction is a principled online uncertainty quantification method that dynamically constructs a prediction set at each time step. While existing methods for online conformal prediction provide long-run coverage guarantees without any distributional assumptions, they typically assume a *full feedback* setting in which the true label is always observed. In this paper, we propose a novel learning method for online conformal prediction with *partial feedback* from an adaptive adversary—a more challenging setup where the true label is revealed only when it lies inside the constructed prediction set. Specifically, we formulate online conformal prediction as an adversarial bandit problem by treating each candidate prediction set as an arm. Building on an existing algorithm for adversarial bandits, our method achieves a long-run coverage guarantee by explicitly establishing its connection to the regret of the learner. Finally, we empirically demonstrate the effectiveness of our method in both independent and identically distributed (i.i.d.) and non-i.i.d. settings.
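The abstract's reduction can be illustrated with a minimal sketch: treat a grid of candidate score thresholds as bandit arms, play an arm with an exponential-weights (EXP3-style) rule, and observe the label only when it falls inside the chosen prediction set. Everything below is an assumption for illustration, not the paper's algorithm: the uniform conformity scores, the grid of thresholds, the toy loss (whose expected minimizer sits near the 1−α quantile), and all parameter values are invented.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative setup (all choices here are assumptions, not the paper's):
# each arm is a candidate score threshold; the prediction set at time t
# contains every label whose conformity score is at most that threshold.
thresholds = np.linspace(0.1, 1.0, 10)  # candidate arms
K = len(thresholds)
alpha = 0.1                             # target miscoverage level
eta = 0.1                               # EXP3-style learning rate
weights = np.ones(K)

T = 2000
covered = 0
for t in range(T):
    probs = weights / weights.sum()
    arm = rng.choice(K, p=probs)
    score = rng.random()                # stand-in conformity score of the true label
    in_set = score <= thresholds[arm]   # partial feedback: label seen only if in the set
    if in_set:
        covered += 1
        loss = 0.5 * thresholds[arm]    # toy set-size cost (grows with the threshold)
    else:
        loss = 1 - alpha                # toy miscoverage cost; label is not revealed
    # importance-weighted update on the played arm only (bandit feedback)
    weights[arm] *= np.exp(-eta * loss / probs[arm])
    weights /= weights.max()            # rescale for numerical stability

print(f"empirical coverage: {covered / T:.3f}")
```

Under this toy loss, the expected per-round loss of an arm with threshold `t` is `0.5*t**2 + (1 - alpha)*(1 - t)`, minimized at `t = 1 - alpha`, so the learner is nudged toward roughly 1−α coverage; the paper's actual coverage guarantee instead follows from a regret argument against an adaptive adversary.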
Submission Number: 34