Stochastic Online Conformal Prediction with Semi-Bandit Feedback

Published: 01 May 2025 · Last Modified: 18 Jun 2025 · ICML 2025 poster · License: CC BY 4.0
Abstract: Conformal prediction has emerged as an effective strategy for uncertainty quantification by modifying a model to output sets of labels instead of a single label. These prediction sets come with the guarantee that they contain the true label with high probability. However, conformal prediction typically requires a large calibration dataset of i.i.d. examples. We consider the online learning setting, where examples arrive over time, and the goal is to construct prediction sets dynamically. Departing from existing work, we assume semi-bandit feedback, where we *only observe the true label if it is contained in the prediction set*. For instance, consider calibrating a document retrieval model to a new domain; in this setting, a user would only be able to provide the true label if the target document is in the prediction set of retrieved documents. We propose a novel conformal prediction algorithm targeted at this setting, and prove that it obtains sublinear regret compared to the optimal conformal predictor. We evaluate our algorithm on a retrieval task, an image classification task, and an auction price-setting task, and demonstrate that it empirically achieves good performance compared to several baselines.
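For readers unfamiliar with the guarantee, "sublinear regret" can be read in the standard online-learning sense. A generic formalization (the paper's exact loss and comparator class may differ) is

$$\mathrm{Regret}(T) \;=\; \sum_{t=1}^{T} \ell_t(\tau_t) \;-\; \min_{\tau} \sum_{t=1}^{T} \ell_t(\tau) \;=\; o(T),$$

where $\tau_t$ is the algorithm's conformal threshold at round $t$ and $\ell_t$ is a per-round loss trading off coverage against prediction-set size.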
Lay Summary: Imagine a search engine that shows you a handful of results. If the one you're looking for is in that list, you can point it out. But if it's missing entirely, you can't really help; you just know it's not there. The system only gets feedback if it includes the right answer somewhere in the list. We look at how to train AI systems in exactly that kind of situation.

The key idea comes from a technique called conformal prediction, which doesn't pick a single answer: it outputs a set of possible answers and promises that the correct one will be in that set most of the time. Normally, conformal prediction relies on having a lot of well-labeled data, but the real world doesn't always work that way. Sometimes data arrives one example at a time, and you only find out the right answer if it was in your prediction set, like a quiz where the teacher only grades your answer if it appears on a shortlist you came up with.

To tackle this challenge, we propose a new algorithm that updates its prediction sets as data arrives, even when feedback is limited in this way. We show that our method gradually improves and comes close to the best possible performance. This could enable AI systems to learn more effectively in settings where full feedback is impractical, such as when there are too many candidate options.
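As a concrete illustration of the feedback model, below is a minimal Python sketch of an online, threshold-based conformal predictor under semi-bandit feedback. It uses an adaptive-conformal-inference-style update, not the algorithm proposed in the paper, and the score model, `eta`, and `tau0` are illustrative assumptions. The point is only that the coverage indicator is always observable (the true label is revealed exactly when the set covers it), so the threshold can still be updated online.

```python
import numpy as np

def semi_bandit_online_conformal(scores, labels, alpha=0.1, eta=0.05, tau0=0.5):
    """ACI-style online threshold update under semi-bandit feedback (sketch).

    At round t the prediction set is C_t = {y : s_t(y) >= tau_t}. The true
    label is revealed only if it lies in C_t, but that means the binary
    coverage indicator is *always* observable: a miss is exactly the event
    that no label is revealed. alpha is the target miscoverage rate; eta
    and tau0 are illustrative hyperparameters, not values from the paper.
    """
    tau = tau0
    history = []
    for s_t, y_t in zip(scores, labels):
        pred_set = np.flatnonzero(s_t >= tau)   # labels whose score clears the threshold
        covered = y_t in pred_set               # observable: label revealed iff covered
        # Shrink the set (raise tau) after a cover, grow it (lower tau)
        # after a miss, steering long-run miscoverage toward alpha.
        tau -= eta * ((0.0 if covered else 1.0) - alpha)
        history.append((len(pred_set), covered))
    return tau, history

# Toy usage: 10 candidate labels, with the true label's score boosted.
rng = np.random.default_rng(0)
T, K = 2000, 10
labels = rng.integers(K, size=T)
scores = rng.uniform(size=(T, K))
scores[np.arange(T), labels] += 0.3
tau, hist = semi_bandit_online_conformal(scores, labels)
print(f"final tau = {tau:.3f}, coverage = {np.mean([c for _, c in hist]):.3f}")
```

In the retrieval example from the abstract, a "miss" corresponds to the user reporting that the target document is absent from the retrieved set, which is exactly the signal the update above consumes.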
Primary Area: General Machine Learning->Online Learning, Active Learning and Bandits
Keywords: Online Conformal Prediction, Semi-bandit Feedback
Submission Number: 7587