Keywords: Conformal prediction, uncertainty quantification, prediction sets, human-AI collaboration, decision making, large language models
Abstract: AI predictive systems are becoming integral to decision-making pipelines, shaping high-stakes choices once made solely by humans. Yet robust decisions under uncertainty still depend on capabilities that current AI lacks: domain knowledge not captured by data, long-horizon context, and the ability to reason and act in the physical world. This contrast has sparked growing efforts to design **collaborative** frameworks that combine the complementary strengths of both agents. This work advances this vision by identifying the fundamental principles of Human-AI collaboration in the context of uncertainty quantification, an essential component of any reliable decision-making pipeline. We introduce Human-AI Collaborative Uncertainty Quantification, a framework that formalizes how an AI model can refine a human expert’s proposed prediction set with two goals in mind: **avoiding counterfactual harm**, ensuring that the AI does not degrade the human’s correct judgments, and **complementarity**, enabling the AI to recover correct outcomes that the human missed. At the population level, we show that the optimal collaborative prediction set takes the form of an intuitive two-threshold structure over a single score function, extending a classical result in conformal prediction. Building on this insight, we develop practical offline and online calibration algorithms with provable **distribution-free** finite-sample guarantees. The online algorithm adapts to **any** distribution shift, including the case in which human behavior evolves through interaction with the AI, a phenomenon we call “Human-to-AI Adaptation.” We validate the framework across three modalities (image classification, regression, and text-based medical decision-making) using models ranging from convolutional networks to LLMs. Results show that collaborative prediction sets consistently outperform either agent alone, achieving higher coverage and smaller set sizes across a range of conditions, including shifts in human behavior.
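The abstract describes the optimal collaborative set as a two-threshold rule over a single score, calibrated offline and online. The sketch below is one hypothetical reading of that idea, not the authors' algorithm. It assumes a lower-is-better score, a threshold `a` for pruning labels the human proposed and a threshold `b` for adding labels the human omitted, split-conformal calibration done separately on calibration points where the human was right or wrong, and, for the online case, a swapped-in quantile-tracking update in the style of adaptive conformal inference. All function names and the parameters `alpha`, `gamma`, and `eta` are hypothetical.

```python
import math
from typing import Callable, Hashable, Iterable, List, Set, Tuple


def conformal_quantile(scores: List[float], alpha: float) -> float:
    """Split-conformal quantile: the ceil((n+1)(1-alpha))-th smallest score.

    Returns +inf when there are too few scores to certify level 1-alpha.
    """
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf
    return sorted(scores)[k - 1]


def collaborative_set(
    labels: Iterable[Hashable],
    human_set: Set[Hashable],
    score: Callable[[Hashable], float],
    a: float,
    b: float,
) -> Set[Hashable]:
    """Hypothetical two-threshold rule over one score (lower = more plausible).

    Labels the human proposed are kept unless their score exceeds `a`;
    labels the human left out are added only if their score is at most `b`.
    """
    kept = {y for y in human_set if score(y) <= a}
    added = {y for y in labels if y not in human_set and score(y) <= b}
    return kept | added


def calibrate_thresholds(
    records: List[Tuple[Set[Hashable], Hashable, float]],
    alpha: float,
    gamma: float,
) -> Tuple[float, float]:
    """Offline calibration from (human_set, true_label, score_of_true_label) records.

    a: under exchangeability, P(true label pruned | human was correct) <= alpha.
    b: under exchangeability, P(true label added | human missed it) >= gamma.
    """
    correct = [s for (h, y, s) in records if y in h]
    missed = [s for (h, y, s) in records if y not in h]
    a = conformal_quantile(correct, alpha)
    b = conformal_quantile(missed, 1 - gamma)
    return a, b


def online_update(a: float, pruned_true_label: bool, alpha: float, eta: float) -> float:
    """Quantile-tracking step (ACI-style, a swapped-in technique).

    Apply only on rounds where the human's set contained the true label:
    raise `a` after a harmful prune, lower it slightly otherwise, so that for
    bounded scores the long-run prune rate tracks alpha.
    """
    return a + eta * ((1.0 if pruned_true_label else 0.0) - alpha)
```

For example, with scores `{"cat": 0.2, "dog": 0.9, "fox": 0.3, "owl": 0.8}`, a human set `{"cat", "dog"}`, `a = 0.5`, and `b = 0.4`, `collaborative_set` prunes `"dog"`, adds `"fox"`, and returns `{"cat", "fox"}`. Using a looser `a` than `b` reflects the asymmetry in the abstract's two goals: removing a human-proposed label risks counterfactual harm, while adding one only costs set size.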
Primary Area: probabilistic methods (Bayesian methods, variational inference, sampling, UQ, etc.)
Submission Number: 9945