Keywords: coupled uncertainty, decision-making
Abstract: We initiate the study of decision-making under coupled uncertainties. In this problem, a learner has access to both ground truth and coarse measurements of decision outcomes and would like to use them for decision-making. The learner can obtain ground truth measurements for only a given fraction of decision outcomes and would like to leverage the cheaper coarse measurements for the rest. We introduce a model in which the randomness of the ground truth and coarse measurements is coupled, and our approach learns their correlation to optimally combine coarse measurements with ground truth and achieve improved performance. This framework unifies several settings, such as learning from multi-fidelity data sources and delegating decision-making to AI agents. We provide an upper confidence bound (UCB) based algorithm, $\mathrm{CUUCB}$, for leveraging coupled uncertainties in a multi-armed bandit task where the covariance structure between coarse measurements and ground truth is unknown. We show theoretically how $\mathrm{CUUCB}$ adapts to the underlying covariance structure by deriving instance-dependent and instance-independent regret bounds. We validate our algorithm in two experiments: a task with synthetically generated data and an LLM benchmarking task. We compare our algorithm to existing $\mathrm{UCB}$ variants that have access to only ground truth measurements on the constrained fraction of outcomes. In both cases, our algorithm achieves lower regret.
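As a rough illustration of how correlated coarse measurements can sharpen a ground truth estimate, the sketch below uses a classical control-variate mean estimator for a single arm. It is not the paper's $\mathrm{CUUCB}$ algorithm, and the function name `control_variate_mean` and its arguments are hypothetical. It assumes that a small set of outcomes has both measurements (paired), while coarse measurements are available for all outcomes.

```python
from statistics import fmean


def control_variate_mean(y_paired, z_paired, z_all):
    """Estimate the ground truth mean using coarse measurements as a control variate.

    y_paired: ground truth values on the outcomes where they were observed
    z_paired: coarse values on those same outcomes (same order as y_paired)
    z_all:    coarse values on every outcome, including the paired ones
    (All names are illustrative assumptions, not taken from the paper.)
    """
    n = len(y_paired)
    if n != len(z_paired) or n < 2:
        raise ValueError("need at least two paired samples of equal length")

    y_bar = fmean(y_paired)
    z_bar = fmean(z_paired)

    # Sample covariance and variance, both estimated from the paired data only.
    cov_yz = sum((y - y_bar) * (z - z_bar)
                 for y, z in zip(y_paired, z_paired)) / (n - 1)
    var_z = sum((z - z_bar) ** 2 for z in z_paired) / (n - 1)

    if var_z == 0.0:
        # The coarse signal carries no information; fall back to the plain mean.
        return y_bar

    # Regression coefficient of ground truth on the coarse measurement.
    beta = cov_yz / var_z

    # Correct the paired mean by how far the paired coarse mean deviates
    # from the coarse mean over all outcomes.
    return y_bar - beta * (z_bar - fmean(z_all))
```

When the two measurements are strongly correlated, the correction term removes much of the sampling noise in the small paired set. When they are uncorrelated, the estimated coefficient is close to zero and the estimator stays close to the plain ground truth mean.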
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 21541