Bayesian Decision Making Observing an Expert

Published: 29 Sept 2025, Last Modified: 12 Oct 2025 | NeurIPS 2025 - Reliable ML Workshop | CC BY 4.0
Keywords: Bayesian learning, multi-agent bandits, trustworthy learning
TL;DR: We study the problem of Bayesian online decision making when agents can observe expert behaviour.
Abstract: Learning agents are increasingly deployed alongside existing experts, such as human operators or previously trained agents. While Bayesian methods like Thompson Sampling offer a principled way to trade off reward and information gain, it is unclear how a learner should optimally incorporate expert information, which differs in kind from its own action-outcome experiences. We study this problem of online Bayesian learning alongside an expert in multi-armed bandits. We consider: (i) an offline setting, where the learner receives a dataset of outcomes from the expert's optimal arm before interaction, and (ii) a simultaneous setting, where the learner must choose at each step whether to update its beliefs using its own experience or the expert's concurrent outcome. We formalize how expert data influences the learner's posterior, and prove that pretraining on expert outcomes tightens information-theoretic regret bounds by the mutual information between the expert data and the optimal arm. For the simultaneous setting, we propose an information-directed rule under which the learner processes the data source that maximizes the one-step information gain about the optimal arm. We empirically validate our findings, showing that the value of expert information is highest in asymmetric environments, where it can significantly prune the parameter space, and we demonstrate that our information-directed agent successfully leverages this to accelerate learning.
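
One way to read the pretraining claim (a sketch under standard information-ratio notation, not the paper's exact statement): in analyses in the style of Russo and Van Roy, the regret bound scales with the entropy of the optimal arm A*, and conditioning on an expert dataset D replaces that entropy with the conditional entropy, which is smaller by exactly the mutual information the abstract mentions. The symbols Γ̄ (a bound on the information ratio) and T (the horizon) are assumptions of this sketch.

```latex
% Standard information-ratio bound, then its assumed tightening after
% pretraining on expert data D, using H(A* | D) = H(A*) - I(A*; D):
\[
  \mathbb{E}\!\left[\mathrm{Regret}(T)\right]
  \;\le\; \sqrt{\bar{\Gamma}\, T\, H(A^{*})}
  \quad\longrightarrow\quad
  \mathbb{E}\!\left[\mathrm{Regret}(T)\right]
  \;\le\; \sqrt{\bar{\Gamma}\, T\, \bigl(H(A^{*}) - I(A^{*}; D)\bigr)}.
\]
```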
Submission Number: 87
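
The information-directed rule for the simultaneous setting can be sketched in a toy Bernoulli bandit. This is a minimal illustration, not the paper's implementation: the finite hypothesis set, the uniformly random own-arm policy, and the helper names (`astar_entropy`, `expected_info_gain`) are all assumptions made here for clarity. At each step the learner computes the expected one-step reduction in entropy of the optimal arm A* from each data source (its own outcome vs. the expert's) and processes the source with the larger gain.

```python
import numpy as np

def astar_entropy(posterior, hypotheses):
    """Entropy of the induced posterior distribution over the optimal arm A*."""
    p_astar = np.zeros(hypotheses.shape[1])
    for p_h, theta in zip(posterior, hypotheses):
        p_astar[np.argmax(theta)] += p_h          # each hypothesis votes for its best arm
    p = p_astar[p_astar > 0]
    return float(-np.sum(p * np.log(p)))

def expected_info_gain(posterior, hypotheses, arm):
    """Expected one-step reduction in H(A*) from one Bernoulli outcome on `arm`."""
    expected_post_entropy = 0.0
    for y in (0, 1):
        lik = hypotheses[:, arm] if y == 1 else 1.0 - hypotheses[:, arm]
        p_y = float(posterior @ lik)               # marginal probability of outcome y
        if p_y > 0.0:
            expected_post_entropy += p_y * astar_entropy(posterior * lik / p_y, hypotheses)
    return astar_entropy(posterior, hypotheses) - expected_post_entropy

rng = np.random.default_rng(0)
# Assumed finite hypothesis set: each row is a candidate vector of arm means.
hypotheses = np.array([[0.9, 0.1, 0.5],
                       [0.1, 0.9, 0.5],
                       [0.5, 0.1, 0.9]])
true_h = 2                                         # environment's true hypothesis
posterior = np.full(len(hypotheses), 1.0 / len(hypotheses))
expert_arm = int(np.argmax(hypotheses[true_h]))    # expert always plays its optimal arm

for t in range(20):
    own_arm = int(rng.integers(3))                 # placeholder policy for the learner's own arm
    # Information-directed source choice: process whichever outcome (the
    # learner's own or the expert's) promises more information about A*.
    candidates = {own_arm, expert_arm}
    arm = max(candidates, key=lambda a: expected_info_gain(posterior, hypotheses, a))
    y = rng.random() < hypotheses[true_h, arm]     # simulate the chosen outcome
    lik = hypotheses[:, arm] if y else 1.0 - hypotheses[:, arm]
    posterior = posterior * lik / float(posterior @ lik)   # Bayes update on one source

print("posterior over hypotheses:", np.round(posterior, 3))
```

In this toy run the posterior concentrates on the true hypothesis, since the expert's arm is informative about A* whenever the remaining hypotheses disagree on which arm is best; how the learner's own action is chosen is left abstract here.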