Abstract: We study decentralized Gaussian process (GP) bandits under strict communication budgets and a shared reward function. We introduce X-KB-UCB, a gossip-based Upper-Confidence Bound (UCB) method in which agents periodically exchange only their most recently chosen arm and its observed reward. At gossip rounds, agents coordinate exploration through a cross-agent Kriging-Believer update; between gossip rounds, each agent follows the corresponding single-agent rule: GP-UCB for static rewards and TV-GP-UCB for time-varying rewards. We provide high-probability no-regret guarantees for augmented agents, using an agent-centric accounting that includes both locally collected and gossiped observations, in both the static setting and a time-varying setting modeled by a Markov-drift GP. The resulting bounds are expressed in terms of information gain and recover standard single-agent rates when gossip is absent. In the always-gossip regime, they match the centralized batch-selection rate of GP-BUCB, with an additional term reflecting drift. Experiments confirm that gossip yields consistent gains over independent agents and approaches a centralized baseline under the same evaluation budget.
Submission Type: Long submission (more than 12 pages of main content)
Assigned Action Editor: ~Mohammad_Hajiesmaili1
Submission Number: 8074