Multi-Agent Lipschitz Bandits
Abstract: We study the decentralized multi-player stochastic bandit problem over a continuous, Lipschitz-structured action space where hard collisions yield zero reward. Our objective is to design a communication-free policy that maximizes collective reward, with coordination costs that are independent of the time horizon $T$. We propose a modular protocol that first solves the multi-agent coordination problem—identifying and seating players on distinct, high-value regions via a novel maxima-directed search—and then decouples the problem into $N$ independent single-player Lipschitz bandits. We establish a near-optimal regret bound of $\tilde{O}(T^{(d+1)/(d+2)})$ plus a $T$-independent coordination cost, matching the single-player rate. Our framework is, to our knowledge, the first to provide such guarantees and extends to general distance-threshold collision models.
Submission Number: 1933