On the optimal regret of collaborative personalized linear bandits
TL;DR: We provide a complete characterization of the benefit of collaboration for heterogeneous linear bandits.
Abstract: Stochastic linear bandits are a fundamental model for sequential decision making. Although well studied in the single-agent setting, many real-world scenarios involve multiple agents solving heterogeneous bandit problems, each with a different unknown parameter. This paper investigates the optimal regret achievable in collaborative personalized linear bandits.
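To make the setting concrete, the following is a minimal sketch of the heterogeneous multi-agent interaction protocol, assuming Gaussian noise and a Gaussian population model; the names (`pull`, `theta_star`) and constants are illustrative, not from the paper.

```python
# Illustrative sketch: m agents, each with its own unknown parameter theta_i,
# interact with a linear bandit. The Gaussian prior and noise are assumptions.
import numpy as np

rng = np.random.default_rng(0)
d, m, sigma = 5, 4, 0.5                   # dimension, number of agents, heterogeneity level

theta_star = rng.normal(size=d)           # shared population parameter
thetas = theta_star + sigma * rng.normal(size=(m, d))  # per-agent parameters

def pull(agent: int, action: np.ndarray) -> float:
    """Noisy linear reward for one agent: <theta_i, x> plus standard Gaussian noise."""
    return thetas[agent] @ action + rng.normal()

# One round: every agent plays a unit-norm action and observes a scalar reward.
for i in range(m):
    x = rng.normal(size=d)
    x /= np.linalg.norm(x)
    reward = pull(i, x)
```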
We derive an information-theoretic lower bound showing how the number of agents, the number of rounds, and the degree of heterogeneity jointly affect regret, and we propose a two-stage collaborative algorithm that achieves this optimal regret. We model heterogeneity via a hierarchical Bayesian framework and introduce a novel information-theoretic technique for bounding regret. Our results offer a complete characterization of when and how collaboration helps: the optimal regret bound is $\tilde{O}(d\sqrt{mn})$, $\tilde{O}(dm^{1-\gamma}\sqrt{n})$, and $\tilde{O}(dm\sqrt{n})$ when the number of rounds $n$ lies in $(0, \frac{d}{m \sigma^2})$, $[\frac{d}{m^{2\gamma} \sigma^2}, \frac{d}{\sigma^2}]$, and $(\frac{d}{\sigma^2}, \infty)$, respectively, where $\sigma$ measures the level of heterogeneity, $m$ is the number of agents, and $\gamma\in[0, 1/2]$ is an absolute constant. In contrast, agents acting without collaboration achieve a regret bound of $O(dm\sqrt{n})$ at best.
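To make the three regimes concrete, here is a hypothetical helper that maps the horizon $n$ to the regret order stated above (up to polylog factors). The abstract leaves the narrow interval between $\frac{d}{m\sigma^2}$ and $\frac{d}{m^{2\gamma}\sigma^2}$ unspecified; grouping it with the middle regime here is an assumption for illustration.

```python
# Hypothetical helper, not from the paper: classify the horizon n into the
# three regimes of the abstract and return the corresponding regret order.
import math

def regret_order(n: float, d: int, m: int, sigma: float, gamma: float = 0.5) -> float:
    if n < d / (m * sigma**2):
        # Small n: heterogeneity is not yet resolvable, full collaboration helps.
        return d * math.sqrt(m * n)                  # ~ O~(d sqrt(mn))
    if n <= d / sigma**2:
        # Intermediate n: collaboration yields a partial m^{-gamma} saving.
        return d * m**(1 - gamma) * math.sqrt(n)     # ~ O~(d m^{1-gamma} sqrt(n))
    # Large n: agents must fully personalize; matches the no-collaboration rate.
    return d * m * math.sqrt(n)                      # ~ O~(d m sqrt(n))
```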
Submission Number: 1071