TL;DR: This paper addresses agent collaboration in linear bandit problems via sample sharing, deriving an effective bias-variance trade-off for regret minimization.
Abstract: The multi-agent linear bandit setting is well studied, yet designing efficient collaboration between agents remains challenging. This paper studies the impact of data sharing among agents on regret minimization. Unlike most existing approaches, our contribution does not rely on any assumption about the structure of the bandit parameters. Our main result formalizes the trade-off between the bias and the uncertainty of the bandit parameter estimation that governs efficient collaboration. This result is the cornerstone of the Bandit Adaptive Sample Sharing (BASS) algorithm, whose improvement over the current state of the art is validated through theoretical analysis and empirical evaluations on synthetic and real-world datasets. Furthermore, we demonstrate that, when the agents' parameters exhibit a cluster structure, our algorithm accurately recovers it.
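The abstract describes a bias-variance trade-off that decides when sample sharing between agents is beneficial. The sketch below is only a hypothetical illustration of that idea, not the BASS criterion from the paper: two agents share samples when the distance between their local ridge estimates (bias) is small relative to their estimation uncertainty (variance). All function names, thresholds, and uncertainty proxies here are assumptions for illustration.

```python
import numpy as np

def ridge_estimate(X, y, lam=1.0):
    """Ridge regression estimate of a bandit parameter and its Gram matrix."""
    d = X.shape[1]
    V = X.T @ X + lam * np.eye(d)        # regularized design (Gram) matrix
    theta = np.linalg.solve(V, X.T @ y)  # ridge estimate of the parameter
    return theta, V

def should_share(X_i, y_i, X_j, y_j, lam=1.0, conf_width=1.0):
    """Hypothetical sharing test: share agent j's samples with agent i only
    when the estimated bias (distance between local estimates) is dominated
    by the combined estimation uncertainty."""
    theta_i, V_i = ridge_estimate(X_i, y_i, lam)
    theta_j, V_j = ridge_estimate(X_j, y_j, lam)
    bias = np.linalg.norm(theta_i - theta_j)
    # crude uncertainty proxies: confidence width scaled by the smallest
    # eigenvalue of each Gram matrix (eigvalsh returns ascending order)
    unc_i = conf_width / np.sqrt(np.linalg.eigvalsh(V_i)[0])
    unc_j = conf_width / np.sqrt(np.linalg.eigvalsh(V_j)[0])
    return bias <= unc_i + unc_j

# Toy usage: two agents with nearly identical underlying parameters should share.
rng = np.random.default_rng(0)
theta_true = rng.normal(size=5)
X_i = rng.normal(size=(50, 5)); y_i = X_i @ theta_true + 0.1 * rng.normal(size=50)
X_j = rng.normal(size=(50, 5)); y_j = X_j @ (theta_true + 0.01) + 0.1 * rng.normal(size=50)
print(should_share(X_i, y_i, X_j, y_j))
```

The actual algorithm, analysis, and the precise sharing rule are given in the paper and the linked code repository.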
Lay Summary: Recommendation algorithms, such as those used on video platforms, often serve users with similar tastes. A recommendation system could benefit from sharing what it has learned about one user with other systems, but doing so effectively requires identifying when user preferences overlap. This motivated us to explore how such systems can collaborate to accelerate learning.
We developed BASS, a method that enables algorithms to decide when and with whom to share information. It uses observed behavior to detect when recommendation systems are learning from similar user groups and shares information only when it improves performance. Notably, BASS requires no prior knowledge about which systems are related.
This approach makes collaboration between learning systems more efficient and impactful. Whether applied to apps, devices, or content platforms, BASS helps them learn faster by leveraging shared patterns across users. Experiments on both synthetic and real-world data show that BASS consistently outperforms existing methods.
Link To Code: https://github.com/hcherkaoui/collaborative_bandits
Primary Area: Theory->Online Learning and Bandits
Keywords: linear bandit, collaboration, sample sharing
Submission Number: 12916