Abstract: In this paper, we extend linear combinatorial bandits \cite{gai2012combinatorial} to a multiplayer setting with information asymmetry \cite{chang2022online, chang2024optimal}, in which each player controls an arm and independently decides whether to pull it, with coordination allowed only before rounds begin. We analyze three scenarios: action asymmetry (players cannot observe others' actions but receive identical rewards each iteration), reward asymmetry (players observe actions but receive private i.i.d.\ rewards), and combined asymmetry. We derive near-optimal, gap-independent regret bounds for all three scenarios: under action asymmetry or reward asymmetry alone, we achieve $\tilde{\mathcal{O}}(\sqrt{T})$ regret, a significant improvement over \cite{gai2010learning}; under combined action and reward asymmetry, we achieve near-optimal bounds similar to those of \cite{chang2022online}. Finally, we generalize our results to settings where each player decides either not to pull or to pull one of multiple arms, and obtain analogous bounds under the corresponding asymmetry scenarios.
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Sivan_Sabato1
Submission Number: 7499