Abstract: Following the work of \cite{chang2022online}, we consider the information-asymmetric multiplayer adversarial bandit problem, in which at each round each player selects an action from their own action set, and these choices together form a joint action. The players are not allowed to communicate during learning; however, they may agree on a strategy beforehand. We show that when the players pull simultaneously, there always exists an adaptive adversary that can force linear regret. We therefore modify the setting so that the pulls are successive instead of simultaneous, and prove near-optimal regret bounds for successive pulls, both with and without reward asymmetry.
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Matteo_Papini1
Submission Number: 7502