Abstract: We examine a novel multiplayer extension of the latent multi-armed bandit problem as formulated in \cite{maillard2014latent}, with broad applications in areas such as recommendation systems and cognitive radio. Following \cite{chang2022online}, we examine three information-asymmetric scenarios: Problem A, in which players receive identical rewards but cannot observe each other's actions; Problem B, in which players receive private i.i.d. rewards but can observe each other's actions; and Problem C, in which players receive private i.i.d. rewards and cannot observe each other's actions. For Problems A and B, we provide nearly optimal gap-independent regret bounds. When reduced to the single-agent setting, our results improve on \cite{maillard2014latent} by allowing nature's actions to be adversarial. For Problem C, we use knowledge of the reward means to improve on the results in \cite{chang2022online}.
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Shuai_Li3
Submission Number: 7042