Adversarially Robust Latent Bandits in Multiplayer Asymmetric Settings

TMLR Paper7042 Authors

16 Jan 2026 (modified: 11 Apr 2026). Rejected by TMLR. License: CC BY 4.0

Abstract: We examine a novel multiplayer extension of the latent multi-armed bandit problem formulated in \cite{maillard2014latent}, with broad applications such as recommendation systems and cognitive radio. Following \cite{chang2022online}, we examine three information-asymmetric scenarios: Problem A, in which players receive identical rewards but cannot observe each other's actions; Problem B, in which players receive private i.i.d. rewards but can observe each other's actions; and Problem C, in which players receive private i.i.d. rewards and cannot observe each other's actions. For Problems A and B, we provide nearly optimal gap-independent regret bounds. When reduced to the single-agent setting, our results improve on \cite{maillard2014latent} by allowing nature's actions to be adversarial. For Problem C, we use knowledge of the reward means to improve on the results in \cite{chang2022online}.

Submission Type: Regular submission (no more than 12 pages of main content)

Assigned Action Editor: ~Shuai_Li3

Submission Number: 7042
