Optimistic Thompson Sampling for No-Regret Learning in Unknown Games

Published: 29 Jun 2023, Last Modified: 04 Oct 2023MFPL PosterEveryoneRevisionsBibTeX
Keywords: Unknown game, Real-world applications
Abstract: We propose an Optimism-then-NoRegret learning framework for learning to play a repeated multiplayer game with an unknown reward function and bandit feedback. Our framework encompasses various game algorithms as special cases. It consists of an estimation step for constructing an imagined reward and a no-regret step for playing against an adversary. Thompson Sampling (TS) can be naturally included in the framework, but its effectiveness in this context remains unclear. We demonstrate that TS fails in a class of unknown games. To address this, we propose an optimistic variant of TS combined with suitable full-information adversarial bandit algorithms, achieving sublinear regret in the unknown game. We establish an information-theoretic regret bound for the proposed algorithms. Our analysis highlights that the optimistic variant encourages more exploration than classical TS in unknown games. We evaluate the algorithms on random matrix games and two real-world applications: radar anti-jamming and traffic routing problems. The proposed algorithms outperform baselines substantially.
Submission Number: 17
Loading