Generic Asymptotically Optimal Algorithms for Multi-Armed Bandits

Richard Combes, Stefan Magureanu, Alexandre Proutière

2018 (modified: 04 Nov 2022) | Allerton 2018 | Readers: Everyone

Abstract: In this presentation, we address generic multi-armed bandit problems with stochastic rewards and known structure. Our notion of structure is generic and includes well-studied bandit structures such as linear, combinatorial, unimodal, Lipschitz, and dueling bandits. We propose a generic algorithm and prove its asymptotic optimality as the time horizon goes to infinity. We further provide a finite-time regret analysis of our algorithm. As a byproduct of our analysis, we develop several novel technical results that are useful for analyzing generic bandit problems. More details can be found in the technical report https://arxiv.org/abs/1711.00400.
