Abstract: In this presentation, we address generic multi-armed bandit problems with stochastic rewards and known structure. Our notion of structure is general and covers well-studied cases such as linear, combinatorial, unimodal, Lipschitz, and dueling bandits. We propose a generic algorithm and prove that it is asymptotically optimal as the time horizon goes to infinity. We also provide a finite-time regret analysis of the algorithm. As a byproduct of our analysis, we develop several novel technical results that are useful for analyzing generic bandit problems. More details can be found in the technical report: https://arxiv.org/abs/1711.00400.