Keywords: stochastic bandits, multimodal structure, instance-dependent optimality, Indexed Minimum Empirical Divergence
TL;DR: We introduce IMED-MB, an algorithm that optimally exploits multimodal structure by adapting the popular IMED algorithm to this setting.
Abstract: We consider a multi-armed bandit problem specified by a set of one-dimensional exponential family distributions endowed with a multimodal structure. The multimodal structure naturally extends the unimodal structure and appears to underlie, in interesting ways, popular structures such as linear or Lipschitz bandits. We introduce IMED-MB, an algorithm that optimally exploits the multimodal structure by adapting the popular Indexed Minimum Empirical Divergence (IMED) algorithm to this setting. We provide an instance-dependent regret analysis of this strategy. Numerical experiments show that IMED-MB performs well in practice when assuming a unimodal, polynomial, or Lipschitz mean function.
Submission Number: 350