lil' UCB : An Optimal Exploration Algorithm for Multi-Armed Bandits

Kevin G. Jamieson, Matthew Malloy, Robert D. Nowak, Sébastien Bubeck

2014 (modified: 08 Nov 2022) | COLT 2014 | Readers: Everyone

Abstract: The paper proposes a novel upper confidence bound (UCB) procedure for identifying the arm with the largest mean in a multi-armed bandit game in the fixed confidence setting using a small number of ...
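To make the fixed-confidence setting concrete, here is a minimal sketch of best-arm identification with confidence bounds. It is not the lil' UCB procedure itself, whose sampling and stopping rules are not given in the truncated abstract above. Instead it assumes a simpler leader-versus-challenger scheme with a Hoeffding radius and a union bound over arms and pull counts. The names `radius`, `identify_best_arm`, `pull`, and `max_pulls` are hypothetical, and rewards are assumed to lie in [0, 1] with at least two arms.

```python
import math
import random


def radius(t, n_arms, delta):
    """Hoeffding confidence radius for rewards in [0, 1] after t pulls.

    Assumption of this sketch: the failure budget delta is split over arms
    and pull counts by a union bound. This is NOT the paper's bound.
    """
    return math.sqrt(math.log(4 * n_arms * t * t / delta) / (2 * t))


def identify_best_arm(pull, n_arms, delta=0.05, max_pulls=100_000):
    """Return (arm, total_pulls, certified) for a bandit with n_arms >= 2.

    pull(i) must return a reward in [0, 1] for arm i. `certified` is True
    when the stopping test fired before the pull budget ran out.
    """
    counts = [0] * n_arms
    sums = [0.0] * n_arms

    def sample(i):
        sums[i] += pull(i)
        counts[i] += 1

    # Pull every arm once so each empirical mean is defined.
    for i in range(n_arms):
        sample(i)

    while True:
        means = [sums[i] / counts[i] for i in range(n_arms)]
        rad = [radius(counts[i], n_arms, delta) for i in range(n_arms)]
        # Leader: highest empirical mean.
        leader = max(range(n_arms), key=lambda i: means[i])
        # Challenger: highest upper confidence bound among the other arms.
        challenger = max(
            (i for i in range(n_arms) if i != leader),
            key=lambda i: means[i] + rad[i],
        )
        # Stop once the leader's lower bound clears the challenger's upper bound.
        if means[leader] - rad[leader] >= means[challenger] + rad[challenger]:
            return leader, sum(counts), True
        if sum(counts) + 2 > max_pulls:
            return leader, sum(counts), False
        sample(leader)
        sample(challenger)


# Usage: three Bernoulli arms with made-up means, seeded for repeatability.
rng = random.Random(0)
arm_means = [0.2, 0.5, 0.9]
arm, pulls, certified = identify_best_arm(
    lambda i: 1.0 if rng.random() < arm_means[i] else 0.0, n_arms=3
)
print(arm, pulls, certified)
```

The union-bound radius used here is deliberately conservative. The abstract's emphasis on "a small number of" samples suggests the paper's contribution lies in tightening exactly this kind of bound and stopping rule, so treat the sketch only as an illustration of the problem setting.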
