AISTATS 2020
Abstract: We propose RandUCB, a bandit strategy that uses theoretically derived confidence intervals similar to upper confidence bound (UCB) algorithms but, akin to Thompson sampling (TS), uses randomization...
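To make the idea concrete, here is a minimal sketch of a RandUCB-style strategy for the multi-armed bandit setting. It is an illustration under assumptions, not the authors' reference implementation: the class name `RandUCB`, the parameters `alpha` and `n_points`, and the Gaussian-shaped weights over the discrete support are all illustrative choices. The key mechanism it shows is the one the abstract describes: a UCB-style confidence width scaled by a random multiplier rather than used deterministically.

```python
import numpy as np

class RandUCB:
    """Sketch of a RandUCB-style bandit strategy (illustrative, not official code).

    Each round, instead of adding the full UCB confidence width to the
    empirical mean, the width is scaled by a random multiplier Z drawn
    from a discrete distribution on [0, alpha]: UCB-style intervals
    combined with TS-style randomization.
    """

    def __init__(self, n_arms, alpha=2.0, n_points=20, rng=None):
        self.rng = rng or np.random.default_rng()
        # Discrete support for the randomization multiplier Z.
        self.support = np.linspace(0.0, alpha, n_points)
        # Assumed Gaussian-shaped weights over the support.
        weights = np.exp(-0.5 * self.support ** 2)
        self.probs = weights / weights.sum()
        self.counts = np.zeros(n_arms)
        self.sums = np.zeros(n_arms)

    def select_arm(self, t):
        # Play each arm once before relying on confidence intervals.
        untried = np.flatnonzero(self.counts == 0)
        if untried.size > 0:
            return int(untried[0])
        means = self.sums / self.counts
        widths = np.sqrt(np.log(t + 1) / self.counts)  # UCB-style width
        # A single Z shared across arms this round (a "coupled" choice).
        z = self.rng.choice(self.support, p=self.probs)
        return int(np.argmax(means + z * widths))

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.sums[arm] += reward
```

As a quick usage example, `agent = RandUCB(n_arms=5)` followed by repeated calls to `agent.select_arm(t)` and `agent.update(arm, reward)` runs the strategy; with the multiplier Z fixed at `alpha` it reduces to standard UCB, while sampling Z recovers the randomized exploration the abstract attributes to RandUCB.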