AISTATS 2020
Abstract: We propose RandUCB, a bandit strategy that uses theoretically derived confidence intervals similar to upper confidence bound (UCB) algorithms but, akin to Thompson sampling (TS), uses randomization...
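To make the idea concrete, here is a minimal sketch of a RandUCB-style strategy for the multi-armed bandit setting. It is an illustration under assumptions, not the authors' reference implementation: the class name `RandUCB`, the parameters `alpha` and `n_points`, and the Gaussian-shaped weights over the discrete support are all illustrative choices. The key mechanism it shows is the one the abstract describes: a UCB-style confidence width scaled by a random multiplier rather than used deterministically.

```python
import numpy as np

class RandUCB:
    """Sketch of a RandUCB-style bandit strategy (illustrative, not official code).

    Each round, instead of adding the full UCB confidence width to the
    empirical mean, the width is scaled by a random multiplier Z drawn
    from a discrete distribution on [0, alpha]: UCB-style intervals
    combined with TS-style randomization.
    """

    def __init__(self, n_arms, alpha=2.0, n_points=20, rng=None):
        self.rng = rng or np.random.default_rng()
        # Discrete support for the randomization multiplier Z.
        self.support = np.linspace(0.0, alpha, n_points)
        # Assumed Gaussian-shaped weights over the support.
        weights = np.exp(-0.5 * self.support ** 2)
        self.probs = weights / weights.sum()
        self.counts = np.zeros(n_arms)
        self.sums = np.zeros(n_arms)

    def select_arm(self, t):
        # Play each arm once before relying on confidence intervals.
        untried = np.flatnonzero(self.counts == 0)
        if untried.size > 0:
            return int(untried[0])
        means = self.sums / self.counts
        widths = np.sqrt(np.log(t + 1) / self.counts)  # UCB-style width
        # A single Z shared across arms this round (a "coupled" choice).
        z = self.rng.choice(self.support, p=self.probs)
        return int(np.argmax(means + z * widths))

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.sums[arm] += reward
```

As a quick usage example, `agent = RandUCB(n_arms=5)` followed by repeated calls to `agent.select_arm(t)` and `agent.update(arm, reward)` runs the strategy; with the multiplier Z fixed at `alpha` it reduces to standard UCB, while sampling Z recovers the randomized exploration the abstract attributes to RandUCB.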