Old Dog Learns New Tricks: Randomized UCB for Bandit Problems

AISTATS 2020
Abstract: We propose RandUCB, a bandit strategy that uses theoretically derived confidence intervals similar to upper confidence bound (UCB) algorithms but, akin to Thompson sampling (TS), uses randomization...
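The abstract describes the core idea: keep a UCB-style confidence interval, but randomize how far into that interval the index reaches, rather than always using the full (deterministic) width. Below is a minimal sketch of one step of such a strategy for a K-armed bandit. The function name, the choice of `beta`, the number of atoms, the Gaussian-shaped sampling weights, and the Hoeffding-style width are all illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

def randucb_step(means, counts, t, beta=2.0, num_atoms=20, rng=None):
    """One step of a RandUCB-style strategy for a K-armed bandit.

    means  -- np.ndarray of empirical mean rewards per arm
    counts -- np.ndarray of pull counts per arm
    t      -- current round (used in the confidence width)

    Instead of a fixed confidence-width multiplier (as in UCB1), sample
    the multiplier from a discrete distribution on [0, beta]; this
    injects TS-like randomization into a UCB-style index.
    """
    rng = rng or np.random.default_rng()

    # Play each arm once before relying on confidence intervals.
    untried = np.where(counts == 0)[0]
    if untried.size > 0:
        return int(untried[0])

    # Discrete atoms on [0, beta] with (assumed) Gaussian-shaped weights.
    atoms = np.linspace(0.0, beta, num_atoms)
    weights = np.exp(-atoms ** 2 / 2.0)
    weights /= weights.sum()
    z = rng.choice(atoms, p=weights)  # one shared random multiplier

    # UCB-style confidence widths (Hoeffding-type, illustrative).
    widths = np.sqrt(2.0 * np.log(t + 1) / counts)
    return int(np.argmax(means + z * widths))
```

Note that with `z` fixed at its maximum value this reduces to a deterministic UCB rule, while sampling `z` recovers TS-like randomized exploration; the paper develops this coupling for specific bandit settings, which the sketch above does not attempt to reproduce.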