Bandit Algorithms Based on Thompson Sampling for Bounded Reward Distributions

2020 (modified: 24 Apr 2023) | ALT 2020 | Readers: Everyone
Abstract: We focus on a classic reinforcement learning problem, called a multi-armed bandit, and more specifically in the stochastic setting with reward distributions bounded in $[0,1]$. For this model, an o...