Bandit Algorithms Based on Thompson Sampling for Bounded Reward Distributions

Charles Riou, Junya Honda

2020 (modified: 24 Apr 2023) | ALT 2020 | Readers: Everyone

Abstract: We focus on a classic reinforcement learning problem, the multi-armed bandit, and more specifically on the stochastic setting with reward distributions bounded in $[0,1]$. For this model, an o...
