OpenReview.net
Bandit Algorithms Based on Thompson Sampling for Bounded Reward Distributions
Charles Riou, Junya Honda
2020 (modified: 24 Apr 2023)
ALT 2020
Abstract:
We focus on a classic reinforcement learning problem, called a multi-armed bandit, and more specifically on the stochastic setting with reward distributions bounded in $[0,1]$. For this model, an o...
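To make the setting concrete, here is a minimal sketch of a standard Thompson sampling baseline for rewards bounded in $[0,1]$. It uses the well-known binarization trick: each observed reward $r$ is turned into a Bernoulli($r$) outcome, and each arm keeps a Beta posterior over the success probability. This is a swapped-in reference technique, not necessarily the algorithm proposed in this paper. The function name `thompson_bounded` and the arm-sampler interface are hypothetical.

```python
import random

def thompson_bounded(arm_samplers, horizon, seed=0):
    """Beta-Bernoulli Thompson sampling for rewards in [0, 1] (hypothetical helper).

    arm_samplers: list of callables taking a random.Random and returning a
    reward in [0, 1]. Each reward r is binarized into a Bernoulli(r) outcome,
    so a Beta(1 + successes, 1 + failures) posterior is kept per arm.
    """
    rng = random.Random(seed)
    k = len(arm_samplers)
    successes = [0] * k
    failures = [0] * k
    pulls = [0] * k
    total_reward = 0.0
    for _ in range(horizon):
        # Draw one posterior sample per arm and play the arm with the largest draw.
        draws = [rng.betavariate(1 + successes[i], 1 + failures[i]) for i in range(k)]
        arm = max(range(k), key=lambda i: draws[i])
        reward = arm_samplers[arm](rng)
        if not 0.0 <= reward <= 1.0:
            raise ValueError("rewards must lie in [0, 1]")
        total_reward += reward
        pulls[arm] += 1
        # Binarize: count a success with probability equal to the observed reward,
        # so the Bernoulli outcome has the same mean as the arm's reward distribution.
        if rng.random() < reward:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return pulls, total_reward
```

For example, with one arm uniform on $[0, 0.6]$ (mean 0.3) and another uniform on $[0.2, 1]$ (mean 0.6), `thompson_bounded([lambda r: r.uniform(0.0, 0.6), lambda r: r.uniform(0.2, 1.0)], 2000)` should direct most of its pulls to the second arm. Binarization keeps the sketch simple because Beta posteriors stay conjugate, but it discards information about the shape of each reward distribution.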