2017 (modified: 09 Nov 2022), AISTATS 2017, Readers: Everyone
Abstract: We derive an alternative proof for the regret of Thompson sampling (TS) in the stochastic linear bandit setting. While we obtain a regret bound of order $O(d^{3/2}\sqrt{T})$ as in previous results, the...