Follow-the-Perturbed-Leader Achieves Best-of-Both-Worlds for Bandit Problems

2023 (modified: 24 Apr 2023), ALT 2023, Readers: Everyone
Abstract: This paper discusses the adversarial and stochastic $K$-armed bandit problems. In the adversarial setting, the best possible regret is known to be $O(\sqrt{KT})$ for time horizon $T$. This bound ca...