Thompson Sampling-like Algorithms for Stochastic Rising Rested Bandits

Published: 01 Aug 2024, Last Modified: 09 Oct 2024 · EWRL17 · CC BY 4.0
Keywords: Rising bandits, Regret minimization, Thompson sampling
Abstract: The stochastic rising rested bandit (SRRB) is a specific bandit setting in which the arms' expected rewards increase as they are pulled. It models scenarios where the performance of the different options grows as an effect of an underlying learning process (e.g., online model selection). Although the bandit literature provides algorithms specifically crafted for this setting based on upper-confidence bound approaches, no study of Thompson sampling-like algorithms has been performed. Indeed, the specific trend and the strong regularity of the expected rewards in the SRRB setting suggest that specific instances may be tackled effectively using classical Thompson sampling or suitably adapted versions of it. This work provides a novel theoretical analysis of the regret that such algorithms suffer in SRRBs. Our results show that, under specific assumptions on the reward functions, even Thompson sampling-like algorithms achieve the no-regret property.
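
As a rough illustrative sketch (not taken from the paper), the snippet below runs classical Bernoulli Thompson sampling on a synthetic SRRB instance in which each arm's expected reward is a non-decreasing function of its own pull count. The reward curves `mu`, their parameters, and the regret bookkeeping are all assumptions introduced here for illustration; they are not the paper's experimental setup or its formal regret definition.

```python
# Minimal sketch (assumed, not from the paper): classical Bernoulli Thompson sampling
# run on a synthetic stochastic rising rested bandit (SRRB) instance.
import numpy as np

rng = np.random.default_rng(0)

# Illustrative rising reward curves: mu(arm, n) is the expected reward of `arm`
# at its n-th pull, non-decreasing in n and bounded in [0, 1] (rested-rising property).
def mu(arm, n):
    ceilings = [0.9, 0.7, 0.8]     # assumed asymptotic reward levels
    rates    = [0.05, 0.30, 0.10]  # assumed learning speeds
    return ceilings[arm] * (1.0 - np.exp(-rates[arm] * n))

K, T = 3, 5000
alpha = np.ones(K)   # Beta posterior parameters (successes + 1)
beta  = np.ones(K)   # Beta posterior parameters (failures + 1)
pulls = np.zeros(K, dtype=int)
regret = 0.0

for t in range(T):
    # Thompson sampling: draw one posterior sample per arm, pull the argmax.
    theta = rng.beta(alpha, beta)
    arm = int(np.argmax(theta))

    # Rested dynamics: the pulled arm's mean depends on its own pull count.
    p = mu(arm, pulls[arm])
    reward = float(rng.random() < p)   # Bernoulli reward with mean p

    alpha[arm] += reward
    beta[arm]  += 1.0 - reward
    pulls[arm] += 1

    # Rough diagnostic only: compare against the best instantaneous mean at the
    # current pull counts (this is not the formal SRRB regret definition).
    regret += max(mu(k, pulls[k]) for k in range(K)) - p

print("pulls per arm:", pulls, "| approx. cumulative regret:", round(regret, 2))
```

In this toy run, the posterior updates are the standard stationary Beta-Bernoulli ones; the paper's point is precisely to ask when such unmodified (or lightly adapted) Thompson sampling remains no-regret despite the non-stationary, rising reward functions.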
Submission Number: 52