Abstract: This paper investigates a variation of dynamic bandits in which arms follow a periodic availability pattern. Upon a "successful" selection, an arm transitions to an inactive state and requires a possibly unknown cool-down period before becoming active again. We devise Thompson Sampling algorithms tailored to this problem and prove that they achieve logarithmic regret. Notably, this work is the first to address scenarios in which the agent lacks knowledge of each arm's active state. Furthermore, the theoretical findings extend to the sleeping bandit framework, yielding a notably superior regret bound compared to the existing literature.
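To make the setting concrete, the following is a minimal sketch of Bernoulli Thompson Sampling in an environment where a successful pull deactivates an arm for a fixed cool-down. This is an illustrative toy model of the problem described above, not the paper's algorithm; the function name, the per-arm `cooldowns` parameter, and the choice to skip rounds when no arm is active are all assumptions for this sketch.

```python
import random

def thompson_sampling_cooldown(probs, cooldowns, horizon, seed=0):
    """Bernoulli Thompson Sampling where an arm that yields a success
    enters a cool-down of cooldowns[i] rounds before reactivating.
    Toy illustration of the cool-down bandit setting (not the paper's
    algorithm); assumes known active states and fixed cool-downs."""
    rng = random.Random(seed)
    k = len(probs)
    alpha = [1] * k          # Beta posterior: successes + 1
    beta = [1] * k           # Beta posterior: failures + 1
    ready_at = [0] * k       # first round each arm is active again
    total = 0
    for t in range(horizon):
        active = [i for i in range(k) if ready_at[i] <= t]
        if not active:       # all arms cooling down: skip this round
            continue
        # sample from each active arm's posterior, play the best sample
        samples = {i: rng.betavariate(alpha[i], beta[i]) for i in active}
        arm = max(samples, key=samples.get)
        reward = 1 if rng.random() < probs[arm] else 0
        total += reward
        if reward:           # success: arm goes inactive for its cool-down
            alpha[arm] += 1
            ready_at[arm] = t + 1 + cooldowns[arm]
        else:
            beta[arm] += 1
    return total
```

With heterogeneous cool-downs, always playing the single best arm is no longer optimal, since a success locks that arm out for several rounds; this is what separates the setting from standard stochastic bandits.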
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Alberto_Maria_Metelli2
Submission Number: 4818