Sequential Halving Using Scores

Nicolas Fabiano, Tristan Cazenave

Published: 2021, Last Modified: 30 Sept 2024. Venue: ACG 2021. License: CC BY-SA 4.0

Abstract: We study the multi-armed bandit problem in the setting where the aim is to minimize the simple regret under a fixed budget. The Sequential Halving algorithm is known to tackle it efficiently. We present a more elaborate version of this algorithm that integrates external knowledge, or “scores”, which can for instance be provided by a neural network or by a heuristic such as all-moves-as-first (AMAF) in the context of Monte-Carlo Tree Search. We provide both theoretical justifications and experiments.
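
For orientation, the baseline that the abstract builds on can be sketched as follows. This is a minimal illustration of plain Sequential Halving as it is commonly described (split the budget evenly over about log2(n) rounds, sample every surviving arm equally, keep the better half), not the score-based variant of the paper, whose details the abstract does not give. The names `sequential_halving`, `pull`, `n_arms` and `budget` are assumptions made for this sketch, and each round here ranks arms using only that round's samples, which is one of several possible conventions.

```python
import math


def sequential_halving(pull, n_arms, budget):
    """Return the arm that survives repeated halving (plain variant, no scores).

    pull(arm) is a hypothetical callback returning one reward sample for `arm`.
    """
    survivors = list(range(n_arms))
    # Number of halving rounds needed to go from n_arms down to one arm.
    rounds = max(1, math.ceil(math.log2(n_arms)))

    for _ in range(rounds):
        if len(survivors) == 1:
            break
        # Spread an equal share of the budget over the arms still in play.
        pulls_per_arm = max(1, budget // (len(survivors) * rounds))
        means = {}
        for arm in survivors:
            total = sum(pull(arm) for _ in range(pulls_per_arm))
            means[arm] = total / pulls_per_arm
        # Keep the better half (rounded up), ranked by this round's empirical mean.
        survivors.sort(key=lambda a: means[a], reverse=True)
        survivors = survivors[: math.ceil(len(survivors) / 2)]

    return survivors[0]
```

The returned arm is the recommendation whose simple regret is being minimized: the gap between the best arm's expected reward and that of the recommended arm. In a call such as `sequential_halving(lambda a: true_means[a], 4, 40)` with deterministic rewards, the procedure spends 5 pulls on each of 4 arms and then 10 pulls on each of the 2 survivors, using the whole budget of 40.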