Combinatorial Pure Exploration of Multi-Armed Bandit with a Real Number Action Class

Shintaro Nakamura; Masashi Sugiyama

Published: 01 Jan 2023, Last Modified: 27 Sept 2024, Venue: CoRR 2023, License: CC BY-SA 4.0

Abstract: We study the real-valued combinatorial pure exploration problem in the stochastic multi-armed bandit (R-CPE-MAB), focusing on the case where the size of the action set is polynomial in the number of arms. In this case, the R-CPE-MAB can be seen as a special case of transductive linear bandits. Existing methods for the R-CPE-MAB and for transductive linear bandits leave a gap between the upper and lower bounds on the sample complexity: problem-dependent constant terms in the former and logarithmic terms in the latter. We close these gaps by proposing the combinatorial gap-based exploration (CombGapE) algorithm, whose sample complexity upper bound matches the lower bound. Finally, we show numerically that the CombGapE algorithm significantly outperforms existing methods.
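
The reduction to transductive linear bandits mentioned in the abstract can be illustrated with a small sketch. It is not the CombGapE algorithm, and it rests on assumptions the abstract does not spell out: that each action is a real-valued vector over the arms, that the learner seeks the action maximizing the inner product with the unknown mean vector, and that pulling arm i corresponds to measuring along the i-th standard basis vector. The names best_action and as_transductive_instance, and the numbers used, are hypothetical.

```python
def best_action(mu, actions):
    """Index of the action pi maximizing the inner product <mu, pi>.

    Assumption: the objective is linear in the arm means; the abstract
    does not state the objective explicitly.
    """
    values = [sum(m * p for m, p in zip(mu, pi)) for pi in actions]
    return max(range(len(actions)), key=values.__getitem__)


def as_transductive_instance(actions):
    """View an explicitly listed action set as a transductive instance.

    Assumption: the measurement set is the standard basis (one vector per
    arm) and the item set is the action set itself.
    """
    d = len(actions[0])
    measurements = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    items = [list(pi) for pi in actions]
    return measurements, items


# Hypothetical instance: 3 arms and 3 real-valued actions.
mu = [0.9, 0.5, 0.1]          # arm means, unknown to the learner in practice
actions = [
    [1.0, 0.0, 2.0],
    [0.0, 1.5, 0.5],
    [0.5, 0.5, 0.5],
]

measurements, items = as_transductive_instance(actions)
print(best_action(mu, actions))  # prints 0
```

Listing the action set explicitly, as done here, is only practical in the regime the abstract considers, where the number of actions is polynomial in the number of arms.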
