Representative Action Selection for Large Action Space Meta-Bandits

16 Sept 2025 (modified: 11 Feb 2026)Submitted to ICLR 2026EveryoneRevisionsBibTeXCC BY 4.0
Keywords: Multi-armed bandits, Expected supremum of Gaussian process, Epsilon net, High-dimensional probability
Abstract: We study the problem of selecting a subset from a large action space shared by a family of bandits, with the goal of achieving performance nearly matching that of using the full action space. We assume that similar actions tend to have related payoffs, modeled by a Gaussian process. To exploit this structure, we propose a simple epsilon-net algorithm to select a representative subset. We provide theoretical guarantees for its performance and compare it empirically to Thompson Sampling and Upper Confidence Bound.
Supplementary Material: zip
Primary Area: learning theory
Submission Number: 7235
Loading