Asymptotically and Minimax Optimal Regret Bounds for Multi-Armed Bandits with Abstention

TMLR Paper 6960 Authors

10 Jan 2026 (modified: 15 Jan 2026) · Under review for TMLR · CC BY 4.0
Abstract: We introduce a novel extension of the canonical multi-armed bandit problem that incorporates an additional strategic option: \emph{abstention}. In this enhanced framework, the agent is not only tasked with selecting an arm at each time step, but also has the option to \emph{abstain} from accepting the stochastic instantaneous reward before observing it. When opting for abstention, the agent either suffers a fixed regret or gains a guaranteed reward. This added layer of complexity naturally prompts the key question: can we develop algorithms that are both computationally efficient and asymptotically and minimax optimal in this setting? We answer this question in the affirmative by designing and analyzing algorithms whose regret bounds match their corresponding information-theoretic lower bounds. Our results offer valuable quantitative insights into the benefits of the abstention option, laying the groundwork for further exploration of other online decision-making problems with such an option. Extensive numerical experiments validate our theoretical results, demonstrating that our approach not only advances theory but also has the potential to deliver significant practical benefits.
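To make the interaction protocol described in the abstract concrete, the following is a minimal simulation sketch. It assumes Bernoulli arms and the "guaranteed reward" variant of abstention; all names (`run_bandit_with_abstention`, `greedy_with_abstention`, `abstention_reward`) are illustrative assumptions, not the paper's notation or algorithm.

```python
import random

def run_bandit_with_abstention(means, horizon, abstention_reward, policy, seed=0):
    """Simulate the bandit-with-abstention protocol from the abstract.

    At each step the policy picks an arm and, *before* observing the
    stochastic reward, decides whether to abstain. Abstaining yields the
    fixed `abstention_reward`; accepting yields a Bernoulli draw from the
    chosen arm. (The Bernoulli reward model is an illustrative assumption.)
    """
    rng = random.Random(seed)
    counts = [0] * len(means)    # number of observed pulls per arm
    totals = [0.0] * len(means)  # accumulated observed reward per arm
    cumulative = 0.0
    for t in range(horizon):
        arm, abstain = policy(t, counts, totals, abstention_reward)
        if abstain:
            cumulative += abstention_reward  # guaranteed reward, no observation
        else:
            reward = 1.0 if rng.random() < means[arm] else 0.0
            counts[arm] += 1
            totals[arm] += reward
            cumulative += reward
    return cumulative

def greedy_with_abstention(t, counts, totals, abstention_reward):
    """Toy policy (not the paper's algorithm): explore each arm a few times,
    then play the empirically best arm, abstaining whenever its empirical
    mean falls below the guaranteed abstention reward."""
    for i, c in enumerate(counts):
        if c < 5:
            return i, False  # forced exploration: accept the reward to learn
    best = max(range(len(counts)), key=lambda i: totals[i] / counts[i])
    abstain = totals[best] / counts[best] < abstention_reward
    return best, abstain

total = run_bandit_with_abstention([0.2, 0.5], horizon=1000,
                                   abstention_reward=0.4,
                                   policy=greedy_with_abstention)
```

The key structural point the sketch captures is that the abstention decision is made *before* the reward is observed, so an abstaining round yields neither stochastic reward nor new information about the chosen arm.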
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Junpei_Komiyama1
Submission Number: 6960