Tight Regret Bounds in Multi-Armed Bandits with Heterogeneous Variances

13 Feb 2026 (modified: 20 Mar 2026) · Under review for TMLR · CC BY 4.0

Abstract: We study stochastic multi-armed bandits with heterogeneous reward variances. In the known-variance setting, we propose a variance-aware MOSS algorithm that achieves minimax-optimal regret, matching an information-theoretic lower bound up to constant factors. For the unknown-variance case, we construct high-probability upper confidence bounds on the arm variances and show that the resulting algorithm attains the same minimax rate up to a logarithmic factor. Our analysis establishes sharp worst-case guarantees that explicitly capture the variance structure of the problem.
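The abstract does not give the index used by the algorithm, so the following is only an illustrative sketch under stated assumptions, not the authors' method. It takes the standard MOSS bonus in the form sqrt(4/s · log⁺(n/(K s))) used for 1-sub-Gaussian rewards and rescales it by each arm's standard deviation σ_i (known-variance case). For the unknown-variance case it plugs in a high-probability upper bound on σ_i², here swapped in from the Laurent–Massart lower-tail bound for a chi-square variable, which is valid for Gaussian rewards: since (s-1)S²/σ² follows a chi-square law with k = s-1 degrees of freedom, with probability at least 1-δ we have σ² ≤ k S² / (k - 2√(k ln(1/δ))). Gaussian rewards, the confidence level δ = 1/n, the two initial pulls per arm, and the names `moss_bonus`, `variance_ucb` and `run_variance_aware_moss` are all hypothetical choices.

```python
import math
import random


def moss_bonus(sigma, pulls, horizon, n_arms):
    # MOSS-style bonus for a sigma-sub-Gaussian arm (hypothetical rescaling):
    # sqrt(4 * sigma^2 / s * log+(n / (K * s))), with log+(x) = max(0, log x).
    log_plus = max(0.0, math.log(horizon / (n_arms * pulls)))
    if log_plus == 0.0:
        return 0.0  # avoids inf * 0 when the variance bound is still infinite
    return math.sqrt(4.0 * sigma ** 2 * log_plus / pulls)


def variance_ucb(count, m2, delta):
    # Upper confidence bound on a Gaussian arm's variance (assumption: Gaussian).
    # m2 = k * S^2 with k = count - 1, and (k * S^2) / sigma^2 ~ chi-square(k).
    # Laurent-Massart: P(X <= k - 2 sqrt(k ln(1/delta))) <= delta.
    k = count - 1
    denom = k - 2.0 * math.sqrt(k * math.log(1.0 / delta))
    if denom <= 0.0:
        return math.inf  # too few samples for a finite bound
    return m2 / denom


def run_variance_aware_moss(means, sigmas, horizon, known_variance=True, seed=0):
    """Simulate the sketch on Gaussian arms and return the pseudo-regret."""
    rng = random.Random(seed)
    k_arms = len(means)
    best = max(means)
    delta = 1.0 / horizon  # confidence level for the variance UCB (assumption)
    counts = [0] * k_arms
    avg = [0.0] * k_arms
    m2 = [0.0] * k_arms  # Welford running sum of squared deviations
    regret = 0.0
    for t in range(horizon):
        if t < 2 * k_arms:
            arm = t % k_arms  # two pulls per arm so the sample variance is defined
        else:
            if known_variance:
                sig = sigmas
            else:
                sig = [math.sqrt(variance_ucb(counts[i], m2[i], delta))
                       for i in range(k_arms)]
            scores = [avg[i] + moss_bonus(sig[i], counts[i], horizon, k_arms)
                      for i in range(k_arms)]
            arm = max(range(k_arms), key=scores.__getitem__)
        x = rng.gauss(means[arm], sigmas[arm])
        # Welford update of the running mean and squared-deviation sum.
        counts[arm] += 1
        d = x - avg[arm]
        avg[arm] += d / counts[arm]
        m2[arm] += d * (x - avg[arm])
        regret += best - means[arm]
    return regret
```

For example, `run_variance_aware_moss([0.0, 1.0], [1.0, 0.1], horizon=2000, known_variance=False)` runs the plug-in variant on a noisy suboptimal arm and a low-variance optimal arm. Until an arm has more than 4 ln(1/δ) + 1 samples its variance bound is infinite, so the unknown-variance variant explores more up front; this mirrors, in a crude way, the extra logarithmic factor the abstract reports for that setting.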

Submission Type: Regular submission (no more than 12 pages of main content)

Assigned Action Editor: ~Vincent_Tan1

Submission Number: 7501
