Tight Regret Bounds in Multi-Armed Bandits with Heterogeneous Variances

TMLR Paper7501 Authors

13 Feb 2026 (modified: 23 Feb 2026) · Under review for TMLR · CC BY 4.0
Abstract: We study stochastic multi-armed bandits with heterogeneous reward variances. In the known-variance setting, we propose a variance-aware MOSS algorithm whose regret is minimax optimal, matching an information-theoretic lower bound up to constant factors. For the unknown-variance case, we construct high-probability upper confidence bounds on the arm variances and show that the resulting algorithm attains the same minimax rate up to a logarithmic factor. Our analysis establishes sharp worst-case guarantees that explicitly capture the variance structure of the problem.
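The abstract does not state the exact index used by the variance-aware MOSS algorithm. As a rough illustration only, the sketch below assumes Gaussian rewards with known per-arm standard deviations and scales the standard MOSS exploration bonus for 1-subgaussian rewards, sqrt((4 / T_i) * max(log(n / (K T_i)), 0)), by the arm variance sigma_i^2. The constant 4, the Gaussian reward model, and the names `moss_variance_index` and `run_variance_aware_moss` are hypothetical assumptions, not the authors' method. In the unknown-variance case described above, sigma_i^2 would be replaced by a high-probability upper confidence bound on the variance, which this sketch does not implement.

```python
import math
import random


def moss_variance_index(mean, pulls, sigma2, horizon, n_arms):
    """Hypothetical variance-scaled MOSS index (assumption, not the paper's exact index)."""
    # log^+(n / (K * T_i)): the bonus vanishes once an arm has n / K pulls.
    log_plus = max(math.log(horizon / (n_arms * pulls)), 0.0)
    # Standard MOSS bonus for 1-subgaussian rewards, scaled by the arm's known variance.
    return mean + math.sqrt(4.0 * sigma2 * log_plus / pulls)


def run_variance_aware_moss(means, sigmas, horizon, seed=0):
    """Simulate the sketch on a Gaussian bandit; returns pull counts and pseudo-regret."""
    rng = random.Random(seed)
    k = len(means)
    pulls = [0] * k
    sums = [0.0] * k
    best = max(means)
    regret = 0.0
    for t in range(horizon):
        if t < k:
            arm = t  # pull each arm once so every index is well defined
        else:
            arm = max(
                range(k),
                key=lambda i: moss_variance_index(
                    sums[i] / pulls[i], pulls[i], sigmas[i] ** 2, horizon, k
                ),
            )
        reward = rng.gauss(means[arm], sigmas[arm])  # assumed Gaussian reward model
        pulls[arm] += 1
        sums[arm] += reward
        regret += best - means[arm]  # pseudo-regret uses the true means
    return pulls, regret


if __name__ == "__main__":
    counts, pseudo_regret = run_variance_aware_moss(
        means=[0.5, 0.0, 0.0], sigmas=[1.0, 0.1, 2.0], horizon=5000
    )
    print(counts, round(pseudo_regret, 2))
```

In this toy run, the high-variance suboptimal arm (sigma = 2) keeps a larger bonus and is explored more than the low-variance one (sigma = 0.1), which is the kind of variance dependence the abstract's bounds are meant to capture.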
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Vincent_Tan1
Submission Number: 7501