Abstract: We study stochastic multi-armed bandits with heterogeneous reward variances.
In the known-variance setting, we propose a variance-aware MOSS algorithm
that achieves minimax-optimal regret,
matching an information-theoretic lower bound up to constant factors.
For the unknown-variance case, we construct high-probability variance
upper confidence bounds and show that the resulting algorithm attains
the same minimax rate up to a logarithmic factor.
Our analysis establishes sharp worst-case guarantees that explicitly
capture the variance structure of the problem.
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Vincent_Tan1
Submission Number: 7501