Keywords: Multi-armed bandits, best-arm identification, fixed budget, heterogeneous reward variances
TL;DR: We design and analyze best-arm identification algorithms for the fixed-budget setting with heterogeneous reward variances.
Abstract: We study the problem of best-arm identification (BAI) in the fixed-budget setting with heterogeneous reward variances. We propose two variance-adaptive BAI algorithms for this setting: SHVar for known reward variances and SHAdaVar for unknown reward variances. The key idea in our algorithms is to adaptively allocate more budget to arms with higher reward variances. The main algorithmic novelty is in the design of SHAdaVar, which allocates budget greedily based on overestimates of the unknown reward variances. We bound the probability of misidentifying the best arm in both SHVar and SHAdaVar. Our analyses rely on novel lower bounds on the number of arm pulls in BAI that do not require closed-form solutions to the budget allocation problem. One of our budget allocation problems is equivalent to optimal experimental design with unknown variances and is thus of broad interest. We also evaluate our algorithms on synthetic and real-world problems. In most settings, SHVar and SHAdaVar outperform all prior algorithms.
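To make the key idea concrete, the following is a minimal sketch of halving-style elimination in which higher-variance arms receive more pulls. It is not the SHVar or SHAdaVar procedure from the paper. The function name `variance_aware_halving`, the Gaussian reward model, the equal per-stage budget, and the variance-proportional split are all assumptions made for illustration, and the variances are assumed known and strictly positive.

```python
import math
import random


def variance_aware_halving(means, variances, budget, seed=0):
    """Illustrative sketch: halving elimination with variance-proportional pulls.

    Assumptions (not taken from the source): Gaussian rewards, known and
    strictly positive variances, an equal budget per stage, and pulls split
    among active arms in proportion to their variances.
    """
    rng = random.Random(seed)
    active = list(range(len(means)))
    num_stages = max(1, math.ceil(math.log2(len(means))))
    stage_budget = budget // num_stages

    for _ in range(num_stages):
        if len(active) == 1:
            break
        total_var = sum(variances[i] for i in active)
        estimates = {}
        for i in active:
            # Higher-variance arms receive more pulls; every arm gets at least one.
            pulls = max(1, int(stage_budget * variances[i] / total_var))
            std = math.sqrt(variances[i])
            rewards = [rng.gauss(means[i], std) for _ in range(pulls)]
            estimates[i] = sum(rewards) / pulls
        # Keep the better half of the active arms (rounded up).
        active.sort(key=lambda i: estimates[i], reverse=True)
        active = active[: math.ceil(len(active) / 2)]
    return active[0]


if __name__ == "__main__":
    # Hypothetical problem instance: arm 2 has a clearly higher mean reward.
    best = variance_aware_halving(
        means=[0.0, 0.1, 5.0, 0.2],
        variances=[1.0, 4.0, 1.0, 0.5],
        budget=4000,
    )
    print("identified arm:", best)
```

Splitting pulls in proportion to variance roughly equalizes the standard errors of the mean estimates within a stage, which is the intuition behind giving noisier arms more budget. Handling unknown variances, as SHAdaVar does, would require estimating them from the observed rewards, which this sketch does not attempt.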
Supplementary Material: pdf
Other Supplementary Material: zip