Robust Best-of-Both-Worlds Gap Estimators Based on Importance-Weighted Sampling

Published: 17 Jun 2024, Last Modified: 17 Jun 2024, FoRLaC Poster, CC BY 4.0
Abstract: We present a novel strategy for robust estimation of the gaps in multi-armed bandits based on importance-weighted sampling. The strategy applies in the best-of-both-worlds setting: it can be used in both the stochastic and the adversarial regimes without prior knowledge of which regime it faces. It relies on a pair of estimators: one based on standard importance-weighted sampling, used to upper bound the losses, and one based on importance-weighted sampling with implicit exploration, used to lower bound the losses. We combine the strategy with the EXP3++ algorithm to obtain best-of-both-worlds regret guarantees in the stochastic and adversarial regimes, as well as in the stochastically constrained adversarial regime. We conjecture that the strategy can be applied more broadly to robust gap estimation in reinforcement learning, which we leave for future work.
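To make the two estimators concrete, here is a minimal sketch, not the authors' construction. It assumes losses in [0, 1], a fixed implicit-exploration parameter `gamma`, and a uniform sampling distribution standing in for EXP3++. The names `iw_and_ix_estimates` and `plug_in_gaps` are hypothetical. The standard importance-weighted (IW) estimate divides the observed loss by the probability p of playing the arm, which makes it unbiased. The implicit-exploration (IX) estimate divides by p + gamma, so for nonnegative losses it is never larger than the IW estimate. The plug-in gap (IX estimate of an arm minus the smallest IW estimate) deliberately omits the confidence widths that any actual high-probability bound would need.

```python
import random
from typing import List, Sequence, Tuple


def iw_and_ix_estimates(
    losses: Sequence[Sequence[float]],  # losses[t][i] in [0, 1] (assumed)
    probs: Sequence[Sequence[float]],   # probs[t][i]: sampling probability of arm i at round t
    actions: Sequence[int],             # arm actually played at round t
    gamma: float,                       # implicit-exploration parameter (assumed fixed)
) -> Tuple[List[float], List[float]]:
    """Hypothetical helper: cumulative IW and IX loss estimates per arm."""
    k = len(probs[0])
    iw = [0.0] * k
    ix = [0.0] * k
    for loss_t, p_t, a in zip(losses, probs, actions):
        observed = loss_t[a]                   # bandit feedback: only the played arm's loss is seen
        iw[a] += observed / p_t[a]             # standard importance weighting (unbiased)
        ix[a] += observed / (p_t[a] + gamma)   # implicit exploration: shrinks the estimate
    return iw, ix


def plug_in_gaps(iw: Sequence[float], ix: Sequence[float]) -> List[float]:
    """Hypothetical plug-in gap: IX estimate of arm i minus the smallest IW estimate.

    No confidence widths are added, so these are not actual high-probability bounds.
    """
    best = min(iw)
    return [max(0.0, ix_i - best) for ix_i in ix]


# Demo on a stochastic instance with a uniform sampling distribution (an assumption).
rng = random.Random(0)
means = [0.2, 0.5, 0.7]
T, gamma = 20_000, 0.01
k = len(means)
uniform = [1.0 / k] * k
losses = [[1.0 if rng.random() < m else 0.0 for m in means] for _ in range(T)]
probs = [uniform] * T
actions = [rng.randrange(k) for _ in range(T)]

iw, ix = iw_and_ix_estimates(losses, probs, actions, gamma)
gaps = plug_in_gaps(iw, ix)
print([round(g / T, 3) for g in gaps])  # per-round plug-in gap estimates
```

On this instance the per-round IW estimates land close to the arm means. The IX estimates are slightly smaller, by a factor of roughly p / (p + gamma), so the plug-in gaps come out a little below the true gaps of 0.3 and 0.5.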
Format: Short format (up to 4 pages + refs, appendix)
Publication Status: No
Submission Number: 47