Published: 2021, Last Modified: 08 May 2023, ICML 2021, Readers: Everyone
Abstract: We consider the adversarial multi-armed bandit problem under delayed feedback. We analyze variants of the Exp3 algorithm that tune their step size using only information (about the losses and delay...