Abstract: We study the adversarial bandit problem against arbitrary strategies, where the difficulty is captured by an unknown parameter $S$: the number of switches of the best arm in hindsight. To handle this problem, we adopt the master-base framework with online mirror descent (OMD). We first provide a master-base algorithm with simple OMD that achieves a regret of $\tilde{O}(S^{1/2}K^{1/3}T^{2/3})$, where the $T^{2/3}$ factor stems from the variance of the loss estimators. To mitigate the impact of this variance, we propose using adaptive learning rates for OMD and achieve a regret of $\tilde{O}(\min\{\sqrt{SKT\rho},S\sqrt{KT}\})$, where $\rho$ is a variance term of the loss estimators.
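To make the "variance of loss estimators" concrete, here is a minimal sketch of a single OMD base learner for a $K$-armed adversarial bandit. It assumes the negative-entropy regularizer, under which OMD reduces to the familiar EXP3-style exponential-weights update, and the standard importance-weighted loss estimator. This is not the paper's master-base algorithm. The regularizer, the fixed learning rate `eta`, and all function names are illustrative assumptions. The estimator is unbiased, but its second moment for the pulled arm is $\ell^2/p$, which grows as the sampling probability $p$ shrinks.

```python
import math
import random


def omd_negative_entropy_step(weights, loss_estimates, eta):
    """One OMD step on the simplex with the negative-entropy regularizer (assumed).

    With this regularizer, OMD is a multiplicative update followed by normalization.
    """
    unnormalized = [w * math.exp(-eta * l) for w, l in zip(weights, loss_estimates)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def importance_weighted_estimate(num_arms, chosen_arm, observed_loss, prob):
    """Unbiased loss estimate: nonzero only for the pulled arm, scaled by 1/prob.

    For the pulled arm the second moment is observed_loss**2 / prob, so small
    sampling probabilities inflate the variance.
    """
    estimate = [0.0] * num_arms
    estimate[chosen_arm] = observed_loss / prob
    return estimate


def run_base_learner(losses, eta, rng):
    """Run the sketch on a fixed loss sequence with losses[t][i] in [0, 1].

    Returns the learner's cumulative loss and its final sampling distribution.
    """
    num_arms = len(losses[0])
    weights = [1.0 / num_arms] * num_arms
    total_loss = 0.0
    for loss_vector in losses:
        # Sample an arm from the current distribution; only its loss is observed.
        arm = rng.choices(range(num_arms), weights=weights)[0]
        total_loss += loss_vector[arm]
        estimate = importance_weighted_estimate(
            num_arms, arm, loss_vector[arm], weights[arm]
        )
        weights = omd_negative_entropy_step(weights, estimate, eta)
    return total_loss, weights
```

For example, `run_base_learner([[0.1, 0.9]] * 2000, 0.05, random.Random(0))` concentrates its distribution on arm 0. A fixed `eta` is used here only for simplicity. The abstract's second algorithm instead adapts the learning rate to the estimator variance, which this sketch does not attempt.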
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Yaoliang_Yu1
Submission Number: 4793