Nearly-tight Approximation Guarantees for the Improving Multi-Armed Bandits Problem
Abstract: We give nearly-tight upper and lower bounds for the {\em improving multi-armed bandits} problem. An instance of this problem has $k$ arms, each of which has a reward function that is concave and increasing in the {\em number of times that arm has been pulled so far}. We show that for any randomized online algorithm, there exists an instance on which it must suffer at least an $\Omega(\sqrt{k})$ approximation factor relative to the optimal reward. We then provide a randomized online algorithm that guarantees an $O(\sqrt{k})$ approximation factor, provided it is told the maximum reward achievable by the optimal arm in advance. Finally, we show how to remove this assumption at the cost of an extra $O(\log k)$ factor, achieving an overall $O(\sqrt{k} \log k)$ approximation relative to the optimal reward.
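To make the problem setting concrete, here is a minimal Python sketch of an improving-bandits instance, assuming illustrative reward curves $r_i(t) = a_i\sqrt{t}$ (concave and increasing in the pull count $t$). The explore-then-commit baseline shown is purely for illustration and is not the paper's algorithm; the coefficients, horizon, and exploration budget are hypothetical.

```python
import math
import random

def make_instance(k, seed=0):
    # Hypothetical instance: arm i's reward on its t-th pull is a_i * sqrt(t),
    # which is concave and increasing in t. Coefficients are illustrative.
    rng = random.Random(seed)
    coeffs = [rng.uniform(0.1, 1.0) for _ in range(k)]
    return lambda arm, pulls: coeffs[arm] * math.sqrt(pulls)

def optimal_reward(reward, k, T):
    # With increasing rewards, the optimal policy pulls a single best arm
    # for the whole horizon; return that arm's cumulative reward.
    return max(sum(reward(i, t) for t in range(1, T + 1)) for i in range(k))

def explore_then_commit(reward, k, T, budget):
    # Illustrative baseline (NOT the paper's algorithm): pull each arm
    # `budget` times, then commit to the arm whose last observed reward
    # is highest for the remaining rounds.
    pulls = [0] * k
    last = [0.0] * k
    total = 0.0
    for i in range(k):
        for _ in range(budget):
            pulls[i] += 1
            last[i] = reward(i, pulls[i])
            total += last[i]
    best = max(range(k), key=lambda i: last[i])
    for _ in range(T - k * budget):
        pulls[best] += 1
        total += reward(best, pulls[best])
    return total

if __name__ == "__main__":
    k, T = 10, 1000
    reward = make_instance(k)
    opt = optimal_reward(reward, k, T)
    alg = explore_then_commit(reward, k, T, budget=T // (4 * k))
    print(f"OPT = {opt:.1f}, ALG = {alg:.1f}, ratio = {opt / alg:.2f}")
```

On adversarial instances (e.g., where all but one arm's reward curve flattens just beyond the exploration budget), such naive baselines can be forced far from optimal, which is the gap the paper's $O(\sqrt{k})$-approximation algorithm addresses.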
Submission Number: 82