Abstract: The Indexed Minimum Empirical Divergence (IMED) algorithm is a highly effective approach that offers a stronger theoretical guarantee of asymptotic optimality than the Kullback--Leibler Upper Confidence Bound (KL-UCB) algorithm for the multi-armed bandit problem. Additionally, it has been observed to empirically outperform UCB-based algorithms and Thompson Sampling. Despite its effectiveness, a generalization of this algorithm to contextual bandits with linear payoffs has remained elusive. In this paper, we present novel linear versions of the IMED algorithm, which we call the family of LinIMED algorithms. We demonstrate that LinIMED attains an $\widetilde{O}(d\sqrt{T})$ upper bound on the regret, where $d$ is the dimension of the context and $T$ is the time horizon. Furthermore, extensive empirical studies reveal that LinIMED and its variants outperform widely used linear bandit algorithms such as LinUCB and Linear Thompson Sampling in some regimes.
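For context, below is a minimal sketch of the classical (non-contextual) IMED rule for Bernoulli rewards; it is not the paper's LinIMED variant, and the function names and the handling of unpulled arms are illustrative choices. At each round the learner computes the index $I_a = N_a\, d(\hat{\mu}_a, \hat{\mu}_{\max}) + \log N_a$ for every arm $a$, where $d(\cdot,\cdot)$ is the Bernoulli KL divergence, and plays the arm with the smallest index.

```python
import numpy as np

def bernoulli_kl(p, q, eps=1e-12):
    # KL divergence between Bernoulli(p) and Bernoulli(q), clipped for numerical stability.
    p = np.clip(p, eps, 1.0 - eps)
    q = np.clip(q, eps, 1.0 - eps)
    return p * np.log(p / q) + (1.0 - p) * np.log((1.0 - p) / (1.0 - q))

def imed_choose(counts, means):
    # IMED index: I_a = N_a * KL(mu_hat_a, mu_hat_max) + log(N_a);
    # the arm with the smallest index is played next.
    counts = np.asarray(counts, dtype=float)
    means = np.asarray(means, dtype=float)
    mu_max = means.max()
    safe_counts = np.maximum(counts, 1.0)  # avoid log(0); masked below
    index = safe_counts * bernoulli_kl(means, mu_max) + np.log(safe_counts)
    index[counts == 0] = -np.inf  # never-pulled arms are explored first
    return int(np.argmin(index))

# Example: the never-pulled arm 2 is selected before the others.
print(imed_choose(counts=[10, 5, 0], means=[0.6, 0.4, 0.0]))  # -> 2
```

Note that the empirically best arm has zero divergence, so its index reduces to $\log N_a$; this is how IMED trades off exploitation against exploration without explicit confidence bounds. The LinIMED algorithms in the paper replace the per-arm empirical quantities with estimates derived from the linear model.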
Submission Length: Regular submission (no more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=3NXAHyuj5w
Changes Since Last Submission: Set fonts and/or margins to the template defaults.
Supplementary Material: zip
Assigned Action Editor: ~Lijun_Zhang1
Submission Number: 2028