Optimal and Practical Batched Linear Bandit Algorithm

Published: 01 May 2025, Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY-NC-ND 4.0
Abstract: We study the linear bandit problem under limited adaptivity, known as the batched linear bandit. While existing approaches can achieve near-optimal regret in theory, they are often computationally prohibitive or underperform in practice. We propose BLAE, a novel batched algorithm that integrates arm elimination with regularized G-optimal design, achieving minimax optimal regret (up to logarithmic factors in $T$) in both the large-$K$ and small-$K$ regimes for the first time, while using only $O(\log\log T)$ batches. Our analysis introduces new techniques for batch-wise optimal design and refined concentration bounds. Crucially, BLAE incurs low computational overhead and exhibits strong empirical performance, outperforming state-of-the-art methods in extensive numerical evaluations. Thus, BLAE is the first batched linear bandit algorithm to combine provable minimax optimality in all regimes with practical superiority.
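To make the two ingredients named in the abstract concrete, here is a minimal sketch, assuming a generic batched phased-elimination scheme rather than BLAE itself: a Frank-Wolfe approximation of a (regularized) G-optimal design, an illustrative $T^{1-2^{-k}}$ batch schedule giving $O(\log\log T)$ batches, and ad hoc confidence widths and constants. The helper names (`g_optimal_design`, `batch_schedule`, `batched_elimination`) are hypothetical and not taken from the paper.

```python
import numpy as np


def g_optimal_design(arms, reg=1e-3, iters=200):
    """Approximate a (regularized) G-optimal design over `arms` with a
    simple Frank-Wolfe loop; `reg` and `iters` are illustrative values."""
    K, d = arms.shape
    pi = np.full(K, 1.0 / K)                      # start from the uniform design
    for s in range(iters):
        V = reg * np.eye(d) + arms.T @ (pi[:, None] * arms)
        V_inv = np.linalg.inv(V)
        lev = np.einsum('ij,jk,ik->i', arms, V_inv, arms)  # ||x||^2_{V^{-1}}
        j = np.argmax(lev)                        # arm with largest leverage
        gamma = 1.0 / (s + 2)                     # vanishing Frank-Wolfe step
        pi = (1.0 - gamma) * pi
        pi[j] += gamma
    return pi


def batch_schedule(T):
    """Batch lengths roughly T^{1 - 2^{-k}}, so the number of batches
    grows like log log T (an illustrative schedule, not the paper's)."""
    lengths, total, k = [], 0, 1
    while total < T:
        n = min(int(np.ceil(T ** (1.0 - 2.0 ** (-k)))), T - total)
        lengths.append(n)
        total += n
        k += 1
    return lengths


def batched_elimination(arms, theta_star, T, noise=0.1, seed=None):
    """Generic batched phased elimination: sample each batch from a
    G-optimal design over the surviving arms, then drop arms whose
    estimated gap exceeds a confidence width. A skeleton only."""
    rng = np.random.default_rng(seed)
    K, d = arms.shape
    active = np.arange(K)
    best = np.max(arms @ theta_star)
    t, regret = 0, 0.0
    for n in batch_schedule(T):
        if len(active) == 1:
            break
        A = arms[active]
        pi = g_optimal_design(A)
        pi = pi / pi.sum()                        # guard against rounding drift
        counts = rng.multinomial(n, pi)
        V, b = 1e-3 * np.eye(d), np.zeros(d)      # small ridge (illustrative)
        for x, c in zip(A, counts):
            if c == 0:
                continue
            rewards = x @ theta_star + noise * rng.standard_normal(c)
            V += c * np.outer(x, x)
            b += rewards.sum() * x
            regret += c * (best - x @ theta_star)
        t += n
        theta_hat = np.linalg.solve(V, b)         # batch least-squares estimate
        est = A @ theta_hat
        width = 2.0 * np.sqrt(d * np.log(K * T) / n)   # illustrative width
        active = active[est >= est.max() - width] # eliminate clearly bad arms
    if t < T:                                     # commit to the surviving arm
        regret += (T - t) * (best - arms[active[0]] @ theta_star)
    return regret


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    arms = rng.standard_normal((50, 5))
    arms /= np.linalg.norm(arms, axis=1, keepdims=True)
    theta = rng.standard_normal(5)
    theta /= np.linalg.norm(theta)
    print("cumulative regret:", batched_elimination(arms, theta, T=20000, seed=1))
```

The point of the sketch is only structural: each batch is planned once from the current design, all rewards in the batch are collected before the estimator is updated, and elimination happens only at batch boundaries, which is what "limited adaptivity" means here.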
Lay Summary: Many real-world tasks involve repeatedly making decisions to achieve the best possible outcomes, such as recommending products online, selecting medical treatments in clinical trials, or dynamically setting prices for online shoppers. These problems often involve balancing the exploration of new options with the exploitation of already-known successful strategies. However, updating these decision-making systems too frequently can be impractical due to costs, ethical issues, or computational limitations. In this research, we tackle the challenge of making good decisions when updates can only happen occasionally—in batches rather than continuously. Existing methods that work well theoretically either fail in practice or require too much computation. We propose a new method, called BLAE, that carefully eliminates poor-performing options while selecting promising ones efficiently. Our method achieves optimal performance guarantees, meaning it performs nearly as well as theoretically possible under any conditions. Crucially, BLAE also performs strongly in real-world applications, requiring fewer updates and less computation. In short, our approach bridges the gap between theory and practice, providing an effective and practical solution for decision-making problems where frequent updates are not feasible.
Primary Area: Theory->Online Learning and Bandits
Keywords: linear bandit, batched bandit, exploration-exploitation
Submission Number: 15612