Optimal Batched (Generalized) Linear Contextual Bandit Algorithm

ICLR 2026 Conference Submission22355 Authors

20 Sept 2025 (modified: 08 Oct 2025) · CC BY 4.0
Keywords: linear contextual bandit, generalized linear contextual bandit, batched bandit
Abstract: We study batched linear and generalized linear contextual bandits and introduce batched algorithms that are both practical and provably optimal under limited adaptivity. For linear contextual bandits, we propose the first algorithm that attains minimax-optimal regret (up to polylogarithmic factors in $T$) in both the small-$K$ and large-$K$ regimes using only $O(\log\log T)$ batches, while our second algorithm removes the G-optimal design step—the dominant computational bottleneck—yet preserves the same order of statistical guarantees and achieves the lowest known runtime complexity. We then turn to generalized linear contextual bandits and design an algorithm that is fully free of the curvature parameter $\kappa$: the algorithm requires no knowledge of $\kappa$, its regret bound does not depend on $\kappa$, and it retains $O(\log\log T)$ batch complexity with near-optimal regret. Collectively, these results deliver the first batched linear contextual methods that are simultaneously minimax-optimal across all regimes and computationally efficient, and the first generalized linear method that is both statistically and computationally efficient while remaining fully $\kappa$-independent.
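The $O(\log\log T)$ batch complexity claimed in the abstract is typically achieved with a geometric grid of batch endpoints of the form $t_i \approx T^{1-2^{-i}}$, a standard construction in the batched-bandit literature. The sketch below computes such a schedule; it is an illustrative assumption about the grid, not the paper's exact algorithm, and the function name `batch_schedule` is hypothetical.

```python
import math

def batch_schedule(T: int) -> list[int]:
    """Geometric batch grid t_i ~ T^(1 - 2^(-i)), a standard construction
    giving O(log log T) batches. Illustrative sketch only; the submission's
    actual schedule may differ in constants and rounding."""
    # M = O(log log T) batches; the last endpoint is forced to T.
    M = max(1, math.ceil(math.log2(max(2.0, math.log2(T)))))
    ends = [min(T, math.ceil(T ** (1 - 2.0 ** (-i)))) for i in range(1, M + 1)]
    ends[-1] = T
    # Deduplicate and sort in case rounding collapses adjacent endpoints.
    return sorted(set(ends))

# Example: for a horizon of one million rounds, only a handful of
# policy updates are needed.
schedule = batch_schedule(10**6)
print(schedule)
```

Between consecutive endpoints the learner commits to a fixed (non-adaptive) policy and only updates its estimates at each endpoint, which is what "limited adaptivity" refers to.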
Supplementary Material: zip
Primary Area: learning theory
Submission Number: 22355