Keywords: contextual bandits, logistic bandits, simple regret, Thompson sampling
Abstract: We study stochastic contextual logistic bandits under the simple regret objective. While simple regret guarantees are known for the linear case, no such results exist for the logistic setting. Building on ideas from contextual linear bandits and self-concordant analysis, we propose the first algorithm that achieves simple regret $\tilde{\mathcal{O}}(d/\sqrt{T})$. Notably, the leading term of our regret bound is free of $\kappa=\mathcal{O}(\exp(S))$, where $S$ is a bound on the magnitude of the unknown parameter vector, while the algorithm remains computationally tractable for finite action sets. We also introduce a new variant of Thompson Sampling adapted to the simple regret setting, which yields the first simple regret guarantee for randomized algorithms in stochastic contextual linear bandits. Extending these tools to the logistic case, we obtain a Thompson Sampling variant with regret $\tilde{\mathcal{O}}(d^{3/2}/\sqrt{T})$, again free of $\kappa$ in the leading term. As expected, the randomized algorithms are cheaper to run than their deterministic counterparts. Finally, we conduct a series of experiments to empirically validate these theoretical guarantees.
Supplementary Material: zip
Primary Area: learning theory
Submission Number: 13299