Variance-Aware Feel-Good Thompson Sampling for Contextual Bandits

Published: 18 Sept 2025, Last Modified: 29 Oct 2025 · NeurIPS 2025 poster · CC BY 4.0
Keywords: Thompson sampling, variance aware, contextual bandits
Abstract: Variance-dependent regret bounds have received increasing attention in recent studies on contextual bandits. However, most of these studies focus on upper confidence bound (UCB)-based bandit algorithms, while sampling-based bandit algorithms such as Thompson sampling remain understudied. The only exception is the `LinVDTS` algorithm (Xu et al., 2023), which is limited to linear reward functions, and its regret bound is suboptimal with respect to the model dimension. In this paper, we present `FGTSVA`, a variance-aware Thompson sampling algorithm for contextual bandits with general reward functions that achieves an optimal regret bound. At the core of our analysis is an extension of the decoupling coefficient, a technique commonly used in the analysis of Feel-Good Thompson sampling (FGTS) that reflects the complexity of the model space. With the new decoupling coefficient denoted by $\mathrm{dc}$, `FGTSVA` achieves a regret of $\tilde{\mathcal{O}}(\sqrt{\mathrm{dc}\cdot\log|\mathcal{F}|\sum_{t=1}^T\sigma_t^2}+\mathrm{dc})$, where $|\mathcal{F}|$ is the size of the model space, $T$ is the total number of rounds, and $\sigma_t^2$ is the sub-Gaussian norm of the noise (e.g., the variance when the noise is Gaussian) at round $t$. In the setting of contextual linear bandits, the regret bound of `FGTSVA` matches that of UCB-based algorithms using weighted linear regression (Zhou and Gu, 2022).
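For intuition, here is a minimal Python sketch of one round of variance-weighted Feel-Good Thompson sampling over a finite model class. The exponential-weights posterior, the $1/\sigma_t^2$ loss weighting, the feel-good bonus, and all names and defaults (`fgtsva_round`, `eta`, `lam`) are illustrative assumptions in the spirit of the FGTS literature, not the paper's exact algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

def fgtsva_round(models, history, actions, context, eta=1.0, lam=0.1):
    """One round of a variance-weighted Feel-Good Thompson sampling sketch.

    models  : finite list of candidate reward functions f(x, a) -> float
              (the model class F; len(models) plays the role of |F|)
    history : list of (x, a, r, sigma2) tuples, where sigma2 is the noise
              level at that round (assumed known here)
    actions : list of available actions
    context : the current context x_t
    eta, lam: learning rate and feel-good weight -- hypothetical defaults
    """
    log_post = np.zeros(len(models))
    for i, f in enumerate(models):
        for (x, a, r, sigma2) in history:
            # Variance-weighted squared loss: low-noise rounds contribute
            # more, mirroring weighted (linear) regression.
            log_post[i] -= eta * (f(x, a) - r) ** 2 / max(sigma2, 1e-8)
            # Feel-good bonus: favor models promising high achievable
            # reward on past contexts.
            log_post[i] += lam * max(f(x, b) for b in actions)
    # Sample a model from the unnormalized exponential-weights posterior.
    log_post -= log_post.max()
    probs = np.exp(log_post)
    probs /= probs.sum()
    f = models[rng.choice(len(models), p=probs)]
    # Act greedily with respect to the sampled model.
    return max(actions, key=lambda a: f(context, a))

# Example: two candidate linear models, scalar contexts, actions {0, 1}.
models = [lambda x, a: 0.5 * x + a, lambda x, a: -0.2 * x + 0.3 * a]
print(fgtsva_round(models, [(1.0, 1, 1.4, 0.25)], [0, 1], context=0.8))
```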
Primary Area: Theory (e.g., control theory, learning theory, algorithmic game theory)
Submission Number: 24346