Exploration via Feature Perturbation in Contextual Bandits

Seouh-won Yi; Min-hwan Oh

Published: 18 Sept 2025, Last Modified: 18 Dec 2025

Venue: NeurIPS 2025 spotlight

License: CC BY-NC-ND 4.0

Keywords: Generalized Linear Bandits, Contextual Bandits, Thompson Sampling, Feature Perturbation

TL;DR: We study a feature-perturbation exploration method applicable to various bandit settings and prove that this randomized method achieves an optimal regret guarantee.

Abstract: We propose *feature perturbation*, a simple yet effective exploration strategy for contextual bandits that injects randomness directly into feature inputs instead of randomizing unknown parameters or adding noise to rewards. This algorithm achieves a $\widetilde{\mathcal{O}}(d\sqrt{T})$ worst-case regret bound for generalized linear contextual bandits, avoiding the $\widetilde{\mathcal{O}}(d^{3/2}\sqrt{T})$ regret typical of existing randomized bandit algorithms. Because our algorithm eschews parameter sampling, it is computationally efficient and extends naturally to non-parametric or neural network models. We verify these advantages through empirical evaluations, which show that feature perturbation outperforms existing methods and combines strong practical performance with near-optimal regret guarantees.
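To make the idea in the abstract concrete, the following is a minimal sketch of an arm-selection step that perturbs features instead of sampling parameters. It is an illustration under stated assumptions, not the paper's algorithm: the isotropic Gaussian noise, the logistic link, and the names `select_arm`, `noise_scale`, and `theta_hat` are all hypothetical choices, and the abstract does not specify the perturbation distribution or how the estimate is fitted.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic link (an assumed GLM link for illustration)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def select_arm(features, theta_hat, noise_scale, rng):
    """Return the index of the arm whose perturbed features score highest.

    features:    list of d-dimensional feature vectors, one per arm
    theta_hat:   current point estimate of the parameter (never resampled)
    noise_scale: hypothetical std. dev. of isotropic Gaussian feature noise
    rng:         a random.Random instance, passed in for reproducibility
    """
    best_arm, best_score = 0, -math.inf
    for arm, x in enumerate(features):
        # Randomness enters through the features, not through theta_hat.
        x_tilde = [xi + rng.gauss(0.0, noise_scale) for xi in x]
        score = sigmoid(sum(xi * ti for xi, ti in zip(x_tilde, theta_hat)))
        if score > best_score:
            best_arm, best_score = arm, score
    return best_arm


# Usage: two arms, a fixed estimate, and a seeded generator.
features = [[1.0, 0.0], [0.0, 1.0]]
theta_hat = [0.2, 0.9]
rng = random.Random(0)
greedy_arm = select_arm(features, theta_hat, 0.0, rng)  # no noise: purely greedy
explored = {select_arm(features, theta_hat, 5.0, rng) for _ in range(200)}
```

With `noise_scale` set to zero the rule reduces to greedy selection on the point estimate; a positive scale lets lower-scoring arms win occasionally, which is the source of exploration in this sketch. Because only a single point estimate is used, no posterior over parameters has to be maintained or sampled.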

Supplementary Material: zip

Primary Area: Theory (e.g., control theory, learning theory, algorithmic game theory)

Submission Number: 14751
