Corruption-Robust Variance-aware Algorithms for Generalized Linear Bandits under Heavy-tailed Rewards

Qingyuan Yu; Euijin Baek; Xiang Li; Qiang Sun

Published: 07 May 2025, Last Modified: 28 Jul 2025. Venue: UAI 2025 Poster. License: CC BY 4.0

Keywords: Reinforcement Learning theory, stochastic bandits, adaptive Huber regression

TL;DR: GAdaOFUL is a new algorithm using adaptive Huber regression to achieve variance-aware regret in stochastic bandits with generalized linear models, addressing heavy-tailed noise and reward corruption.

Abstract: Stochastic linear bandits have recently received significant attention in sequential decision-making. However, real-world challenges such as heavy-tailed noise, reward corruption, and nonlinear reward functions remain difficult to address. To tackle these difficulties, we propose GAdaOFUL, a novel algorithm that leverages adaptive Huber regression to achieve robustness in generalized linear models (GLMs), where rewards can be nonlinear functions of features. GAdaOFUL achieves a state-of-the-art variance-aware regret bound, scaling with the square root of the cumulative reward variance over time, plus an additional term proportional to the level of corruption. The algorithm adapts to problem complexity, yielding improved regret when the cumulative variance is small. Simulation results demonstrate the robustness and effectiveness of GAdaOFUL in practice. The code is available at https://github.com/NeXAIS/GAdaOFUL.
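
For readers unfamiliar with the Huber loss that the abstract refers to, the following is a minimal sketch of the standard loss and its derivative. It is not the authors' implementation: the function names and the fixed threshold `tau` are assumptions made for illustration, and the adaptive rule GAdaOFUL uses to set the threshold is not reproduced here. The point it illustrates is that the derivative is clipped, so a single corrupted or heavy-tailed reward has bounded influence on the estimate.

```python
def huber_loss(residual: float, tau: float) -> float:
    """Standard Huber loss: quadratic for |residual| <= tau, linear beyond.

    `tau` is a hypothetical fixed threshold; an adaptive method would
    choose it per observation, which this sketch does not attempt.
    """
    r = abs(residual)
    if r <= tau:
        return 0.5 * r * r
    return tau * r - 0.5 * tau * tau


def huber_score(residual: float, tau: float) -> float:
    """Derivative of the Huber loss: the residual clipped to [-tau, tau]."""
    return max(-tau, min(tau, residual))


if __name__ == "__main__":
    tau = 1.0
    # The last residual mimics a corrupted or heavy-tailed reward.
    for res in (0.5, 2.0, -50.0):
        print(res, huber_loss(res, tau), huber_score(res, tau))
```

With a squared loss the residual of -50.0 would contribute a gradient of magnitude 50, whereas here its contribution is capped at `tau`.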

Code Link: https://github.com/NeXAIS/GAdaOFUL

Submission Number: 259