Corruption-Robust Variance-aware Algorithms for Generalized Linear Bandits under Heavy-tailed Rewards

Published: 07 May 2025, Last Modified: 13 Jun 2025, UAI 2025 Poster, CC BY 4.0
Keywords: Reinforcement Learning theory, stochastic bandits, adaptive Huber regression
TL;DR: GAdaOFUL is a new algorithm using adaptive Huber regression to achieve variance-aware regret in stochastic bandits with generalized linear models, addressing heavy-tailed noise and reward corruption.
Abstract: Stochastic linear bandits have recently received significant attention in sequential decision-making. However, real-world challenges such as heavy-tailed noise, reward corruption, and nonlinear reward functions remain difficult to address. To tackle these difficulties, we propose GAdaOFUL, a novel algorithm that leverages adaptive Huber regression to achieve robustness in generalized linear models (GLMs), where rewards can be nonlinear functions of features. GAdaOFUL achieves a state-of-the-art variance-aware regret bound, scaling with the square root of the cumulative reward variance over time, plus an additional term proportional to the level of corruption. The algorithm adapts to problem complexity, yielding improved regret when the cumulative variance is small. Simulation results demonstrate the robustness and effectiveness of GAdaOFUL in practice. The code is available at https://github.com/NeXAIS/GAdaOFUL.
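As a rough illustration of why Huber-type losses help under heavy-tailed noise and corrupted rewards, the sketch below implements the standard Huber loss and its derivative. It is not taken from the GAdaOFUL repository and is not the paper's adaptive procedure: the function names and the fixed threshold `tau` are assumptions made for this example, whereas an adaptive scheme would choose the threshold per round.

```python
import numpy as np


def huber_loss(residual, tau):
    """Standard Huber loss: quadratic for |r| <= tau, linear beyond it.

    `tau` is a hypothetical fixed threshold used only for illustration.
    """
    r = np.abs(residual)
    return np.where(r <= tau, 0.5 * r**2, tau * r - 0.5 * tau**2)


def huber_grad(residual, tau):
    """Derivative of the Huber loss w.r.t. the residual.

    Equals the residual clipped to [-tau, tau], so a single extreme
    residual has bounded influence on a gradient-based estimate.
    """
    return np.clip(residual, -tau, tau)


# Two ordinary residuals and one outlier (e.g. a corrupted reward).
residuals = np.array([0.5, -1.0, 50.0])
losses = huber_loss(residuals, tau=1.0)
grads = huber_grad(residuals, tau=1.0)
```

With `tau = 1.0`, the outlier residual of 50 contributes a gradient of magnitude 1 rather than 50, which is the mechanism that limits the effect of heavy tails and corruption on the estimate.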
Code Link: https://github.com/NeXAIS/GAdaOFUL
Submission Number: 259