Data Source Adaptive Online Learning under Heteroscedastic Noise

Published: 22 Sept 2025, Last Modified: 01 Dec 2025, NeurIPS 2025 Workshop, CC BY 4.0
Keywords: Online learning, Multi-armed bandits, Heteroscedastic noise, Variance-aware learning
Abstract: In this paper, we address the standard $K$-armed multi-armed bandit (MAB) problem with $M$ heterogeneous data sources, each exhibiting an unknown and distinct noise variance $\sigma_j^2$. We propose SOAR (Source-Optimistic Adaptive Regret Minimization), a novel algorithm that adaptively balances exploration and exploitation by jointly constructing upper confidence bounds for arm rewards and lower confidence bounds for data source variances. Our theoretical analysis establishes that SOAR achieves a regret bound of $\tilde{O}\left({\sigma^*}^2 \sum_{i=2}^K \tfrac{1}{\Delta_i}\right)$, along with a preprocessing cost that depends only on the problem parameters $\{\sigma_j\}_{j=1}^M$ and $K$, and grows at most logarithmically with the horizon $T$; here ${\sigma^*}^2$ is the minimum source variance, and $\Delta_i$ denotes the suboptimality gap of the $i$-th arm reward. The $\tilde{O}(\cdot)$ notation hides polylogarithmic factors in these problem parameters. This near-optimal performance underscores SOAR's effectiveness in dynamically managing heteroscedastic noise without incurring significant overhead. We believe this work opens a new direction for adaptively leveraging multiple heterogeneous data sources, extending beyond traditional bandit frameworks.
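The abstract's core idea (optimistic arm selection via variance-aware UCBs, combined with optimistic source selection via variance LCBs) can be sketched as follows. This is a minimal illustrative simulation, not the paper's actual SOAR algorithm: the specific confidence-bonus forms, the pooled variance estimator, and all constants (`means`, `sigmas`, `K`, `M`, `T`) below are assumptions chosen for a runnable toy example.

```python
import numpy as np

rng = np.random.default_rng(0)

K, M, T = 3, 2, 2000
means = np.array([0.9, 0.5, 0.2])   # hypothetical true arm means
sigmas = np.array([0.1, 1.0])       # hypothetical true source noise std devs

n = np.zeros((M, K))                # pulls of arm i routed through source j
s = np.zeros((M, K))                # running sum of rewards per (source, arm)
ss = np.zeros((M, K))               # running sum of squared rewards
regret = 0.0

for t in range(1, T + 1):
    # Pooled per-source variance estimate from within-(source, arm) deviations
    cell_dev = ss - np.divide(s**2, n, out=np.zeros_like(s), where=n > 0)
    dof = n.sum(axis=1) - (n > 0).sum(axis=1)   # pooled degrees of freedom
    var_hat = np.where(dof > 0, cell_dev.sum(axis=1) / np.maximum(dof, 1), np.inf)

    # Lower confidence bound on each source's variance (hypothetical bonus form)
    src_pulls = n.sum(axis=1)
    var_lcb = np.where(
        dof > 0,
        np.maximum(var_hat - np.sqrt(np.log(t + 1) / np.maximum(src_pulls, 1)), 0.0),
        0.0,
    )
    j = int(np.argmin(var_lcb))     # optimistically pick the seemingly least-noisy source

    # Variance-aware UCB over arms, using samples gathered through source j
    sigma2 = var_hat[j] if np.isfinite(var_hat[j]) else 1.0
    nj = n[j]
    ucb = np.where(
        nj > 0,
        s[j] / np.maximum(nj, 1) + np.sqrt(2 * sigma2 * np.log(t + 1) / np.maximum(nj, 1)),
        np.inf,
    )
    i = int(np.argmax(ucb))

    # Sample a reward from arm i through source j and update statistics
    r = means[i] + sigmas[j] * rng.standard_normal()
    n[j, i] += 1
    s[j, i] += r
    ss[j, i] += r * r
    regret += means.max() - means[i]

print(round(regret, 2))
```

Because the low-variance source yields tighter confidence intervals, steering samples toward it shrinks the UCB bonuses faster, which is the intuition behind the $\tilde{O}\left({\sigma^*}^2 \sum_{i=2}^K \tfrac{1}{\Delta_i}\right)$ regret scaling with the minimum variance ${\sigma^*}^2$ rather than the worst-case one.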
Submission Number: 162