Online Linear Classification with Massart Noise

Published: 01 May 2025, Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY 4.0
Abstract: We study the task of online learning in the presence of Massart noise. Specifically, instead of assuming that the online adversary chooses an arbitrary sequence of labels, we assume that the context $\boldsymbol{x}$ is selected adversarially but the label $y$ presented to the learner disagrees with the ground-truth label of $\boldsymbol{x}$ with unknown probability {\em at most} $\eta$. We focus on the fundamental class of $\gamma$-margin linear classifiers and present the first computationally efficient algorithm that achieves mistake bound $\eta T + o(T)$. We point out that the mistake bound achieved by our algorithm is qualitatively tight for computationally efficient algorithms; this follows from the fact that, even in the offline setting, achieving 0-1 error better than $\eta$ requires super-polynomial time under standard complexity assumptions. We extend our online learning model to a $k$-arm contextual bandit setting where the rewards---instead of satisfying commonly used realizability assumptions---are consistent, in expectation, with some linear ranking function with weight vector $\boldsymbol{w}^\ast$. Given a list of contexts $\boldsymbol{x}_1,\ldots,\boldsymbol{x}_k$, if $\boldsymbol{w}^\ast\cdot \boldsymbol{x}_i > \boldsymbol{w}^\ast \cdot \boldsymbol{x}_j$, the expected reward of action $i$ must be larger than that of action $j$ by at least $\Delta$. We use our Massart online learner to design an efficient bandit algorithm that obtains expected reward at least $(1-1/k)\,\Delta T - o(T)$ larger than that of choosing a random action at every round.
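To make the noise model in the abstract concrete, the following is a minimal sketch of how a single Massart-noisy example could be generated. For simplicity it flips the ground-truth label with probability exactly $\eta$, whereas the adversary in the model may use any $\boldsymbol{x}$-dependent flip rate up to $\eta$; the names `w_star`, `eta`, and `massart_label` are illustrative assumptions, not notation from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def massart_label(w_star: np.ndarray, x: np.ndarray, eta: float) -> int:
    """Draw a label for context x under (a simplified form of) Massart noise.

    The ground-truth label is sign(w_star . x); the label shown to the
    learner disagrees with it with probability at most eta. Here we flip
    with probability exactly eta, though the model allows any flip rate
    in [0, eta] that may depend on x.
    """
    y_true = 1 if w_star @ x >= 0 else -1
    return -y_true if rng.random() < eta else y_true
```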
Lay Summary: In the online classification setting, there is no fixed training set; instead, an adversary (or “nature”) sends us one feature vector at a time. We must predict its label on the spot, immediately see whether we were right, and then repeat, while the adversary is free to craft each new example after observing every mistake we have ever made. We investigate this sequential scenario under Massart noise, where each revealed label disagrees with the ground-truth label with some unknown probability $\eta < 1/2$. Focusing on $\gamma$-margin linear classifiers, we show that a simple margin-based update rule makes at most $\eta T + o(T)$ mistakes over $T$ rounds, thereby matching, in the online setting, the best error guarantees previously known only in the easier offline setting. We further extend the approach to a $k$-arm contextual bandit problem whose rewards respect the same linear-margin structure, designing an algorithm whose expected cumulative reward beats a uniformly random policy by at least $(1 - 1/k)\,\Delta T - o(T)$, where $\Delta$ is the margin gap between optimal and sub-optimal actions. Collectively, our results demonstrate that near-optimal robustness to Massart noise can be attained in real time without sacrificing either computational efficiency or regret guarantees.
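For intuition about what a "margin-based update rule" looks like in this setting, here is a minimal Perceptron-style sketch. It is not the paper's algorithm (whose specific update and analysis are what deliver the $\eta T + o(T)$ guarantee); the class name and parameters below are hypothetical.

```python
import numpy as np

class MarginPerceptron:
    """Illustrative margin-based online linear classifier.

    A sketch of the general update style, not the paper's algorithm:
    gamma plays the role of the assumed margin parameter.
    """

    def __init__(self, dim: int, gamma: float):
        self.w = np.zeros(dim)  # current weight estimate
        self.gamma = gamma      # assumed margin parameter

    def predict(self, x: np.ndarray) -> int:
        # Predict the sign of the current linear score.
        return 1 if self.w @ x >= 0 else -1

    def update(self, x: np.ndarray, y: int) -> None:
        # Update only when the (possibly flipped) label y is not
        # classified with margin at least gamma; this conservative
        # rule limits how far a single noisy label can move w.
        if y * (self.w @ x) < self.gamma:
            self.w += y * x
```

A round of the online game then consists of calling `predict(x)` on the adversarially chosen context, observing the (possibly flipped) label `y`, and calling `update(x, y)`.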
Primary Area: Theory->Online Learning and Bandits
Keywords: Online Learning, Massart noise
Submission Number: 11733