Keywords: Online classification, noisy label, pairwise testing, Hellinger divergence, Le Cam-Birgé testing
TL;DR: We provide nearly matching lower and upper bounds for online classification with noisy labels across a wide range of hypothesis classes and noise mechanisms, using the Hellinger gap of the induced noisy label distributions.
Abstract: We study online classification with general hypothesis classes where the true labels are determined by some function within the class but are corrupted by *unknown* stochastic noise, and the features are generated adversarially. Predictions are made using observed *noisy* labels and noiseless features, while performance is measured via the minimax risk against the *true* labels. The noise mechanism is modeled via a general noisy kernel that specifies, for any individual data point, a set of distributions from which the actual noisy label distribution is chosen. We show that the minimax risk is *tightly* characterized (up to a logarithmic factor of the hypothesis class size) by the *Hellinger gap* of the noisy label distributions induced by the kernel, *independent* of other properties such as the means and variances of the noise. Our main technique is a novel reduction to an online comparison scheme between two hypotheses, together with a new *conditional* version of Le Cam-Birgé testing suitable for online settings. Our work provides the first comprehensive characterization of noisy online classification with guarantees that apply to the *ground truth* while addressing *general* noisy observations.
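For concreteness, a short sketch of the quantity driving the bounds. The squared Hellinger distance below is the standard definition; the "gap" formulation is an *illustrative reading* of the abstract, not the paper's verbatim definition, with $K(\cdot)$ denoting the noise kernel and $h, h'$ two hypotheses disagreeing at a feature $x$:

```latex
% Squared Hellinger distance between two label distributions P and Q
% over a label alphabet \mathcal{Y} (standard definition):
H^2(P, Q) \;=\; \frac{1}{2} \sum_{y \in \mathcal{Y}}
    \left( \sqrt{P(y)} - \sqrt{Q(y)} \right)^2

% Assumed (illustrative) form of the Hellinger gap: for hypotheses
% h, h' with h(x) \neq h'(x), the smallest Hellinger separation
% between any admissible noisy-label distributions that the kernel K
% may assign to the two clean labels:
\mathrm{gap}(h, h'; x) \;=\; \inf_{P \in K(h(x)),\; Q \in K(h'(x))} H(P, Q)
```

On this reading, a larger gap makes the two hypotheses easier to distinguish from noisy labels via pairwise testing, which is consistent with the abstract's claim that the minimax risk depends on the kernel only through this Hellinger separation.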
Primary Area: Learning theory
Submission Number: 4178