Keywords: robust learning, malicious noise, contamination, outlier removal
TL;DR: We provide nearly optimal guarantees for several fundamental problems in robust supervised learning via a single iterative polynomial filtering algorithm.
Abstract: Inspired by recent work on learning with distribution shift, we give a
general outlier removal algorithm called *iterative polynomial
filtering* and show a number of striking applications to supervised
learning with contamination:
(1) We show that any function class that can be approximated by
low-degree polynomials with respect to a hypercontractive distribution
can be efficiently learned under bounded contamination (also
known as *nasty noise*).  This surprisingly resolves a
longstanding gap between the complexity of agnostic learning and
learning with contamination, as low-degree approximators were widely
believed to imply tolerance only to label noise.
(2) For any function class that admits the (stronger) notion of
sandwiching approximators, we obtain near-optimal learning guarantees
even with respect to heavy additive contamination, where far more than
$1/2$ of the training set may be added adversarially. Prior
related work held only for regression and in a list-decodable setting.
(3) We obtain the first efficient algorithms for tolerant testable
learning of functions of halfspaces with respect to any fixed
log-concave distribution.  Even the non-tolerant case for a single
halfspace in this setting had remained open.
These results significantly advance our understanding of efficient
supervised learning under contamination, a setting that has been much
less studied than its unsupervised counterpart.
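For intuition, the sketch below illustrates the generic shape of a polynomial filtering loop: expand samples into low-degree monomial features, look for a polynomial direction whose empirical variance exceeds what a well-behaved (e.g., hypercontractive) distribution would allow, and trim the points most responsible. This is a minimal, hypothetical sketch; the feature map, the variance threshold `var_bound`, the trim fraction, and all function names are illustrative assumptions, not the paper's actual algorithm or its certificates.

```python
import numpy as np
from itertools import combinations_with_replacement

def poly_features(X, degree=2):
    """Expand samples into all monomials of total degree <= `degree`."""
    n, d = X.shape
    feats = [np.ones(n)]  # constant term
    for k in range(1, degree + 1):
        for idx in combinations_with_replacement(range(d), k):
            feats.append(np.prod(X[:, list(idx)], axis=1))
    return np.column_stack(feats)

def iterative_filter(X, degree=2, var_bound=3.0, trim_frac=0.05, max_rounds=20):
    """Drop points that dominate a polynomial direction with excessive variance."""
    keep = np.arange(len(X))
    for _ in range(max_rounds):
        P = poly_features(X[keep].astype(float), degree)
        P -= P.mean(axis=0)                      # center the feature expansion
        eigvals, eigvecs = np.linalg.eigh(P.T @ P / len(keep))
        if eigvals[-1] <= var_bound:             # moments look inlier-like: stop
            break
        scores = (P @ eigvecs[:, -1]) ** 2       # contribution to the bad direction
        cutoff = np.quantile(scores, 1.0 - trim_frac)
        keep = keep[scores < cutoff]             # remove the highest scorers
    return keep

if __name__ == "__main__":
    # Toy demo: 950 Gaussian inliers plus 50 planted outliers far from the mean.
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(size=(950, 3)),
                   rng.normal(8.0, 1.0, size=(50, 3))])
    inliers = iterative_filter(X)
    print(f"kept {len(inliers)} of {len(X)} points")
```

On contaminated data, the surviving indices `keep` would then be handed to a standard learner (e.g., low-degree polynomial regression); the paper's guarantees concern its actual algorithm and analysis, not this toy loop.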
Primary Area: Theory (e.g., control theory, learning theory, algorithmic game theory)
Submission Number: 23131