Robust Mixture Learning when Outliers Overwhelm Small Groups

Daniil Dmitriev; Rares-Darius Buhai; Stefan Tiegel; Alexander Wolters; Gleb Novikov; Amartya Sanyal; David Steurer; Fanny Yang

Robust Mixture Learning when Outliers Overwhelm Small Groups

Daniil Dmitriev, Rares-Darius Buhai, Stefan Tiegel, Alexander Wolters, Gleb Novikov, Amartya Sanyal, David Steurer, Fanny Yang

Published: 25 Sept 2024, Last Modified: 06 Nov 2024NeurIPS 2024 posterEveryoneRevisionsBibTeXCC BY 4.0

Keywords: robust statistics, mixture learning, list-decodable learning, small group, outliers, mean estimation, efficient algorithms

TL;DR: Meta-algorithm for the mixture learning problem with guarantees for small groups in the presence of large additive adversarial contamination.

Abstract: We study the problem of estimating the means of well-separated mixtures when an adversary may add arbitrary outliers. While strong guarantees are available when the outlier fraction is significantly smaller than the minimum mixing weight, much less is known when outliers may crowd out low-weight clusters – a setting we refer to as list-decodable mixture learning (LD-ML). In this case, adversarial outliers can simulate additional spurious mixture components. Hence, if all means of the mixture must be recovered up to a small error in the output list, the list size needs to be larger than the number of (true) components. We propose an algorithm that obtains order-optimal error guarantees for each mixture mean with a minimal list-size overhead, significantly improving upon list-decodable mean estimation, the only existing method that is applicable for LD-ML. Although improvements are observed even when the mixture is non-separated, our algorithm achieves particularly strong guarantees when the mixture is separated: it can leverage the mixture structure to partially cluster the samples before carefully iterating a base learner for list-decodable mean estimation at different scales.

Supplementary Material: zip

Primary Area: Learning theory

Submission Number: 12284

Loading