TL;DR: We prove upper bounds on the sample complexity of Median-of-Means for uniform mean estimation.
Abstract: The Median of Means (MoM) is a mean estimator that has gained popularity in the context of heavy-tailed data. In this work, we analyze its performance in the task of simultaneously estimating the mean of each function in a class $\mathcal{F}$ when the data distribution possesses only the first $p$ moments for $p \in (1,2]$. We prove a new sample complexity bound using a novel symmetrization technique that may be of independent interest. Additionally, we present applications of our result to $k$-means clustering with unbounded inputs and linear regression with general losses, improving upon existing works.
Lay Summary: In this paper, we consider the classic problem of mean estimation when the dataset contains extreme values. For instance, when estimating the mean earthquake magnitude, a few very large earthquakes can skew the result. Since extreme values are common in modern datasets, it is important to develop robust methods for estimating the mean. To limit the influence of extreme values, we partition the data into small buckets, compute the mean within each bucket, and take the median of these bucket means.
Primary Area: Theory->Learning Theory
Keywords: Median-of-Means, Uniform Convergence, Heavy-Tailed Distributions
Submission Number: 3795
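The bucket-and-median procedure described in the Lay Summary can be sketched in a few lines of Python. This is a minimal illustration for a single real-valued sample, not the paper's uniform estimator over a function class $\mathcal{F}$. The function name `median_of_means`, the `num_buckets` and `seed` parameters, and the choice to shuffle before bucketing are assumptions made here for illustration.

```python
import random
import statistics


def median_of_means(samples, num_buckets, seed=0):
    """Illustrative Median-of-Means estimate of the mean of `samples`.

    Hypothetical helper: the name, parameters, and the shuffling step
    are assumptions for this sketch, not taken from the paper.
    """
    if not 1 <= num_buckets <= len(samples):
        raise ValueError("num_buckets must be between 1 and len(samples)")

    # Shuffle a copy so the bucket assignment does not depend on input order.
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)

    # Strided slices give disjoint buckets whose sizes differ by at most one.
    buckets = [shuffled[i::num_buckets] for i in range(num_buckets)]

    # Mean within each bucket, then the median across the bucket means.
    bucket_means = [statistics.fmean(bucket) for bucket in buckets]
    return statistics.median(bucket_means)


# Usage: 99 ordinary observations plus one extreme value.
data = [1.0] * 99 + [1e6]
plain_mean = statistics.fmean(data)          # pulled far from 1.0 by the outlier
robust_mean = median_of_means(data, num_buckets=10)
```

In this toy example the single extreme value can contaminate only one of the ten buckets, so the median of the bucket means stays at the typical value while the plain average does not. How many buckets to use is a trade-off that the sketch leaves to the caller.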