Keywords: high dimensional statistics, algorithmic statistics, computational learning theory, coarse observations, mean estimation, linear regression, friction
TL;DR: We study Gaussian mean estimation from coarse observations under convex partitions. We give efficient algorithms and characterize identifiability.
Abstract: Coarse data arise when learners observe only partial information about each sample: a set containing the sample rather than its exact value. This occurs naturally through measurement rounding, sensor limitations, and lag in economic systems. We study Gaussian mean estimation from coarse data, where each true sample $x$ is drawn from a $d$-dimensional Gaussian distribution with identity covariance but is revealed only through the set of a partition that contains $x$. When the coarse samples, roughly speaking, carry ``low'' information, the mean cannot be uniquely recovered from the observed samples (i.e., the problem is not *identifiable*). Recent work by Fotakis et al. (2021) established that *sample*-efficient mean estimation is possible when the unknown mean is *identifiable* and the partition consists only of *convex* sets. Moreover, they showed that without convexity, mean estimation becomes NP-hard. However, two fundamental questions remained open:
1. When is the mean identifiable under convex partitions?
2. Is *computationally* efficient estimation possible under identifiability and convex partitions?
This work resolves both questions. First, we provide a geometric characterization of when a convex partition is identifiable, showing that identifiability depends on whether the convex sets form ``slabs'' along some direction. Second, we give the first polynomial-time algorithm for computing $\varepsilon$-accurate estimates of the Gaussian mean from coarse samples drawn under an unknown convex partition, matching the optimal $\widetilde{O}(d/\varepsilon^2)$ sample complexity. Our results have direct applications to robust machine learning, in particular robustness to observation rounding. As a concrete example, we derive a sample- and computationally efficient algorithm for linear regression with market friction, a canonical problem at the interface of machine learning and economics, where exact prices are unobserved and one only sees a range containing each price (Rosett, 1959).
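To make the coarse-observation setting concrete, here is a minimal sketch (not the paper's algorithm) of the one-dimensional rounding case: samples from $\mathcal{N}(\mu, 1)$ are observed only as the unit interval containing them, and the mean is recovered by maximizing the interval-censored likelihood. The partition, sample sizes, and optimization routine below are illustrative assumptions, not taken from the paper.

```python
# Illustrative sketch: maximum-likelihood mean estimation from
# interval-censored ("coarse") 1-D Gaussian samples, where each sample x
# is revealed only as the rounding cell [floor(x), floor(x) + 1) -- a simple
# convex partition modeling observation rounding.
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
true_mu = 1.7
x = rng.normal(loc=true_mu, scale=1.0, size=5000)

# Coarsening: report only the partition cell containing each sample.
lo = np.floor(x)   # left endpoint of the observed interval
hi = lo + 1.0      # right endpoint

def neg_log_likelihood(mu):
    # P(sample lands in [lo, hi)) under N(mu, 1), for each coarse observation.
    p = norm.cdf(hi - mu) - norm.cdf(lo - mu)
    return -np.sum(np.log(np.clip(p, 1e-300, None)))

res = minimize_scalar(neg_log_likelihood, bounds=(-10, 10), method="bounded")
print(f"true mean {true_mu:.3f}, coarse-sample MLE {res.x:.3f}")
```

In this one-dimensional rounding example the partition cells are not slabs all in one direction that leave the mean ambiguous, so the mean is identifiable and the censored likelihood has a well-defined maximizer; the paper's contribution is handling general unknown convex partitions in $d$ dimensions with a polynomial-time algorithm.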
Primary Area: learning theory
Submission Number: 2210