Learning Exponential Families from Truncated Samples

Published: 21 Sept 2023, Last Modified: 11 Jan 2024. NeurIPS 2023 poster.
Keywords: truncated statistics, robustness, exponential families, extrapolation
TL;DR: We provide the first efficient algorithm for estimating parameters of general exponential families in high dimensions using samples truncated to very general sets.
Abstract: Missing data problems arise in many forms across scientific fields. A fundamental type of missing data problem occurs when samples are \textit{truncated}, i.e., samples that lie outside a subset of the support are never observed. Statistical estimation from truncated samples is a classical problem in statistics dating back to Galton, Pearson, and Fisher. A recent line of work provides the first efficient estimation algorithms for the parameters of a Gaussian distribution and for linear regression with Gaussian noise. In this paper we generalize these results to log-concave exponential families. We provide an estimation algorithm showing that \textit{extrapolation} is possible for a much larger class of distributions while maintaining polynomial sample and time complexity on average. Our algorithm is based on Projected Stochastic Gradient Descent and is not only applicable in a more general setting but is also simpler and more efficient than recent algorithms. Our work also has interesting implications for learning general log-concave distributions and for sampling given only access to truncated data.
Supplementary Material: pdf
Submission Number: 3705
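The abstract's Projected SGD approach can be illustrated on the simplest exponential-family case: for truncated maximum likelihood, the population gradient in the natural parameter is the difference between the empirical mean of the sufficient statistic and its mean under the current model restricted to the observable set, so a stochastic gradient step only needs one data point and one sample from the truncated model. The sketch below is an illustrative special case (a one-dimensional Gaussian with known unit variance, truncated to a half-line), not the paper's general algorithm; the function names, step size, projection radius, and rejection sampler are assumptions made for the example.

```python
import random

def sample_truncated_gaussian(mu, low):
    # Rejection sampling from N(mu, 1) restricted to [low, inf).
    # Illustrative only: inefficient when the model puts little mass on the set.
    while True:
        x = random.gauss(mu, 1.0)
        if x >= low:
            return x

def psgd_truncated_mean(samples, low, lr=0.05, radius=10.0, epochs=10):
    # Projected SGD on the truncated log-likelihood for theta = mu
    # (sufficient statistic T(x) = x in this known-variance case).
    theta, avg, count = 0.0, 0.0, 0
    for epoch in range(epochs):
        for x in samples:
            y = sample_truncated_gaussian(theta, low)  # sample from current model on the set
            theta += lr * (x - y)                      # stochastic gradient ascent step
            theta = max(-radius, min(radius, theta))   # project onto [-radius, radius]
            if epoch >= epochs // 2:                   # average iterates after burn-in
                avg += theta
                count += 1
    return avg / count
```

Note the extrapolation aspect: the estimate targets the parameter of the full (untruncated) distribution even though every observation lies in the truncation set.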