Distributed Estimation of Sparse Covariance Matrix under Heavy-Tailed Data

ICLR 2026 Conference Submission 15795 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: covariance estimation, high-dimensional estimation, robust estimation, distributed optimization
Abstract: In this paper, we study high-dimensional covariance matrix estimation over a network of interconnected agents, where the data are distributed and may exhibit heavy-tailed behavior. To address this challenge, we propose a new estimator that integrates the Huber loss to mitigate outliers with a non-convex regularizer to promote sparsity. To the best of our knowledge, this is the first framework that simultaneously accounts for high dimensionality, heavy tails, and distributed data in covariance estimation. We begin by analyzing a proximal gradient descent algorithm to solve this non-convex and non-globally Lipschitz smooth problem in the centralized setting to set the stage for the distributed case. In the distributed setting, where bandwidth, storage, and privacy constraints preclude agents from directly sharing raw data, we design a decentralized algorithm aligned with the centralized one, building on the principle of gradient tracking. We prove that, under mild conditions, both algorithms converge linearly to the same solution. Moreover, we establish that the resulting covariance estimates attain the oracle statistical rate in Frobenius norm, representing the state of the art for high-dimensional covariance estimation under heavy-tailed distributions. Numerical experiments corroborate our theoretical findings and demonstrate that the proposed estimator outperforms existing baselines in both estimation accuracy and robustness.
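The abstract describes a proximal gradient scheme combining an elementwise Huber loss (for heavy tails) with a sparsity-promoting penalty. The sketch below illustrates the general shape of such an estimator; it is not the authors' algorithm. It uses an l1 soft-thresholding prox as a convex stand-in for the paper's non-convex regularizer, and all function names, the median initialization, and the default parameters (`delta`, `lam`, `step`) are illustrative assumptions.

```python
import numpy as np

def huber_grad(r, delta):
    # Gradient of the Huber loss: identity inside [-delta, delta], clipped outside,
    # so large (heavy-tailed) residuals exert only bounded influence.
    return np.clip(r, -delta, delta)

def soft_threshold(M, tau):
    # Proximal operator of the l1 penalty (convex surrogate for the paper's
    # non-convex regularizer); off-diagonal entries are shrunk toward zero.
    S = np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)
    np.fill_diagonal(S, np.diag(M))  # leave the diagonal unpenalized
    return S

def sparse_huber_cov(X, delta=1.0, lam=0.1, step=0.5, n_iter=200):
    """Proximal-gradient sketch of a robust sparse covariance estimator.

    Takes gradient steps on (1/n) * sum_i Huber(x_i x_i^T - Sigma), applied
    elementwise, followed by a prox step for the sparsity penalty.
    """
    n, p = X.shape
    outer = np.einsum('ij,ik->ijk', X, X)   # n stacked rank-one matrices x_i x_i^T
    Sigma = np.median(outer, axis=0)        # robust elementwise initialization
    for _ in range(n_iter):
        residual = outer - Sigma            # broadcast over the n samples
        grad = -huber_grad(residual, delta).mean(axis=0)
        Sigma = soft_threshold(Sigma - step * grad, step * lam)
        Sigma = 0.5 * (Sigma + Sigma.T)     # keep the iterate symmetric
    return Sigma
```

In the distributed setting described above, each agent would hold a slice of the rows of `X`; gradient tracking lets the network average these local Huber gradients without exchanging raw samples, so that the decentralized iterates follow the same update as this centralized loop.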
Primary Area: optimization
Submission Number: 15795