Factor Model-Based Large Covariance Estimation from Streaming Data Using a Knowledge-Based Sketch Matrix

Xiao Tan; Zhaoyang Wang; Hao Qian; Jun Zhou; Peibo Duan; Dian Shen; Meng Wang; Beilun Wang

Published: 01 Jan 2024, Last Modified: 15 May 2025, CIKM 2024, CC BY-SA 4.0

Abstract: Covariance matrix estimation is an important problem in statistics, with wide applications in finance, neuroscience, meteorology, oceanography, and other fields. However, when the data are high-dimensional and continuously generated and updated in a streaming fashion, covariance matrix estimation faces major challenges, including the curse of dimensionality and limited memory. Existing methods either assume sparsity, ignoring possible common factors among the variables, or perform poorly when recovering the covariance matrix directly from sketched data. To address these issues, we propose a novel method, KEEF (Knowledge-based Time and Memory Efficient Covariance Estimator in Factor Model), along with an extended variant. Our method leverages historical data to train a knowledge-based sketch matrix, which accelerates the factor analysis of streaming data and allows the covariance matrix to be estimated directly from the sketched data. We provide theoretical guarantees showing the advantages of our method in time and space complexity as well as accuracy. Extensive experiments on synthetic and real-world data, comparing KEEF with several state-of-the-art methods, demonstrate the superior performance of our method.
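
The abstract does not spell out KEEF's algorithm, so the following is only a generic illustration of the setting it describes, not the authors' method. It compresses a stream of zero-mean samples with a random Gaussian sketch (a stand-in for the trained, knowledge-based sketch matrix) and then forms a principal-component factor-model estimate (low-rank part plus diagonal residual) directly from the sketched data. The function names `sketch_stream` and `factor_covariance_from_sketch`, the Gaussian sketch, and the zero-mean assumption are all hypothetical choices made for this sketch.

```python
import numpy as np


def sketch_stream(rows, m, seed=0):
    """Accumulate Y = S X one sample at a time.

    S is an m x n random Gaussian sketch with N(0, 1/m) entries, generated
    column by column, so only the m x p matrix Y is ever stored.
    ASSUMPTION: a random sketch stands in for a learned sketch matrix.
    """
    rng = np.random.default_rng(seed)
    Y, n = None, 0
    for x in rows:
        x = np.asarray(x, dtype=float)
        if Y is None:
            Y = np.zeros((m, x.shape[0]))
        s = rng.normal(scale=1.0 / np.sqrt(m), size=m)  # column of S for this sample
        Y += np.outer(s, x)
        n += 1
    if Y is None:
        raise ValueError("empty stream")
    return Y, n


def factor_covariance_from_sketch(Y, n, k):
    """Factor-model covariance estimate from sketched, zero-mean data.

    Since E[S^T S] = I, Y^T Y / n approximates the sample covariance.
    The top-k eigenpairs give the common-factor (low-rank) part; the
    remaining variance on the diagonal gives the idiosyncratic part.
    """
    _, sing, Vt = np.linalg.svd(Y, full_matrices=False)
    eigvals = sing[:k] ** 2 / n          # top-k eigenvalues of Y^T Y / n
    V = Vt[:k].T                         # p x k eigenvectors
    low_rank = (V * eigvals) @ V.T
    total_var = np.einsum("ij,ij->j", Y, Y) / n  # diagonal of Y^T Y / n
    resid = np.clip(total_var - np.diag(low_rank), 1e-12, None)
    return low_rank + np.diag(resid)
```

With `m` much smaller than the number of samples, memory use is governed by the `m x p` sketch rather than the full data matrix; the price is extra estimation error that shrinks as `m` grows.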
