Support Recovery in Sparse PCA with General Missing Data

Hanbyul Lee; Qifan Song; Jean Honorio

Published: 26 Apr 2024, Last Modified: 15 Jul 2024, UAI 2024 oral, CC BY 4.0

Keywords: Sparse PCA, Missing Data, Support Recovery

Abstract: We analyze a sparse PCA algorithm for incomplete and noisy data without any specific model assumption on the missing-data scheme. We use a graphical approach to characterize general missing patterns, which enables us to analyze how structural properties of a missing pattern affect the solvability of the sparse PCA problem. The sparse PCA method we focus on is a semidefinite relaxation of the $\ell_1$-regularized PCA problem. We show theoretically that, under certain conditions, the algorithm recovers the support of the sparse leading eigenvector with high probability. The conditions involve the spectral gap between the largest and second-largest eigenvalues of the true data matrix, the magnitude of the noise, and the structural properties of the missing pattern. The concepts of algebraic connectivity and irregularity are used to describe these properties in graph-theoretic terms. We empirically support our theorem with an analysis of synthetic data. We show that the SDP algorithm outperforms other sparse PCA approaches, especially when the observation pattern has good structural properties. As a by-product of our analysis, we provide two theorems for handling general missing schemes, which can be applied to other problems involving incomplete data matrices.
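
For orientation, the semidefinite relaxation of $\ell_1$-regularized PCA is commonly written in the form below. This is a sketch of the standard formulation from the sparse PCA literature, not necessarily the exact program analyzed in the paper. The symbols $\hat{M}$ (the incomplete, noisy input matrix, with the treatment of missing entries left unspecified here), $\rho$ (the regularization weight), $X$, $A$, $D$, and $L$ are notation assumed for illustration and do not appear in the abstract.

```latex
% Standard SDP relaxation of l1-regularized PCA (assumed notation):
\hat{X} \in \arg\max_{X \succeq 0,\ \operatorname{tr}(X) = 1}
  \ \langle \hat{M}, X \rangle - \rho \, \|X\|_{1,1},
\qquad
\|X\|_{1,1} = \sum_{i,j} |X_{ij}|

% Algebraic connectivity of a graph with adjacency matrix A and degree matrix D:
L = D - A,
\qquad
\lambda_2(L) = \text{second-smallest eigenvalue of } L
```

In this kind of formulation, the support estimate is typically read off from the nonzero diagonal entries of $\hat{X}$. For a graph encoding which entries are observed, $\lambda_2(L) > 0$ holds exactly when the graph is connected, which is one way a missing pattern can be said to have good structural properties.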

Submission Number: 571
