Meta Sparse Principal Component Analysis

TMLR Paper3346 Authors

14 Sept 2024 (modified: 13 Nov 2024) · Under review for TMLR · CC BY 4.0
Abstract: We study meta-learning for support recovery (i.e., the non-zero entries of the eigenvectors) in high-dimensional Principal Component Analysis. We reduce the sufficient sample complexity for a novel task using information learned from auxiliary tasks, where a task is defined as a random Principal Component (PC) matrix with its own support. We pool data from all the tasks to perform improper estimation of a single PC matrix by maximising the $\ell_1$-regularised predictive covariance. With $m$ tasks for $p$-variate sub-Gaussian random vectors, we establish that the sufficient sample complexity for each task is of the order $O(\sqrt{m^{-1}\log p})$, with high probability. This is particularly relevant for meta-learning, where there are many tasks, $m = O(\log p)$, each with very few samples, $n = O(1)$, a scenario in which multi-task learning fails. For a novel task, we prove that the sufficient sample complexity of successful support recovery can be reduced to $O(\log |J|)$, under the additional constraint that the support of the novel task is a subset of the support union $J$ estimated from the auxiliary tasks. This improves on the original sample complexity of $O(\log p)$ for learning a single task in isolation. Theoretical claims are supported by numerical simulations, and the proposed methodology is further validated on covariance estimation problems in brain-imaging and cancer-genetics data sets.
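For intuition about the pooling step described in the abstract, here is a minimal sketch in Python. It pools the samples from all $m$ auxiliary tasks, forms the pooled sample covariance, and extracts a single sparse leading PC via soft-thresholded power iteration, a standard sparse-PCA heuristic. This is not the paper's exact estimator (which maximises an $\ell_1$-regularised predictive covariance); the function name `pooled_sparse_pc`, the penalty `lam`, and the iteration scheme are all illustrative assumptions.

```python
import numpy as np

def pooled_sparse_pc(task_samples, lam=0.1, n_iter=200, tol=1e-8):
    """Sketch: one sparse leading PC from data pooled across tasks.

    Uses soft-thresholded power iteration on the pooled sample
    covariance. All names and the penalty value are illustrative,
    not the paper's estimator.
    """
    X = np.vstack(task_samples)          # pool the m tasks' samples
    X = X - X.mean(axis=0)               # centre the pooled data
    S = X.T @ X / X.shape[0]             # pooled sample covariance (p x p)

    rng = np.random.default_rng(0)
    v = rng.standard_normal(S.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(n_iter):
        u = S @ v                                           # power step
        u = np.sign(u) * np.maximum(np.abs(u) - lam, 0.0)   # soft-threshold
        norm = np.linalg.norm(u)
        if norm == 0.0:                  # lam too large: all entries zeroed
            return u
        u /= norm
        if np.linalg.norm(u - v) < tol:
            break
        v = u
    return v

# Hypothetical usage: 20 tasks, n = 5 samples each, p = 50 variables.
# The non-zero coordinates of v_hat play the role of the estimated
# support union J; a novel task would then be fit on columns in J only.
v_hat = pooled_sparse_pc([np.random.randn(5, 50) for _ in range(20)],
                         lam=0.05)
J = np.flatnonzero(v_hat)
```

The design mirrors the abstract's two-stage idea: the estimated support union $J$ from the pooled auxiliary data restricts the novel task to a $|J|$-dimensional subproblem, which is the source of the reduced $O(\log |J|)$ sample complexity.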
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: The appendix PDF was merged into the main paper PDF, at the request of the Action Editor.
Assigned Action Editor: ~Mathurin_Massias1
Submission Number: 3346