An Efficient and Reliable Tolerance-Based Algorithm for Principal Component Analysis

Michael Yeh; Ming Gu

Published: 28 Jan 2022, Last Modified: 13 Feb 2023, ICLR 2022 Submitted, Readers: Everyone

Keywords: principal component analysis, dimensionality reduction, data compression

Abstract: Principal component analysis (PCA) is an important method for dimensionality reduction in data science and machine learning. However, it is expensive for large matrices when only a few principal components are needed. Existing fast PCA algorithms typically assume that the user will supply the number of components needed, but in practice the user may not know this number beforehand. It is therefore important to have fast PCA algorithms that depend on a tolerance instead. For $m\times n$ matrices in which a few principal components explain most of the variance in the data, we develop one such algorithm that runs in $O(mnl)$ time, where $l\ll \min(m,n)$ is a small multiple of the number of principal components. We provide approximation error bounds that are within a constant factor of optimal and demonstrate the algorithm's utility with data from a variety of applications.
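
To illustrate what "depending on a tolerance" can mean in practice, here is a minimal sketch of a generic blocked randomized range finder that keeps adding basis vectors until the relative Frobenius-norm residual falls below a tolerance. It is not the algorithm of this paper. The function name `tolerance_pca`, the parameters `tol`, `block`, and `seed`, and the choice of a Frobenius-norm stopping rule are all assumptions made for illustration. Each pass costs $O(mnb)$ for block size $b$, so the total cost is $O(mnl)$ when the loop stops after $l$ basis vectors.

```python
import numpy as np

def tolerance_pca(A, tol, block=8, seed=0):
    """Hypothetical helper: principal directions of A chosen by tolerance.

    Grows an orthonormal basis Q block by block until
    ||X - Q Q^T X||_F <= tol * ||X||_F, where X is A with centered columns.
    Returns (Vt, s): rows of Vt are principal directions, s the singular values.
    """
    rng = np.random.default_rng(seed)
    X = A - A.mean(axis=0)                 # center each column (feature)
    m, n = X.shape
    total = np.linalg.norm(X) ** 2         # squared Frobenius norm of X
    Q = np.zeros((m, 0))
    captured = 0.0
    while Q.shape[1] < min(m, n):
        b = min(block, min(m, n) - Q.shape[1])
        Y = X @ rng.standard_normal((n, b))    # random sketch of the range of X
        Y -= Q @ (Q.T @ Y)                     # project out the current basis
        Qb, _ = np.linalg.qr(Y)
        Qb -= Q @ (Q.T @ Qb)                   # re-orthogonalize for stability
        Qb, _ = np.linalg.qr(Qb)
        Q = np.hstack([Q, Qb])
        # For orthonormal Q: ||X - Q Q^T X||_F^2 = ||X||_F^2 - ||Q^T X||_F^2
        captured += np.linalg.norm(Qb.T @ X) ** 2
        if total - captured <= (tol ** 2) * total:
            break
    # Small SVD of the projected matrix gives the approximate components.
    _, s, Vt = np.linalg.svd(Q.T @ X, full_matrices=False)
    # Keep the fewest components whose energy meets the tolerance.
    energy = np.cumsum(s ** 2)
    k = min(int(np.searchsorted(energy, (1 - tol ** 2) * total)) + 1, len(s))
    return Vt[:k], s[:k]
```

A caller supplies only the tolerance, for example `Vt, s = tolerance_pca(A, tol=1e-3)`, and the number of returned components is `Vt.shape[0]`.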
