Keywords: Principal Component Analysis, Transformers, Machine Learning Theory
TL;DR: Transformers can provably perform PCA.
Abstract: Transformers demonstrate significant advantages as the building block of Large Language Models. Recent efforts have been devoted to understanding the learning capacities of transformers at a fundamental level. This work studies the intrinsic capacity of transformers to perform dimension reduction on complex data. Theoretically, our results rigorously show that, given a supervised pre-training phase, transformers can perform Principal Component Analysis (PCA) in a manner similar to the Power Method. Moreover, we show that the generalization error of transformers decays at the rate $n^{-1/5}$ in the $L_2$ norm. Empirically, extensive experiments on simulated and real-world high-dimensional datasets justify that a pre-trained transformer can successfully perform PCA by simultaneously estimating the first $k$ eigenvectors and eigenvalues. These findings demonstrate that transformers can efficiently extract low-dimensional patterns from high-dimensional data, shedding light on the potential benefits of using pre-trained LLMs to perform inference on high-dimensional data.
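For context, the Power Method baseline referenced in the abstract can be sketched as follows. This is a minimal illustration of classical subspace (block power) iteration for the top-$k$ eigenpairs of a sample covariance matrix, not the submission's transformer construction; the function name, parameters, and defaults are assumptions for illustration only.

```python
# Illustrative sketch (assumed, not from the submission): subspace/power iteration
# recovering the top-k eigenvalues and eigenvectors of a sample covariance matrix,
# i.e., the classical PCA procedure the abstract compares transformers against.
import numpy as np

def top_k_pca_power_method(X, k, n_iters=100, seed=0):
    """Return approximate top-k eigenvalues and eigenvectors of cov(X).

    X: (n, d) data matrix; k: number of principal components to extract.
    """
    n, d = X.shape
    Xc = X - X.mean(axis=0, keepdims=True)            # center the data
    cov = Xc.T @ Xc / n                               # sample covariance, (d, d)

    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(rng.standard_normal((d, k)))  # random orthonormal start

    for _ in range(n_iters):
        Q, _ = np.linalg.qr(cov @ Q)                  # power step + re-orthonormalization

    eigvals = np.diag(Q.T @ cov @ Q)                  # Rayleigh quotients as eigenvalue estimates
    return eigvals, Q

# Example usage (hypothetical data):
# eigvals, eigvecs = top_k_pca_power_method(np.random.randn(500, 20), k=3)
```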
Supplementary Material: zip
Primary Area: learning theory
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 13101