Harnessing Orthogonality to Train Low-Rank Neural Networks

21 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Supplementary Material: pdf
Primary Area: learning theory
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: orthogonal, low rank, low-rank, svd, compression, optimization
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: Neural network weights converge to an orthogonal basis during training. This basis can be periodically updated throughout training, maintaining or improving network performance.
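The stabilization claim in this TL;DR can be probed directly. Below is a minimal sketch (not taken from the paper; the tensors, shapes, and rank are illustrative assumptions) that compares the leading left-singular subspaces of a weight matrix at two training checkpoints; an overlap near 1.0 indicates the orthogonal basis has stopped rotating.

```python
import torch

def subspace_overlap(w_old: torch.Tensor, w_new: torch.Tensor, k: int) -> float:
    """Overlap in [0, 1] between the rank-k left-singular subspaces of two
    weight snapshots; 1.0 means the orthogonal basis has not rotated."""
    u_old, _, _ = torch.linalg.svd(w_old, full_matrices=False)
    u_new, _, _ = torch.linalg.svd(w_new, full_matrices=False)
    # ||U_old^T U_new||_F^2 / k is the mean squared cosine of the principal
    # angles between the two k-dimensional subspaces.
    return (u_old[:, :k].T @ u_new[:, :k]).pow(2).sum().item() / k

# Hypothetical usage: compare the same layer across two saved checkpoints.
w_epoch10 = torch.randn(512, 256)                     # placeholder weight matrix
w_epoch20 = w_epoch10 + 0.01 * torch.randn(512, 256)  # slightly perturbed copy
print(f"rank-32 subspace overlap: {subspace_overlap(w_epoch10, w_epoch20, 32):.3f}")
```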
Abstract: In the realm of neural network training, the question of what is truly being learned beyond mathematical optimization has intrigued researchers for decades. This study examines the structure that neural network weights acquire during training. Leveraging the singular value decomposition, we explore the hypothesis that the orthogonal bases of the low-rank decomposition of neural network weights stabilize during training, and we provide experimental evidence supporting this notion. Building on this insight, we introduce Orthogonality-Informed Adaptive Low-Rank neural network training. Our approach integrates seamlessly into existing training workflows with minimal accuracy loss, as demonstrated by benchmarks on various datasets and well-established network architectures. We find that, with standard tuning procedures, our method surpasses the performance of conventional training setups. Finally, we showcase the effectiveness of our tuned low-rank training procedure by applying it to a state-of-the-art transformer model for time-series prediction.
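To make the adaptive low-rank procedure concrete, the sketch below shows one plausible realization of the idea the abstract describes, assuming a PyTorch setting: each weight is factored as W = U S Vᵀ, only the small inner matrix S is trained between basis updates, and the frozen orthogonal bases U and V are periodically refreshed via an SVD of the reconstructed weight. The class name, initialization, and update rule are illustrative assumptions, not the authors' reference implementation.

```python
import torch
import torch.nn as nn

class AdaptiveLowRankLinear(nn.Module):
    """Linear layer factored as W = U @ S @ V^T. Between basis updates only
    the small rank-by-rank matrix S is trainable; U and V are frozen
    orthogonal bases refreshed periodically via SVD (a sketch of the idea,
    not the paper's reference implementation)."""

    def __init__(self, in_features: int, out_features: int, rank: int):
        super().__init__()
        w = torch.empty(out_features, in_features)
        nn.init.kaiming_uniform_(w)
        u, s, vh = torch.linalg.svd(w, full_matrices=False)
        self.U = nn.Parameter(u[:, :rank], requires_grad=False)   # frozen basis
        self.S = nn.Parameter(torch.diag(s[:rank]))               # trainable core
        self.V = nn.Parameter(vh[:rank].T, requires_grad=False)   # frozen basis

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Reconstruct the effective weight and apply it as a standard linear map.
        return x @ (self.U @ self.S @ self.V.T).T

    @torch.no_grad()
    def update_basis(self):
        """Periodically re-orthogonalize: SVD of the current effective weight,
        then restart S from the new singular values."""
        w = self.U @ self.S @ self.V.T
        u, s, vh = torch.linalg.svd(w, full_matrices=False)
        k = self.S.shape[0]
        self.U.copy_(u[:, :k])
        self.S.copy_(torch.diag(s[:k]))
        self.V.copy_(vh[:k].T)
```

In a training loop, update_basis() would be called on every such layer at a fixed interval (e.g., every few hundred steps); that interval and the rank are the kind of hyperparameters the abstract's "standard tuning procedures" would cover.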
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 3510