Sparse Spectral Training and Inference on Euclidean and Hyperbolic Neural Networks

Published: 05 Mar 2025, Last Modified: 10 Apr 2025 | Venue: SLLM | License: CC BY 4.0
Track: long paper (up to 4 pages)
Keywords: Parameter-efficient training; Pre-training; Hyperbolic Network
Abstract: The increasing GPU memory demands of large language models call for more memory-efficient training methods. Existing approaches like LoRA struggle with low-rank constraints in pre-training, while ReLoRA suffers from saddle point issues. We propose **Sparse Spectral Training (SST)**, a memory-efficient **pre-training** framework that *updates all singular values*, *selectively updates singular vectors* via multinomial sampling, and *leverages singular value decomposition (SVD) for initialization and periodic reinitialization*, reducing distortion compared to other low-rank methods. Across tasks including language generation, machine translation, and graph learning, SST outperforms existing memory-efficient training methods and is often comparable to full-rank training. On LLaMA-1.3B, SST reduces the perplexity gap to full-rank training by **97.4%**, demonstrating its effectiveness for scalable, memory-efficient model pre-training. Our code is available at https://anonymous.4open.science/r/sparse_spectral_training-6A2C/.
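
To make the abstract's description concrete, below is a minimal PyTorch sketch of the SST idea as stated there: a low-rank spectral parameterization W ≈ U diag(s) Vᵀ in which all singular values are trained, a multinomially sampled subset of singular vectors receives gradient updates, and the factors are periodically reinitialized via SVD. The class and method names (`SSTLinear`, `resample_vectors`, `reinit_svd`) and the choice of sampling distribution are illustrative assumptions, not the authors' implementation; see the linked code for the actual method.

```python
# Hedged sketch of Sparse Spectral Training, assuming a PyTorch setting.
# Names and the sampling distribution below are assumptions for illustration.
import torch
import torch.nn as nn

class SSTLinear(nn.Module):
    def __init__(self, d_in, d_out, rank):
        super().__init__()
        # Low-rank spectral parameterization W ≈ U diag(s) V^T
        self.U = nn.Parameter(torch.randn(d_out, rank) / rank**0.5)
        self.s = nn.Parameter(torch.ones(rank))   # all singular values are trained
        self.V = nn.Parameter(torch.randn(d_in, rank) / rank**0.5)
        self.register_buffer("active", torch.ones(rank, dtype=torch.bool))

    def resample_vectors(self, k):
        # Multinomial sampling of k singular directions whose U/V columns
        # receive gradient updates this round (assumed proportional to |s|).
        probs = self.s.abs() / self.s.abs().sum()
        idx = torch.multinomial(probs, k, replacement=False)
        self.active.zero_()
        self.active[idx] = True

    @torch.no_grad()
    def reinit_svd(self):
        # Periodic SVD-based reinitialization of the factors from the
        # current effective weight.
        W = (self.U * self.s) @ self.V.t()
        U, S, Vh = torch.linalg.svd(W, full_matrices=False)
        r = self.s.numel()
        self.U.copy_(U[:, :r])
        self.s.copy_(S[:r])
        self.V.copy_(Vh[:r].t())

    def forward(self, x):
        # Only the sampled ("active") singular vectors contribute trainable
        # directions; gradients to inactive columns are blocked via detach.
        U = torch.where(self.active, self.U, self.U.detach())
        V = torch.where(self.active, self.V, self.V.detach())
        return x @ (V * self.s) @ U.t()
```

In this sketch, a training loop would call `resample_vectors(k)` every few steps and `reinit_svd()` at a longer interval; memory savings come from storing and optimizing only the rank-r factors rather than the full weight matrix.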
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Submission Number: 73