Track: tiny / short paper (up to 4 pages)
Keywords: matrix product operators, transformer compression, low-rank factorization, pruning, resource-constrained devices, speaker identification, LibriSpeech, automatic speech recognition
TL;DR: We compress transformer architectures using matrix product operators to factorize and prune weight matrices, significantly reducing memory and computational costs while preserving performance on tasks like speaker identification.
Abstract: We explore the use of matrix product operators (MPOs) to compress transformer-based architectures. By factorizing full-rank weight matrices into a tensor-train product of smaller cores, MPOs reduce both memory footprint and computational cost, which is critical for deployment on resource‑constrained devices. Our experiments on speaker identification using the LibriSpeech train-clean-360 subset show that MPO-based models, and even their pruned variants, maintain high performance with far fewer parameters than full‑rank transformers. We detail the mathematical principles underlying low‑rank factorization and unstructured pruning and discuss next steps for extending this approach to more complex tasks such as automatic speech recognition (ASR).
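To make the factorization concrete, the sketch below (not the authors' code; shapes, the rank cap, and the helper name `mpo_decompose` are illustrative assumptions) shows how a dense weight matrix can be split into MPO cores via successive truncated SVDs, so that the parameter count scales with the sum of core sizes rather than the full matrix size.

```python
# Minimal sketch of MPO (tensor-train operator) factorization of a dense
# weight matrix via successive truncated SVDs. Assumed, illustrative code.
import numpy as np

def mpo_decompose(W, in_dims, out_dims, max_rank=16):
    """Factorize W of shape (prod(in_dims), prod(out_dims)) into MPO cores.

    Core k has shape (r_{k-1}, in_dims[k], out_dims[k], r_k), so the total
    parameter count is sum_k r_{k-1}*i_k*o_k*r_k instead of prod(i)*prod(o).
    """
    n = len(in_dims)
    assert len(out_dims) == n
    assert W.shape == (int(np.prod(in_dims)), int(np.prod(out_dims)))

    # Reshape to (i_1..i_n, o_1..o_n), then interleave modes as (i_1,o_1,...,i_n,o_n).
    T = W.reshape(list(in_dims) + list(out_dims))
    perm = [p for k in range(n) for p in (k, n + k)]
    T = T.transpose(perm)

    cores, r_prev = [], 1
    for k in range(n - 1):
        # Fold (r_{k-1}, i_k, o_k) into rows; all remaining modes into columns.
        rows = r_prev * in_dims[k] * out_dims[k]
        M = T.reshape(rows, -1)
        U, S, Vt = np.linalg.svd(M, full_matrices=False)
        r = min(max_rank, len(S))                      # truncate to the rank cap
        cores.append(U[:, :r].reshape(r_prev, in_dims[k], out_dims[k], r))
        T = S[:r, None] * Vt[:r]                       # carry the remainder forward
        r_prev = r
    cores.append(T.reshape(r_prev, in_dims[-1], out_dims[-1], 1))
    return cores

# Example: compress a hypothetical 512x512 transformer weight into 3 MPO cores.
W = np.random.randn(512, 512).astype(np.float32)
cores = mpo_decompose(W, in_dims=(8, 8, 8), out_dims=(8, 8, 8), max_rank=16)
print("MPO params:", sum(c.size for c in cores), "vs dense:", W.size)
```

With these illustrative settings the three cores hold 18,432 parameters versus 262,144 for the dense matrix; the bond rank `max_rank` controls the accuracy/compression trade-off, and unstructured pruning can then be applied to the cores themselves.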
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Submission Number: 79