Abstract: Transformer-based models have significantly advanced the field of Natural Language Processing. However, their large size and computational complexity present challenges. As a result, there is considerable interest in developing approaches to compress these models without compromising their performance on specific tasks. This paper presents a comparative study of low-rank matrix and tensor factorization techniques for compressing Transformer-based models. Specifically, we apply Singular Value Decomposition (SVD) and Tensor Train Matrix (TTM) decomposition to represent the fully connected layers in a compressed form. Following Hsu et al. (2022), we extend the FWSVD approach by adding Fisher information to the TTM decomposition and present a novel method called FWTTM. Our experimental results indicate that the efficiency of these methods varies with the compression level. Notably, integrating Fisher information to align the task and decomposition objectives enhances the performance of TTM-factorized Transformer-based models and encoder-decoders.
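To make the compression scheme concrete, the sketch below factorizes a single fully connected layer with truncated SVD, replacing one dense nn.Linear with two smaller ones. This is an illustrative example only, not the authors' implementation: the rank, layer sizes, and function name are hypothetical, and the Fisher-weighted variants (FWSVD, FWTTM) would additionally reweight the weight matrix with an estimate of Fisher information before decomposing it.

```python
# Minimal sketch (assumed, not the authors' code): compress a dense layer
# with truncated SVD, W (out x in) ~= U_r S_r V_r^T, realized as two
# smaller linear layers. Rank and layer sizes are illustrative.
import torch
import torch.nn as nn


def svd_compress_linear(layer: nn.Linear, rank: int) -> nn.Sequential:
    """Replace a dense layer with a rank-`rank` factorization of its weight."""
    W = layer.weight.data                      # shape: (out_features, in_features)
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    U_r = U[:, :rank] * S[:rank]               # (out_features, rank), singular values folded in
    V_r = Vh[:rank, :]                         # (rank, in_features)

    first = nn.Linear(layer.in_features, rank, bias=False)
    second = nn.Linear(rank, layer.out_features, bias=layer.bias is not None)
    first.weight.data = V_r.clone()
    second.weight.data = U_r.clone()
    if layer.bias is not None:
        second.bias.data = layer.bias.data.clone()
    return nn.Sequential(first, second)


# Example: a BERT-sized feed-forward projection (768 -> 3072) at rank 64.
dense = nn.Linear(768, 3072)
compressed = svd_compress_linear(dense, rank=64)
x = torch.randn(4, 768)
print(dense(x).shape, compressed(x).shape)     # both (4, 3072)
```

In this illustrative setting, the two factors hold (768 + 3072) x 64 = 245,760 parameters versus roughly 2.36M for the dense layer; the achievable rank at a given accuracy is what the paper's compression levels control.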