Abstract: Transformer-based models have significantly advanced the field of Natural Language Processing. However, their large size and computational complexity present challenges. As a result, there is considerable interest in developing approaches to compress these models without compromising their performance on specific tasks. This paper presents a comparative study of low-rank matrix and tensor factorization techniques for compressing Transformer-based models. Specifically, we apply Singular Value Decomposition (SVD) and Tensor Train Matrix (TTM) decomposition to represent the fully connected layers in a compressed form. Following Hsu et al. (2022), we extend the FWSVD approach by adding Fisher information to the TTM decomposition and present a novel method called FWTTM. Our experimental results indicate that the efficiency of these methods varies with the compression level. Notably, integrating Fisher information to align the task and decomposition objectives enhances the performance of TTM-factorized Transformer-based models and encoder-decoders.
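To make the compression scheme concrete, the sketch below factorizes a single fully connected layer with truncated SVD, replacing one dense nn.Linear with two smaller ones. This is an illustrative example only, not the authors' implementation: the rank, layer sizes, and function name are hypothetical, and the Fisher-weighted variants (FWSVD, FWTTM) would additionally reweight the weight matrix with an estimate of Fisher information before decomposing it.

```python
# Minimal sketch (assumed, not the authors' code): compress a dense layer
# with truncated SVD, W (out x in) ~= U_r S_r V_r^T, realized as two
# smaller linear layers. Rank and layer sizes are illustrative.
import torch
import torch.nn as nn


def svd_compress_linear(layer: nn.Linear, rank: int) -> nn.Sequential:
    """Replace a dense layer with a rank-`rank` factorization of its weight."""
    W = layer.weight.data                      # shape: (out_features, in_features)
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    U_r = U[:, :rank] * S[:rank]               # (out_features, rank), singular values folded in
    V_r = Vh[:rank, :]                         # (rank, in_features)

    first = nn.Linear(layer.in_features, rank, bias=False)
    second = nn.Linear(rank, layer.out_features, bias=layer.bias is not None)
    first.weight.data = V_r.clone()
    second.weight.data = U_r.clone()
    if layer.bias is not None:
        second.bias.data = layer.bias.data.clone()
    return nn.Sequential(first, second)


# Example: a BERT-sized feed-forward projection (768 -> 3072) at rank 64.
dense = nn.Linear(768, 3072)
compressed = svd_compress_linear(dense, rank=64)
x = torch.randn(4, 768)
print(dense(x).shape, compressed(x).shape)     # both (4, 3072)
```

In this illustrative setting, the two factors hold (768 + 3072) x 64 = 245,760 parameters versus roughly 2.36M for the dense layer; the achievable rank at a given accuracy is what the paper's compression levels control.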