Transformers Compression: A Study of Matrix Decomposition Methods Using Fisher Information

Published: 01 Jan 2023 · Last Modified: 21 May 2025 · AIST 2023 · CC BY-SA 4.0
Abstract: Transformer models have been a breakthrough in Natural Language Processing. However, their performance comes at the cost of enormous model size, which limits options for deployment. To address this issue, we compare compression techniques based on low-rank matrix and tensor factorization for the heavy weight matrices of these models. We focus on Singular Value Decomposition (SVD) and Tensor Train Matrix decomposition (TTM), and extend previous work [10] by incorporating Fisher information into TTM, introducing a novel approach which we call FWTTM.
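As a concrete illustration of how Fisher information can guide a factorization, below is a minimal sketch of Fisher-weighted SVD in the spirit of [10]: rows of the weight matrix are rescaled by the square root of their estimated Fisher information before the SVD, so that directions more important to the task loss are better preserved at a given rank. The function name `fisher_weighted_svd` and its arguments are illustrative assumptions, not the paper's implementation.

```python
import torch

def fisher_weighted_svd(weight, fisher, rank):
    """Fisher-weighted low-rank factorization (illustrative sketch).

    weight: (out, in) matrix to compress.
    fisher: (out,) per-row Fisher estimates, e.g. squared gradients
            accumulated over a calibration set (assumed precomputed).
    rank:   target rank of the factorization.
    """
    # Row scaling D = diag(I^{1/2}), clamped to avoid division by zero.
    s = fisher.clamp_min(1e-8).sqrt()
    u, sigma, vT = torch.linalg.svd(torch.diag(s) @ weight,
                                    full_matrices=False)
    # Truncate to the target rank.
    u, sigma, vT = u[:, :rank], sigma[:rank], vT[:rank]
    # Undo the row scaling on the left factor so that A @ B ~ weight.
    A = torch.diag(1.0 / s) @ u @ torch.diag(sigma)
    B = vT
    return A, B  # rank * (out + in) parameters instead of out * in
```

Replacing a dense layer's weight with the two factors `A` and `B` reduces its parameter count whenever `rank` is below `out * in / (out + in)`; the TTM and FWTTM variants studied in the paper pursue the same goal with a tensor-train factorization instead of a single low-rank product.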