Abstract: We introduce TQCompressor, a neural network compression method based on enhanced tensor decompositions. We propose a permutation-based improvement to Kronecker decomposition that reduces the loss in model expressivity typically associated with compression. Applied to $\mathbf{GPT-2}_{small}$, the method yields the TQCompressedGPT-2 model with 81 million parameters, down from 124 million. Further enhanced through multi-step knowledge distillation on 3.1% of OpenWebText, TQCompressedGPT-2 outperforms DistilGPT-2 and KnGPT-2. We have made TQCompressedGPT-2 publicly available.
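To make the core idea concrete: Kronecker compression approximates a weight matrix $W$ by a product $A \otimes B$ with far fewer parameters, and a well-chosen permutation of $W$'s rows or columns can lower the approximation error before factoring. The sketch below is only a generic illustration under assumed details, not the paper's algorithm: it pairs the classical Van Loan-Pitsianis nearest-Kronecker-product computation with a naive random search over row permutations; the function names, shapes, and search strategy are all hypothetical.

```python
import numpy as np

def nearest_kronecker(W, m1, n1, m2, n2):
    """Van Loan-Pitsianis: find A (m1 x n1) and B (m2 x n2)
    minimizing ||W - kron(A, B)||_F for W of shape (m1*m2, n1*n2)."""
    # Rearrange W so each (m2 x n2) block becomes one row; the optimal
    # Kronecker factors then come from the rank-1 SVD truncation of R.
    R = W.reshape(m1, m2, n1, n2).transpose(0, 2, 1, 3).reshape(m1 * n1, m2 * n2)
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    A = np.sqrt(s[0]) * U[:, 0].reshape(m1, n1)
    B = np.sqrt(s[0]) * Vt[0].reshape(m2, n2)
    return A, B

def best_of_random_row_perms(W, m1, n1, m2, n2, trials=100, seed=0):
    """Illustrative stand-in for a permutation search (NOT the paper's
    method): try random row permutations, keep the lowest residual."""
    rng = np.random.default_rng(seed)
    best_perm, best_err = np.arange(W.shape[0]), np.inf
    for _ in range(trials):
        perm = rng.permutation(W.shape[0])
        A, B = nearest_kronecker(W[perm], m1, n1, m2, n2)
        err = np.linalg.norm(W[perm] - np.kron(A, B))
        if err < best_err:
            best_perm, best_err = perm, err
    return best_perm, best_err

# Toy usage: a 12 x 12 matrix factored as (3 x 4) kron (4 x 3).
W = np.random.default_rng(1).standard_normal((12, 12))
identity_err = np.linalg.norm(W - np.kron(*nearest_kronecker(W, 3, 4, 4, 3)))
_, permuted_err = best_of_random_row_perms(W, 3, 4, 4, 3)
print(f"no permutation: {identity_err:.3f}, best random perm: {permuted_err:.3f}")
```

In this toy setup the factors store $3 \cdot 4 + 4 \cdot 3 = 24$ values instead of $144$, which is the parameter saving the abstract refers to; the permutation only changes which entries each factor must jointly explain, so it costs no extra parameters.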