Low-Rank Matrix Factorization of LSTM as Effective Model Compression

Genta Indra Winata, Andrea Madotto, Jamin Shin, Elham J. Barezi

Sep 27, 2018 (modified: Nov 16, 2018) ICLR 2019 Conference Withdrawn Submission readers: everyone
  • Abstract: Large-scale Long Short-Term Memory (LSTM) cells are often the building blocks of many state-of-the-art algorithms for tasks in Natural Language Processing (NLP). However, LSTMs are known to be computationally inefficient because the memory capacity of the models depends on the number of parameters, and the inherent recurrence that models the temporal dependency is not parallelizable. In this paper, we propose simple, but effective, low-rank matrix factorization (MF) algorithms to compress network parameters and significantly speed up LSTMs with almost no loss of performance (and sometimes even gain). To show the effectiveness of our method across different tasks, we examine two settings: 1) compressing core LSTM layers in Language Models, 2) compressing biLSTM layers of ELMo~\citep{ELMo} and evaluate in three downstream NLP tasks (Sentiment Analysis, Textual Entailment, and Question Answering). The latter is particularly interesting as embeddings from large pre-trained biLSTM Language Models are often used as contextual word representations. Finally, we discover that matrix factorization performs better in general, additive recurrence is often more important than multiplicative recurrence, and we identify an interesting correlation between matrix norms and compression performance.
  • Keywords: NLP, LSTM, Compression, Low Rank, Norm Analysis
  • TL;DR: We propose simple, but effective, low-rank matrix factorization (MF) algorithms to speed up in running time, save memory, and improve the performance of LSTMs.
0 Replies

Loading