Distilled embedding: non-linear embedding factorization using knowledge distillation

Vasileios Lioutas; Ahmad Rashid; Krtin Kumar; Md Akmal Haidar; Mehdi Rezagholizadeh

25 Sept 2019 (modified: 22 Jun 2025) · ICLR 2020 Conference Blind Submission · Readers: Everyone

TL;DR: We present an embedding decomposition and distillation technique for NLP model compression that is state-of-the-art in machine translation and simpler than existing methods.
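To make the compression from an embedding decomposition concrete, the sketch below counts parameters for a dense V x d embedding matrix versus a rank-r factorization into U (V x r) and W (r x d). The vocabulary size, width, and target rates are hypothetical illustration values, not settings from the paper. It also assumes the added non-linearity is element-wise and parameter-free and that the factors carry no biases.

```python
# Parameter arithmetic for a rank-r factorization of a V x d embedding matrix.
# ASSUMPTION: hypothetical sizes; element-wise non-linearity with no parameters; no biases.

def full_params(vocab_size: int, dim: int) -> int:
    """Parameters in a dense V x d embedding matrix."""
    return vocab_size * dim


def factorized_params(vocab_size: int, dim: int, rank: int) -> int:
    """Parameters in U (V x r) plus W (r x d); the element-wise
    non-linearity between the factors adds none."""
    return vocab_size * rank + rank * dim


def compression_rate(vocab_size: int, dim: int, rank: int) -> float:
    """Ratio of dense parameters to factorized parameters."""
    return full_params(vocab_size, dim) / factorized_params(vocab_size, dim, rank)


def rank_for_rate(vocab_size: int, dim: int, target_rate: float) -> int:
    """Largest rank r with V*d / (r*(V + d)) >= target_rate."""
    return int(vocab_size * dim // (target_rate * (vocab_size + dim)))


if __name__ == "__main__":
    V, d = 32000, 512  # hypothetical vocabulary size and embedding width
    for rate in (2.0, 4.0, 8.0):
        r = rank_for_rate(V, d, rate)
        print(f"target {rate}x -> rank {r}, actual {compression_rate(V, d, r):.2f}x")
```

Because the V x r factor dominates when V is much larger than d, the compression rate is roughly d / r, so halving the rank roughly doubles the savings.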

Abstract: Word embeddings are a vital component of Natural Language Processing (NLP) systems and have been studied extensively. Better word representations have come at the cost of large memory footprints, which makes deploying NLP models on memory-constrained edge devices challenging. Compressing embedding matrices without sacrificing model performance is therefore essential for successful commercial edge deployment. In this paper, we propose Distilled Embedding, an input/output embedding compression method based on low-rank matrix decomposition with an added non-linearity. We first initialize the weights of the decomposition by learning to reconstruct the full word-embedding matrix, and then fine-tune on the downstream task, applying knowledge distillation to the factorized embedding. We conduct extensive experiments on machine translation across several datasets and compression rates, using a single word-embedding matrix shared between the input embedding and the output vocabulary projection. We show that the proposed technique outperforms conventional low-rank matrix factorization and other recently proposed word-embedding compression methods.
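A minimal pure-Python sketch of the two stages the abstract describes, under several assumptions that are not confirmed by the abstract: the non-linearity is a tanh applied to U before multiplying by W (the paper's activation and its placement may differ); a random matrix stands in for a trained full embedding; the reconstruction initialization is plain full-batch gradient descent on mean squared error; and the distillation term is a generic Hinton-style soft-target KL loss, not necessarily the paper's objective. The same table is reused as the output projection to mirror the shared input/output embedding. All function names (`reconstruct`, `init_step`, `tied_logits`, `kd_loss`) are hypothetical, and the sketch only evaluates the distillation loss rather than running downstream fine-tuning.

```python
import math
import random

# Hypothetical toy sizes: vocabulary V, embedding width D, rank R.
V, D, R = 20, 8, 3
random.seed(0)

# Stand-in for a trained full embedding matrix E (V x D).
E = [[random.gauss(0.0, 1.0) for _ in range(D)] for _ in range(V)]

# Factors U (V x R) and W (R x D) with small random initialization.
U = [[random.gauss(0.0, 0.1) for _ in range(R)] for _ in range(V)]
W = [[random.gauss(0.0, 0.1) for _ in range(D)] for _ in range(R)]


def reconstruct(U, W):
    """E_hat[i][j] = sum_k tanh(U[i][k]) * W[k][j].
    ASSUMPTION: tanh placed between the factors."""
    return [[sum(math.tanh(U[i][k]) * W[k][j] for k in range(R)) for j in range(D)]
            for i in range(V)]


def mse(A, B):
    """Mean squared error between two V x D matrices."""
    return sum((a - b) ** 2 for ra, rb in zip(A, B) for a, b in zip(ra, rb)) / (V * D)


def init_step(U, W, lr):
    """One full-batch gradient step on the reconstruction MSE; updates U and W in place."""
    E_hat = reconstruct(U, W)
    # dL/dE_hat for the mean squared error.
    G = [[2.0 * (E_hat[i][j] - E[i][j]) / (V * D) for j in range(D)] for i in range(V)]
    A = [[math.tanh(U[i][k]) for k in range(R)] for i in range(V)]
    # Both gradients use the pre-update U and W.
    grad_W = [[sum(G[i][j] * A[i][k] for i in range(V)) for j in range(D)] for k in range(R)]
    grad_U = [[sum(G[i][j] * W[k][j] for j in range(D)) * (1.0 - A[i][k] ** 2)
               for k in range(R)] for i in range(V)]
    for k in range(R):
        for j in range(D):
            W[k][j] -= lr * grad_W[k][j]
    for i in range(V):
        for k in range(R):
            U[i][k] -= lr * grad_U[i][k]


def tied_logits(emb, h):
    """Shared (tied) table reused as the output projection: logit_i = emb[i] . h."""
    return [sum(e * x for e, x in zip(row, h)) for row in emb]


def softmax(z, t):
    """Numerically stable softmax at temperature t."""
    m = max(z)
    exps = [math.exp((v - m) / t) for v in z]
    s = sum(exps)
    return [v / s for v in exps]


def kd_loss(teacher_logits, student_logits, t=2.0):
    """Generic soft-target distillation loss: KL(teacher || student) at temperature t.
    ASSUMPTION: not necessarily the paper's exact distillation objective."""
    p = softmax(teacher_logits, t)
    q = softmax(student_logits, t)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


# Stage 1: initialize the factors by reconstructing the full embedding.
initial_loss = mse(reconstruct(U, W), E)
for _ in range(300):
    init_step(U, W, lr=0.5)
final_loss = mse(reconstruct(U, W), E)
print(f"reconstruction MSE: {initial_loss:.4f} -> {final_loss:.4f}")

# Stage 2 (loss only): compare output distributions from the full (teacher)
# and factorized (student) tables for one hypothetical hidden state h.
h = [random.gauss(0.0, 1.0) for _ in range(D)]
distill = kd_loss(tied_logits(E, h), tied_logits(reconstruct(U, W), h))
print(f"KD loss: {distill:.4f}")
```

In an actual fine-tuning run this distillation term would typically be combined with the task loss and minimized with respect to U and W; the sketch stops at computing it.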

Keywords: Model Compression, Embedding Compression, Low Rank Approximation, Machine Translation, Natural Language Processing, Deep Learning

Community Implementations: [1 code implementation](https://www.catalyzex.com/paper/distilled-embedding-non-linear-embedding/code)
