Abstract: In this work, we present a novel approach to simultaneous knowledge transfer and model compression called \textbf{Weight Squeezing}. With this method, we perform knowledge transfer from a teacher model \textbf{by learning a mapping from its weights to smaller student model weights}. We applied Weight Squeezing to a pre-trained text classification model based on BERT-Medium and compared our method to various other knowledge transfer and model compression methods on the GLUE multitask benchmark. We observed that our approach produces better results while training student models significantly faster than other methods. We also proposed a variant of Weight Squeezing called Gated Weight Squeezing, which combines fine-tuning a small BERT model with learning a mapping from larger BERT weights. We showed that, in most cases, fine-tuning a BERT model with Gated Weight Squeezing outperforms plain fine-tuning.
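The core idea can be sketched in a few lines. The snippet below is an illustrative assumption, not the authors' implementation: it models the weight mapping as a pair of trainable projection matrices applied on either side of a teacher weight matrix, and the gated variant as a sigmoid-gated blend of a fine-tuned student weight with the mapped one. The hidden sizes (512 teacher, 256 student) are hypothetical placeholders.

```python
import numpy as np

# Hedged sketch of Weight Squeezing (illustrative, not the paper's code).
# A teacher weight matrix is mapped down to student size by two projection
# matrices, which would be trained jointly with the task loss.
rng = np.random.default_rng(0)
d_teacher, d_student = 512, 256  # assumed hidden sizes

W_teacher = rng.standard_normal((d_teacher, d_teacher))

# Trainable mapping parameters (here randomly initialized, normally learned).
M_out = rng.standard_normal((d_student, d_teacher)) / np.sqrt(d_teacher)
M_in = rng.standard_normal((d_teacher, d_student)) / np.sqrt(d_teacher)

# Student weight produced by the learned mapping from teacher weights.
W_mapped = M_out @ W_teacher @ M_in  # shape: (d_student, d_student)

# Gated Weight Squeezing sketch: blend a directly fine-tuned student weight
# with the mapped one through a learned gate g in [0, 1].
W_finetuned = rng.standard_normal((d_student, d_student))
g = 1.0 / (1.0 + np.exp(-0.3))  # sigmoid of a learned gate logit (assumed scalar)
W_student = g * W_finetuned + (1.0 - g) * W_mapped

print(W_student.shape)
```

In this reading, plain Weight Squeezing trains only the mapping matrices, while the gated variant lets the student interpolate between inheriting mapped teacher knowledge and its own fine-tuned weights.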