Weight Squeezing: Reparameterization for Model Compression

Anonymous

30 May 2020 (modified: 04 Jun 2020) · OpenReview Anonymous Preprint Blind Submission
  • Keywords: distillation, compression, text classification, knowledge distillation, Transformer, BERT
  • TL;DR: Knowledge is transferred from a pre-trained teacher model by learning a mapping from its weights to the weights of a smaller student model, without significant loss of accuracy.
  • Abstract: In this work, we present Weight Squeezing, a novel approach to simultaneous knowledge transfer and model compression. In this method, knowledge is transferred from a pre-trained teacher model by learning a mapping from its weights to the weights of a smaller student model, without significant loss of accuracy. We apply Weight Squeezing, combined with Knowledge Distillation, to a pre-trained text classification model and compare it against various knowledge transfer and model compression methods on several downstream text classification tasks. Our approach produces competitive results, and in some experiments Weight Squeezing without Knowledge Distillation outperforms the other baselines.
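
The idea of learning a mapping from teacher weights to smaller student weights can be pictured with a minimal sketch. The factorized linear mapping below (the class `SqueezedLinear` and the projection matrices `map_out`/`map_in`) is an assumption made for illustration, not the paper's exact parameterization: the pre-trained teacher weight matrix is frozen, and only the small mapping matrices that produce the student weights are trained.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SqueezedLinear(nn.Module):
    """Hypothetical sketch of Weight Squeezing for a single linear layer:
    the student weight matrix is not learned directly but is produced by
    trainable mappings applied to a frozen teacher weight matrix."""

    def __init__(self, teacher_weight: torch.Tensor, out_features: int, in_features: int):
        super().__init__()
        t_out, t_in = teacher_weight.shape
        # Frozen pre-trained teacher weights (the source of transferred knowledge).
        self.register_buffer("teacher_weight", teacher_weight)
        # Trainable mappings from teacher dimensions to smaller student dimensions.
        self.map_out = nn.Parameter(torch.randn(out_features, t_out) * 0.02)
        self.map_in = nn.Parameter(torch.randn(t_in, in_features) * 0.02)
        self.bias = nn.Parameter(torch.zeros(out_features))

    def student_weight(self) -> torch.Tensor:
        # W_student = M_out @ W_teacher @ M_in, shape (out_features, in_features).
        return self.map_out @ self.teacher_weight @ self.map_in

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.linear(x, self.student_weight(), self.bias)


# Usage: squeeze a 768x768 teacher projection into a 256x256 student layer.
teacher_w = torch.randn(768, 768)   # stands in for a pre-trained BERT weight matrix
layer = SqueezedLinear(teacher_w, out_features=256, in_features=256)
x = torch.randn(4, 256)
print(layer(x).shape)               # torch.Size([4, 256])
```

Under this sketch, training updates only `map_out`, `map_in`, and the bias, optionally alongside a Knowledge Distillation loss on the teacher's outputs, which is how "learning the mapping from teacher weights to student weights" would be realized in practice.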