Compress Reordered Transformer via Meta-Learning

Anonymous

17 Apr 2022 (modified: 05 May 2023) · ACL ARR 2022 April Blind Submission · Readers: Everyone
Abstract: Despite their success across the deep learning community, block-stacked Transformer-based models carry a large number of parameters, which limits their practical utility on devices with constrained computational resources. Although many compression methods have been proposed to address this problem, they often generalize poorly to unseen data. In this paper, we propose a novel meta-learning-based compressor for the Transformer architecture (Meta-Compressor), which prunes unhelpful model weights while largely preserving generalization ability. Meta-Compressor is updated by measuring the performance loss before and after compression on out-of-bag data, which improves both its compression quality and its generalization. We conduct empirical experiments on machine translation and sentence classification. In machine translation, we prune the standard Transformer to 56% of its original size while retaining 92% of its performance; in sentence classification, the pruned Transformer shows even stronger generalization on a challenging dataset.
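
The abstract only sketches the update rule: the compressor is adjusted according to the performance loss measured before and after compression on held-out ("out-of-bag") data. The snippet below is a minimal, hypothetical PyTorch sketch of such a loop, not the paper's actual Meta-Compressor; the per-tensor keep scores, the REINFORCE-style estimator, the sparsity penalty, and the toy encoder layer are all assumptions made purely for illustration.

```python
# Minimal, hypothetical sketch of a meta-learned pruning loop; NOT the paper's
# actual Meta-Compressor. The per-tensor scores, the REINFORCE-style update,
# and the sparsity penalty are assumptions made purely for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Toy "task model": one Transformer encoder layer plus a classification head.
model = nn.TransformerEncoderLayer(d_model=32, nhead=4, dim_feedforward=64,
                                   batch_first=True)
head = nn.Linear(32, 2)

# "Compressor" state: one learnable keep-probability logit per weight matrix.
prunable = [p for p in model.parameters() if p.dim() == 2]
scores = nn.Parameter(torch.zeros(len(prunable)))
meta_opt = torch.optim.Adam([scores], lr=1e-2)
sparsity_weight = 0.1  # assumed trade-off between model size and performance

def task_loss(x, y):
    return F.cross_entropy(head(model(x).mean(dim=1)), y)

def apply_mask(keep):
    """Zero out weight matrices whose keep decision is 0; return backups."""
    backups = [p.detach().clone() for p in prunable]
    with torch.no_grad():
        for k, p in zip(keep, prunable):
            if k == 0:
                p.zero_()
    return backups

def restore(backups):
    with torch.no_grad():
        for p, b in zip(prunable, backups):
            p.copy_(b)

# Held-out ("out-of-bag") batch used only to score each compression decision.
x_oob = torch.randn(8, 10, 32)
y_oob = torch.randint(0, 2, (8,))

for step in range(50):
    with torch.no_grad():
        loss_before = task_loss(x_oob, y_oob).item()

    # Sample a discrete keep/prune decision per weight matrix.
    probs = torch.sigmoid(scores)
    keep = torch.bernoulli(probs.detach())
    backups = apply_mask(keep.tolist())
    with torch.no_grad():
        loss_after = task_loss(x_oob, y_oob).item()
    restore(backups)

    # Meta-objective: a score-function (REINFORCE) estimator that penalizes
    # decisions causing a large performance drop on held-out data, plus a
    # penalty that encourages actually pruning weights.
    performance_drop = loss_after - loss_before
    log_prob = (keep * probs.log() + (1 - keep) * (1 - probs).log()).sum()
    meta_loss = performance_drop * log_prob + sparsity_weight * probs.sum()
    meta_opt.zero_grad()
    meta_loss.backward()
    meta_opt.step()
```

In a full method the retained weights would also be fine-tuned after each pruning decision, and the granularity would likely be finer than one decision per weight matrix; this sketch only illustrates the idea of scoring compression decisions by their performance cost on held-out data.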
Paper Type: long