Improving Fine-tuning on Low-resource Corpora with Information Bottleneck


03 Jun 2020 (modified: 03 Jun 2020) · OpenReview Anonymous Preprint Blind Submission
  • Keywords: fine-tuning, information bottleneck, large-scale language models
  • TL;DR: We propose using the information bottleneck to reduce over-fitting when fine-tuning large-scale language models on low-resource datasets.
  • Abstract: Large-scale pre-trained language models act as general-purpose feature extractors, but not all the features are relevant for a given target task. This can cause problems in low-resource scenarios, where fine-tuning such large-scale models often over-fits on the small training set. We propose to use the information bottleneck principle to improve generalization in this scenario. We apply the variational information bottleneck method to remove task-irrelevant and redundant features from sentence embeddings during the fine-tuning of BERT. Evaluation on seven low-resource datasets for different tasks shows that our method significantly improves transfer learning in low-resource scenarios and obtains better generalization on 11 out of 13 out-of-domain textual entailment datasets.
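The variational information bottleneck described in the abstract can be viewed as a stochastic compression layer placed on top of the sentence embedding: the embedding is mapped to a Gaussian code, a sample from that code feeds the task classifier, and a KL penalty toward a standard normal prior discourages encoding task-irrelevant features. The NumPy sketch below illustrates this mechanism only; the function name, weight matrices, and dimensions are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def vib_layer(h, W_mu, W_logvar, rng):
    """Minimal variational information bottleneck sketch (illustrative,
    not the paper's code). Maps embedding h to a stochastic code z and
    returns the KL penalty toward the standard normal prior N(0, I)."""
    mu = h @ W_mu                # mean of the Gaussian code q(z|h)
    logvar = h @ W_logvar        # log-variance of q(z|h)
    eps = rng.standard_normal(mu.shape)
    z = mu + np.exp(0.5 * logvar) * eps  # reparameterization trick
    # KL( N(mu, sigma^2) || N(0, I) ), summed over code dims, mean over batch
    kl = 0.5 * np.mean(np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar, axis=-1))
    return z, kl

# Illustrative usage: a batch of 4 "sentence embeddings" of width 8,
# compressed to a 3-dimensional code.
rng = np.random.default_rng(0)
h = rng.standard_normal((4, 8))
W_mu = 0.1 * rng.standard_normal((8, 3))
W_logvar = 0.1 * rng.standard_normal((8, 3))
z, kl = vib_layer(h, W_mu, W_logvar, rng)
```

During fine-tuning, the task loss would be computed from `z` and the KL term added as a weighted regularizer (task_loss + beta * kl), so that gradient descent trades off task accuracy against how much information the code retains about the original embedding.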