Variational Information Bottleneck for Effective Low-Resource Fine-Tuning

Rabeeh Karimi Mahabadi; Yonatan Belinkov; James Henderson

Published: 12 Jan 2021, Last Modified: 22 Jun 2025, ICLR 2021 Poster

Keywords: Transfer learning, NLP, large-scale pre-trained language models, over-fitting, robust, biases, variational information bottleneck

Abstract: While large-scale pretrained language models have obtained impressive results when fine-tuned on a wide variety of tasks, they still often suffer from overfitting in low-resource scenarios. Since such models are general-purpose feature extractors, many of their features are inevitably irrelevant to any given target task. We propose to use the Variational Information Bottleneck (VIB) to suppress irrelevant features when fine-tuning on low-resource target tasks, and show that our method successfully reduces overfitting. Moreover, we show that our VIB model finds sentence representations that are more robust to biases in natural language inference datasets, and thereby generalizes better to out-of-domain datasets. Evaluation on seven low-resource datasets across different tasks shows that our method significantly improves transfer learning in low-resource scenarios, surpassing prior work. In addition, it improves generalization on 13 out of 15 out-of-domain natural language inference benchmarks. Our code is publicly available at https://github.com/rabeehk/vibert.
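To make the bottleneck idea concrete, here is a minimal sketch of the standard VIB training objective, not the authors' implementation in rabeehk/vibert. It assumes that a stochastic encoder maps the pretrained sentence representation to a diagonal Gaussian over a compressed code z (given here directly as `mu` and `log_var`), that the prior is a standard normal, and that training minimizes the task cross-entropy of predicting y from a sampled z plus `beta` times the KL divergence to the prior. The KL term penalizes z for carrying information the task does not need. All function names, the two-class linear head, and the numeric values are hypothetical, and plain Python replaces an autodiff framework.

```python
import math
import random
from typing import Callable, List, Sequence


def sample_z(mu: Sequence[float], log_var: Sequence[float],
             rng: random.Random) -> List[float]:
    """Reparameterization: z = mu + sigma * eps, with eps ~ N(0, 1) per dimension."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu: Sequence[float], log_var: Sequence[float]) -> float:
    """Closed-form KL(N(mu, diag(exp(log_var))) || N(0, I)), summed over dimensions."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, log_var))


def softmax_cross_entropy(logits: Sequence[float], label: int) -> float:
    """-log softmax(logits)[label], computed with the max-shift for stability."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - logits[label]


def vib_loss(mu: Sequence[float], log_var: Sequence[float],
             classifier: Callable[[List[float]], List[float]],
             label: int, beta: float, rng: random.Random,
             num_samples: int = 1) -> float:
    """Monte Carlo estimate of E_z[CE(y | z)] + beta * KL(p(z|x) || N(0, I))."""
    ce = 0.0
    for _ in range(num_samples):
        z = sample_z(mu, log_var, rng)
        ce += softmax_cross_entropy(classifier(z), label)
    ce /= num_samples
    return ce + beta * kl_to_standard_normal(mu, log_var)


def linear_classifier(z: List[float]) -> List[float]:
    """Hypothetical 2-class linear head with fixed weights, for illustration only."""
    weights = [[1.0, -1.0], [-1.0, 1.0]]
    return [sum(w * zi for w, zi in zip(row, z)) for row in weights]


# Usage: hypothetical encoder outputs for one example with label 0.
rng = random.Random(0)
mu = [0.5, -0.2]
log_var = [-1.0, -1.0]
loss = vib_loss(mu, log_var, linear_classifier, label=0, beta=1e-3,
                rng=rng, num_samples=8)
```

In this sketch a larger `beta` pulls the code distribution harder toward the prior, so it discards more of the input at some cost to task fit. With `beta = 0` and a near-zero variance, the loss reduces to ordinary cross-entropy on `mu`.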

One-sentence Summary: We propose to use the Variational Information Bottleneck to suppress irrelevant features for effective fine-tuning of large-scale language models in low-resource scenarios.

Code: [rabeehk/vibert](https://github.com/rabeehk/vibert)

Data: [GLUE](https://paperswithcode.com/dataset/glue), [IMDb Movie Reviews](https://paperswithcode.com/dataset/imdb-movie-reviews), [MRPC](https://paperswithcode.com/dataset/mrpc), [MultiNLI](https://paperswithcode.com/dataset/multinli), [SICK](https://paperswithcode.com/dataset/sick), [SNLI](https://paperswithcode.com/dataset/snli)

Community Implementations: [1 code implementation](https://www.catalyzex.com/paper/variational-information-bottleneck-for/code)