Fine-Tuning Pretrained Models with NVIB for Improved Generalisation

Published: 06 Mar 2025, Last Modified: 21 Mar 2025
Venue: SCSL @ ICLR 2025
License: CC BY 4.0
Track: regular paper (up to 6 pages)
Keywords: Nonparametric Variational Information Bottleneck, Regularisation, Fine-tuning, Out-of-domain generalisation, Transformers
TL;DR: We extend Nonparametric Variational Information Bottleneck (NVIB) regularisation to fine-tuning across diverse modalities—including speech, text, graphs, and vision—demonstrating improved out-of-distribution generalisation.
Abstract: Fine-tuned pretrained attention-based models often struggle with generalisation, leading to poor performance on tasks involving out-of-domain transfer, distribution shifts, and few-shot learning. This limitation is prevalent across modalities such as speech, text, graphs, and vision. Nonparametric Variational Information Bottleneck (NVIB) is an attention-based information-theoretic regulariser applicable to pretrained models that has been shown to improve generalisation. However, prior work has applied NVIB only to the text modality and without fine-tuning. We investigate whether NVIB’s ability to remove information from pretrained embeddings helps the model avoid spurious correlations with noisy and superficial features during fine-tuning. We are the first to integrate NVIB regularisation during fine-tuning across multiple diverse models and modalities. This required modifications to the architecture that enhance adaptability and stability during fine-tuning and simplify evaluation. We found improved out-of-distribution generalisation in: speech quality assessment and language identification, text with induced attention sparsity, graph-based link prediction, and few-shot image classification.
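To illustrate the general idea of information-bottleneck regularisation during fine-tuning, the following is a minimal sketch of a standard Gaussian variational information bottleneck (VIB) penalty added to a task loss. It is *not* the paper's nonparametric NVIB formulation (which regularises attention itself); the function names and the `beta` weight are illustrative assumptions.

```python
import math

def gaussian_kl(mu, log_var):
    """KL( N(mu, diag(sigma^2)) || N(0, I) ), summed over dimensions.

    This is the closed-form penalty used in a standard Gaussian VIB;
    it measures how much information the latent keeps about the input.
    """
    return 0.5 * sum(
        math.exp(lv) + m * m - 1.0 - lv
        for m, lv in zip(mu, log_var)
    )

def vib_finetune_loss(task_loss, mu, log_var, beta=1e-3):
    """Total fine-tuning objective: task loss plus a weighted
    information penalty. Larger beta discards more information
    from the pretrained embeddings (illustrative hyperparameter)."""
    return task_loss + beta * gaussian_kl(mu, log_var)
```

With `mu = 0` and `log_var = 0` the latent already matches the prior, so the penalty vanishes and the objective reduces to the task loss alone; any deviation of the posterior from the prior is charged proportionally to `beta`.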
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Presenter: ~Fabio_James_Fehr1
Format: Yes, the presenting author will attend in person if this work is accepted to the workshop.
Funding: No, the presenting author of this submission does *not* fall under ICLR’s funding aims, or has sufficient alternate funding.
Submission Number: 23