Mixture-of-Linguistic-Experts Adapters for Improving and Interpreting Pre-trained Language Models

Published: 07 Oct 2023, Last Modified: 01 Dec 2023
EMNLP 2023 Findings
Submission Type: Regular Long Paper
Submission Track: Semantics: Lexical, Sentence Level, Document Level, Textual Inference, etc.
Submission Track 2: Efficient Methods for NLP
Keywords: adapters, graph neural networks, parameter-efficient fine-tuning, interpretability, dependency trees
TL;DR: We propose a method that selects adapter modules encoding linguistic graphs based on the values of learned Gumbel-Softmax gates.
Abstract: In this work, we propose a method that combines two popular research areas by injecting linguistic structures into pre-trained language models in the parameter-efficient fine-tuning (PEFT) setting. In our approach, parallel adapter modules encoding different linguistic structures are combined using a novel Mixture-of-Linguistic-Experts architecture, where Gumbel-Softmax gates are used to determine the importance of these modules at each layer of the model. To reduce the number of parameters, we first train the model for a fixed small number of steps before pruning the experts based on their importance scores. Our experimental results with three different pre-trained models show that our approach can outperform state-of-the-art PEFT methods with a comparable number of parameters. In addition, we analyze the experts selected by each model at each layer to provide insights for future studies.
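To make the gating mechanism described in the abstract concrete, below is a minimal PyTorch sketch of Gumbel-Softmax-gated parallel adapter "experts". It is an illustration under stated assumptions, not the authors' implementation: the class name, the simple bottleneck adapters standing in for the linguistic-graph experts, and all dimensions are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixtureOfLinguisticExperts(nn.Module):
    """Sketch: parallel adapter experts combined by a Gumbel-Softmax gate.

    In the paper, each expert encodes a different linguistic structure
    (e.g., a GNN over dependency trees); here, plain bottleneck adapters
    stand in for them to keep the example self-contained.
    """

    def __init__(self, hidden_dim: int, num_experts: int = 3,
                 bottleneck: int = 64, tau: float = 1.0):
        super().__init__()
        # One bottleneck adapter per "linguistic expert" (illustrative).
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(hidden_dim, bottleneck),
                nn.ReLU(),
                nn.Linear(bottleneck, hidden_dim),
            )
            for _ in range(num_experts)
        )
        # Learned per-layer gate logits; their softmax values can serve as
        # importance scores for pruning experts after a few training steps.
        self.gate_logits = nn.Parameter(torch.zeros(num_experts))
        self.tau = tau

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq_len, hidden_dim).
        # hard=True gives a one-hot selection in the forward pass while
        # gradients flow via the straight-through estimator.
        gates = F.gumbel_softmax(self.gate_logits, tau=self.tau, hard=True)
        # Stack expert outputs: (num_experts, batch, seq_len, hidden_dim).
        expert_out = torch.stack([e(hidden) for e in self.experts], dim=0)
        mixed = (gates.view(-1, 1, 1, 1) * expert_out).sum(dim=0)
        # Residual connection, as in standard adapter layers.
        return hidden + mixed
```

Under this sketch, the pruning step from the abstract would amount to training briefly, then keeping only the expert(s) with the highest learned gate values at each layer and discarding the rest.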
Submission Number: 798