Abstract: The knowledge encapsulated in a model is the core factor determining its final performance on downstream tasks. Much research in NLP has focused on efficient methods for storing and adapting different types of knowledge, e.g., in dedicated modularized structures, and on how to effectively combine these modules, e.g., via parameter averaging at test time. However, given the many possible options in composing knowledge, a thorough understanding of the mechanisms involved is missing, and hence it remains unclear which strategies to use. In this work, we address this research gap by proposing a novel framework for zero-shot module composition, which encompasses existing and some novel variations for selecting, weighting, and combining parameter modules under a single unified notion. Focusing on the scenario of domain knowledge and adapter layers, our framework provides a systematic unification of concepts, allowing us to conduct the first comprehensive benchmarking study on various zero-shot knowledge composition strategies. In particular, we test two module combination methods (parameter averaging, output ensembling) and five selection and weighting strategies (uniform, and based on entropy, domain prior, TF-IDF, and semantic similarity) for their effectiveness and efficiency on 21 training and 10 evaluation domains across three models. Our results highlight the efficacy of ensembling, but also hint at the power of simple but often-ignored weighting methods. We further conduct various in-depth analyses that, for instance, allow us to understand the role of weighting vs. top-k selection, and we show that, to a certain extent, the performance of adapter composition can even be predicted.
Paper Type: long
Research Area: Efficient/Low-Resource Methods for NLP
Contribution Types: NLP engineering experiment, Approaches low compute settings-efficiency, Publicly available software and/or pre-trained models
Languages Studied: English
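The abstract names two ways of combining modules (parameter averaging and output ensembling) together with weighting and top-k selection. The following is a minimal sketch of how these pieces could fit together, not the authors' implementation. Everything in it is an assumption made for illustration: the toy `forward` function standing in for an adapter, the softmax normalization of scores, and the helper names. The per-module scores are taken as given; in the paper's setting they would come from one of the listed strategies (uniform, entropy, domain prior, TF-IDF, or semantic similarity).

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_weights(scores, k):
    """Keep the k highest-scoring modules and softmax-normalize their scores.

    Modules outside the top k get weight 0. Softmax normalization is an
    assumption of this sketch, not something stated in the abstract.
    """
    keep = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    probs = softmax([scores[i] for i in keep])
    weights = [0.0] * len(scores)
    for i, p in zip(keep, probs):
        weights[i] = p
    return weights


def weighted_sum(vectors, weights):
    """Element-wise weighted sum of equally sized vectors."""
    dim = len(vectors[0])
    return [sum(w * v[j] for w, v in zip(weights, vectors)) for j in range(dim)]


def forward(params, x):
    """Hypothetical stand-in for an adapter: a nonlinear map to a distribution."""
    return softmax([p * x for p in params])


def compose_by_parameter_averaging(modules, weights, x):
    """Merge the module parameters first, then run a single forward pass."""
    merged = weighted_sum(modules, weights)
    return forward(merged, x)


def compose_by_output_ensembling(modules, weights, x):
    """Run every module separately, then average the output distributions."""
    outputs = [forward(m, x) for m in modules]
    return weighted_sum(outputs, weights)


# Three toy "domain modules", each a small parameter vector (made-up values).
modules = [[0.5, -1.0, 2.0], [1.5, 0.0, -0.5], [-2.0, 1.0, 0.5]]
# Made-up relevance scores of each module for the target domain.
scores = [0.2, 1.3, 0.7]

weights = top_k_weights(scores, k=2)
averaged = compose_by_parameter_averaging(modules, weights, x=1.0)
ensembled = compose_by_output_ensembling(modules, weights, x=1.0)
```

Because `forward` is nonlinear, the two compositions generally give different outputs. Parameter averaging needs one forward pass regardless of how many modules are selected, whereas ensembling needs one pass per selected module, which is the kind of effectiveness versus efficiency trade-off the abstract refers to.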