DEMix Layers: Disentangling Domains for Modular Language Modeling

Anonymous

17 Dec 2021 (modified: 05 May 2023)
ACL ARR 2021 December Blind Submission
Abstract: We introduce a new domain expert mixture (DEMix) layer that enables conditioning a language model (LM) on the domain of the input text. A DEMix layer includes a collection of expert feedforward networks, each specialized to a domain, which makes the LM modular: experts can be mixed, added, or removed after initial training. Extensive experiments with autoregressive transformer LMs (up to 1.3B parameters) show that DEMix layers reduce test-time perplexity (especially for out-of-domain data), increase training efficiency, and enable rapid adaptation. Mixing experts during inference, using a parameter-free weighted ensemble, enables better generalization to heterogeneous or unseen domains. We also show it is possible to add experts to adapt to new domains without forgetting older ones, and to remove experts to restrict access to unwanted domains. Overall, these results demonstrate the benefits of domain modularity in language models.
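As a rough illustration of the mechanism the abstract describes, the sketch below (PyTorch, not the authors' implementation) replaces a transformer block's feedforward sublayer with one expert per training domain: during training, tokens are routed to the expert for their known domain, and at inference the expert outputs are combined as a weighted ensemble. The layer sizes, class name, and caller-supplied mixture weights are illustrative assumptions; in the paper the weights come from a parameter-free posterior over domains rather than being fixed by hand.

```python
# Minimal sketch of a DEMix-style expert feedforward layer.
# Assumptions: hidden/FFN sizes and the two-layer expert shape are illustrative;
# the inference-time mixture weights are supplied by the caller here, whereas the
# paper computes them dynamically from a posterior over domains.

import torch
import torch.nn as nn


class DEMixFeedForward(nn.Module):
    def __init__(self, d_model: int, d_ff: int, num_domains: int):
        super().__init__()
        # One feedforward expert per training domain.
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_domains)
        ])

    def forward(self, x: torch.Tensor, domain: int = None,
                mixture_weights: torch.Tensor = None) -> torch.Tensor:
        # Training / known domain: route all tokens to that domain's expert.
        if domain is not None:
            return self.experts[domain](x)
        # Inference on heterogeneous or unseen domains: weighted ensemble of experts.
        outputs = torch.stack([expert(x) for expert in self.experts], dim=0)
        w = mixture_weights.view(-1, *([1] * x.dim()))
        return (w * outputs).sum(dim=0)


# Usage: route to a known domain during training; mix experts at inference.
layer = DEMixFeedForward(d_model=16, d_ff=64, num_domains=4)
x = torch.randn(2, 8, 16)                     # (batch, sequence, hidden)
train_out = layer(x, domain=1)                # conditioned on a known domain
weights = torch.tensor([0.7, 0.3, 0.0, 0.0])  # e.g., an estimated posterior over domains
test_out = layer(x, mixture_weights=weights)
```

Because each expert is a separate module, adding a new domain amounts to appending an expert (and training it on that domain), and removing a domain amounts to dropping its expert; the rest of the model is untouched.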
Paper Type: long
Consent To Share Data: yes