E-LANG: Energy-based Joint Inferencing of Super and Swift Language Models

Mohammad Akbari; Amin Banitalebi-Dehkordi; Yong Zhang

29 Sept 2021 (modified: 13 Feb 2023) | ICLR 2022 Conference Desk Rejected Submission | Readers: Everyone

Keywords: energy-based models, dynamic inference, joint language models, super model optimization, NLP, BERT, T5

Abstract: Building very large and highly capable language models has been a trend over the past several years. Despite their strong performance, these models incur a high computational cost. A common remedy is to apply model compression or to choose lightweight architectures, which often require a separate fixed-size model for each desired computational budget and may lose accuracy under heavy compression. This paper proposes an effective dynamic inference approach that distributes inference between large, accurate Super models and lightweight Swift models. To this end, a decision-making module routes each incoming sample to one of the two models based on the energy characteristics of its representations in the latent space. The proposed approach is easy to adopt and architecture-agnostic. As such, it can be applied to black-box pre-trained models without architectural manipulation, careful reassembly of modules, or re-training. Unlike existing methods, which are mostly applicable only to encoder-only backbones and classification tasks, our method also works for encoder-decoder structures and sequence-to-sequence tasks such as translation. The performance of the proposed Energy-based joint inferencing of LANGuage models, E-LANG, is verified through an extensive set of experiments with T5 and BERT architectures on the GLUE, SuperGLUE, and WMT benchmarks. In particular, we outperform T5-11B with an average computational speed-up of 3.3X on GLUE and 2.9X on SuperGLUE. We also achieve BERT-based state-of-the-art (SOTA) results on GLUE with 3.2X fewer computations. Code is available in the supplementary materials.
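The abstract does not spell out the exact routing rule, so the following is only a minimal sketch under stated assumptions. It assumes the common free-energy score E(x) = -T * log(sum_i exp(f_i(x) / T)), computed from the Swift model's output logits (standing in for the "representations in the latent space"), and a single scalar threshold: low energy is treated as a confident Swift prediction, and high energy sends the sample to the Super model. The names `energy_score`, `route`, `joint_predict`, `threshold`, and `temperature` are hypothetical and are not taken from the paper's code.

```python
import numpy as np


def energy_score(logits, temperature=1.0):
    """Free energy E(x) = -T * logsumexp(logits / T), one value per sample.

    Computed with the max-subtraction trick for numerical stability.
    """
    z = np.asarray(logits, dtype=float) / temperature
    m = z.max(axis=-1, keepdims=True)
    lse = m.squeeze(-1) + np.log(np.exp(z - m).sum(axis=-1))
    return -temperature * lse


def route(swift_logits, threshold, temperature=1.0):
    """Hypothetical router: True where a sample should go to the Super model.

    Higher energy means lower confidence in the Swift prediction.
    """
    return energy_score(swift_logits, temperature) > threshold


def joint_predict(x, swift_model, super_model, threshold, temperature=1.0):
    """Run the Swift model on all samples and re-run only high-energy ones on the Super model.

    `x` is assumed to be a NumPy array with one sample per row, and both
    models are assumed to map such an array to a (N, num_classes) logit array.
    """
    swift_logits = swift_model(x)
    to_super = route(swift_logits, threshold, temperature)
    preds = swift_logits.argmax(axis=-1)
    if to_super.any():
        # Only the routed subset pays the Super model's cost.
        super_logits = super_model(x[to_super])
        preds[to_super] = super_logits.argmax(axis=-1)
    return preds, to_super
```

In this sketch, the fraction of samples routed to the Super model, `to_super.mean()`, controls the trade-off: raising `threshold` keeps more samples on the cheap Swift path, while lowering it favors accuracy. How the paper tunes this trade-off or measures its reported speed-ups is not stated in the abstract.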

One-sentence Summary: In this paper, we present E-LANG, an energy-based joint inference approach that combines Super and Swift language models to achieve efficient inference without sacrificing accuracy.

Supplementary Material: zip
