Adapters for Resource-Efficient Deployment of NLU Models

Anonymous

04 Mar 2022 (modified: 05 May 2023) · Submitted to NLP for ConvAI
Keywords: dialog systems, deployment, efficient models, green ai
TL;DR: We show that the Adapter framework can save massive amounts of memory when deploying NLP models, with performance comparable to that of vanilla BERT.
Abstract: Modern transformer models such as BERT are huge and expensive to deploy in practical applications. In environments such as commercial chatbot-as-a-service platforms that deploy many NLP models in parallel, less powerful models with fewer parameters are often used to keep deployment costs down. Moreover, in times of climate change and scarce resources, deploying many huge models in parallel is increasingly hard to justify. This paper proposes the BERT+Adapter architecture, which hosts many models while saving significant amounts of (GPU) memory, and demonstrates the approach on intent detection for dialog systems. With the adapter framework, many task-specific adapters share a single large Transformer model. To deploy 100 NLU models, we calculate a memory footprint of 1 GB for the proposed BERT+Adapter architecture, compared to 41.78 GB for a BERT-only architecture. We further show that training the BERT+Adapter architecture is on average 14.43 times faster than training vanilla BERT, and that its intent-detection accuracy is comparable to that of a vanilla BERT architecture.
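
To make the shared-backbone idea concrete, here is a minimal PyTorch sketch, not the authors' code: one frozen BERT encoder is held in memory once, and each tenant contributes only a small bottleneck adapter plus an intent-classification head. For brevity the adapter sits on top of the encoder output rather than inside every transformer layer as in the original adapter framework, and the class names (`BottleneckAdapter`, `MultiTenantIntentDetector`), task names, and layer sizes are illustrative assumptions.

```python
# Minimal sketch of one frozen BERT shared by many per-tenant adapters.
# All names and sizes are illustrative assumptions, not the paper's code.
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizerFast


class BottleneckAdapter(nn.Module):
    """Houlsby-style adapter: down-project, nonlinearity, up-project, residual."""

    def __init__(self, hidden_size: int = 768, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_size)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        return hidden_states + self.up(torch.relu(self.down(hidden_states)))


class MultiTenantIntentDetector(nn.Module):
    """One shared, frozen encoder; per-tenant adapters and heads are tiny."""

    def __init__(self, encoder: nn.Module, hidden_size: int = 768):
        super().__init__()
        self.encoder = encoder
        self.hidden_size = hidden_size
        for p in self.encoder.parameters():  # backbone is frozen and shared
            p.requires_grad = False
        self.adapters = nn.ModuleDict()
        self.heads = nn.ModuleDict()

    def add_task(self, name: str, num_intents: int):
        # Only these ~0.1 M parameters are trained and stored per tenant,
        # versus ~110 M (~0.4 GB) for a full BERT-base copy. This is the
        # rough arithmetic behind 1 GB vs. 41.78 GB for 100 NLU models.
        self.adapters[name] = BottleneckAdapter(self.hidden_size)
        self.heads[name] = nn.Linear(self.hidden_size, num_intents)

    def forward(self, task: str, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        h = self.adapters[task](out.last_hidden_state[:, 0])  # adapt [CLS]
        return self.heads[task](h)


# Usage: one backbone in GPU memory, adapters selected per incoming request.
encoder = BertModel.from_pretrained("bert-base-uncased")
tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
service = MultiTenantIntentDetector(encoder)
service.add_task("shop_bot", num_intents=12)   # hypothetical tenants
service.add_task("bank_bot", num_intents=7)

batch = tokenizer(["what is my balance?"], return_tensors="pt")
logits = service("bank_bot", batch["input_ids"], batch["attention_mask"])
```

Because only the adapter and head parameters receive gradients while the backbone stays frozen, training in such a setup touches a small fraction of the parameters, which is consistent with the abstract's reported training-time savings.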