Optimizing Transformers with Approximate Computing for Faster, Smaller and more Accurate NLP Models

28 Sept 2020 (modified: 05 May 2023) · ICLR 2021 Conference Blind Submission
Keywords: BERT, Transformer, NLP, Efficient, Faster, Smaller, Accurate
Abstract: Transformer models have attracted considerable interest in recent years by delivering state-of-the-art performance in a range of Natural Language Processing (NLP) tasks. However, these models can have over a hundred billion parameters, presenting very high computational and memory requirements. We address this challenge through Approximate Computing, specifically targeting the use of Transformers in NLP tasks. Transformers are typically pre-trained and subsequently specialized for specific tasks through transfer learning. We observe that pre-trained Transformers are often over-parameterized for several downstream NLP tasks and propose a framework to create smaller and faster models with comparable accuracy. The cornerstones of the framework are a Significance Analysis (SA) method to identify important components in a pre-trained Transformer for a given task, and techniques to approximate the less significant components. Our framework can be adapted to produce models that are faster, smaller and/or more accurate, depending on the user's constraints. We apply our framework to multiple Transformer models and different downstream tasks, including previously proposed optimized models like DistilBERT and Q8BERT. We demonstrate that our framework produces models that are up to 4$\times$ faster and up to 14$\times$ smaller (with less than 0.5% relative accuracy degradation), or up to 5.5% more accurate with simultaneous model size and speed improvements of up to 9.8$\times$ and 2.9$\times$, respectively.
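
To make the Significance Analysis idea concrete, below is a minimal, hypothetical sketch of how components of a pre-trained Transformer (e.g., attention heads or feed-forward blocks) could be ranked by the downstream-accuracy drop observed when each is masked. The `evaluate` callable, the component names, and the toy scores are illustrative assumptions for this sketch, not the paper's actual method or API.

```python
# Hypothetical significance-analysis sketch: score each candidate component by
# the validation-accuracy drop caused by masking it; the lowest-scoring
# components become candidates for approximation or removal.
from typing import Callable, FrozenSet, List, Tuple


def significance_scores(
    components: List[str],
    evaluate: Callable[[FrozenSet[str]], float],
) -> List[Tuple[str, float]]:
    """Rank components by the accuracy lost when each one is masked."""
    baseline = evaluate(frozenset())  # accuracy with nothing masked
    scores = []
    for comp in components:
        masked_acc = evaluate(frozenset({comp}))
        scores.append((comp, baseline - masked_acc))
    # Least significant components (smallest drop) come first.
    return sorted(scores, key=lambda kv: kv[1])


if __name__ == "__main__":
    # Toy stand-in for "evaluate the Transformer on the downstream task
    # with the given components masked" (assumed, not from the paper).
    toy_cost = {"layer0.head2": 0.001, "layer11.ffn": 0.012, "layer5.head7": 0.030}

    def evaluate(masked: FrozenSet[str]) -> float:
        return 0.90 - sum(toy_cost[m] for m in masked)

    for comp, drop in significance_scores(list(toy_cost), evaluate):
        print(f"{comp}: accuracy drop {drop:.3f}")
```

In practice the evaluation step would involve running the fine-tuned model on a validation set with the chosen components masked; the sketch only shows the ranking logic.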
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
One-sentence Summary: We develop a framework to create smaller, faster and more accurate NLP models.
Reviewed Version (pdf): https://openreview.net/references/pdf?id=afReVeC3Cd