NASH: A Simple Unified Framework of Structured Pruning for Accelerating Encoder-Decoder Language Models

Jongwoo Ko; Seungjoon Park; Yujin Kim; Sumyeong Ahn; Du-Seong Chang; Euijai Ahn; Se-Young Yun

NASH: A Simple Unified Framework of Structured Pruning for Accelerating Encoder-Decoder Language Models

Jongwoo Ko, Seungjoon Park, Yujin Kim, Sumyeong Ahn, Du-Seong Chang, Euijai Ahn, Se-Young Yun

Published: 07 Oct 2023, Last Modified: 01 Dec 2023EMNLP 2023 FindingsEveryoneRevisionsBibTeX

Submission Type: Regular Long Paper

Submission Track: Efficient Methods for NLP

Submission Track 2: Machine Learning for NLP

Keywords: Model Compression, Structured Pruning, Encoder-Decoder Language Model

TL;DR: We conduct an empirical analysis to investigate the consideration for applying structured pruning to encoder-decoder language models and propose a new framework, NASH, considering the insight gained from the analysis.

Abstract: Structured pruning methods have proven effective in reducing the model size and accelerating inference speed in various network architectures such as Transformers. Despite the versatility of encoder-decoder models in numerous NLP tasks, the structured pruning methods on such models are relatively less explored compared to encoder-only models. In this study, we investigate the behavior of the structured pruning of the encoder-decoder models in the decoupled pruning perspective of the encoder and decoder component, respectively. Our findings highlight two insights: (1) the number of decoder layers is the dominant factor of inference speed, and (2) low sparsity in the pruned encoder network enhances generation quality. Motivated by these findings, we propose a simple and effective framework, NASH, that narrows the encoder and shortens the decoder networks of encoder-decoder models. Extensive experiments on diverse generation and inference tasks validate the effectiveness of our method in both speedup and output quality.

Submission Number: 2627

Loading