Abstract: Recent pre-trained language models (PLMs) such as BERT come with ever-increasing computational and memory overhead. In this paper, we focus on automatically pruning BERT into efficient architectures for natural language understanding tasks. Specifically, we propose differentiable architecture pruning (DAP), which prunes redundant attention heads and hidden dimensions in BERT and benefits from both network pruning and neural architecture search. Moreover, DAP adapts to different resource constraints, allowing the pruned BERT to be deployed on a variety of edge devices. Empirical results show that the \(\text{BERT}_\text{BASE}\) architecture pruned by DAP achieves a \(5\times\) speed-up with only a minor performance drop. The code is available at https://github.com/OscarYau525/DAP-BERT.
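To make the idea of differentiable pruning of attention heads concrete, the sketch below shows a common relaxation: each head receives a learnable gate, optimized jointly with the task loss and a sparsity penalty, so that heads whose gates collapse toward zero can be removed. This is a minimal illustrative sketch, not the authors' DAP implementation; names such as `HeadGate` and `sparsity_loss` are hypothetical.

```python
# Minimal sketch of differentiable attention-head pruning (assumed, not the paper's code).
import torch
import torch.nn as nn

class HeadGate(nn.Module):
    """Learnable per-head gates applied to multi-head attention outputs."""
    def __init__(self, num_heads: int):
        super().__init__()
        # One logit per head; a sigmoid relaxes the binary keep/prune decision.
        self.logits = nn.Parameter(torch.zeros(num_heads))

    def forward(self, head_outputs: torch.Tensor) -> torch.Tensor:
        # head_outputs: (batch, num_heads, seq_len, head_dim)
        gates = torch.sigmoid(self.logits).view(1, -1, 1, 1)
        return head_outputs * gates

    def sparsity_loss(self) -> torch.Tensor:
        # Pushes gates toward zero so redundant heads can be pruned away.
        return torch.sigmoid(self.logits).sum()

# Usage sketch: add the sparsity term to the task loss during fine-tuning,
# then remove heads whose gates fall below a threshold and fine-tune the pruned model.
gate = HeadGate(num_heads=12)
dummy = torch.randn(2, 12, 16, 64)   # (batch, heads, seq_len, head_dim)
task_loss = gate(dummy).mean()       # placeholder for the real task loss
total_loss = task_loss + 0.01 * gate.sparsity_loss()
total_loss.backward()
```

The same gating idea extends to hidden dimensions by attaching gates to feed-forward units; resource constraints can then be enforced by weighting the sparsity penalty or thresholding gates to meet a target budget.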