PartialFormer: Modeling Part Instead of Whole for Machine Translation

16 Jun 2023 (modified: 01 Dec 2023) · Submitted to EMNLP 2023
Submission Type: Regular Long Paper
Submission Track: Machine Translation
Keywords: Lightweight Transformer
Abstract: The parameter redundancy problem in Transformer models has been widely acknowledged in the literature. To address this weakness, we introduce PartialFormer, a parameter-efficient Transformer architecture for machine translation. Compared with previous parameter-efficient Transformer architectures, PartialFormer modifies the modeling strategy of the feed-forward network, allowing it to save a substantial number of parameters while maintaining a large hidden dimension. Additionally, PartialFormer applies two efficient scaling strategies, namely depth scaling and width scaling, to improve performance within a given parameter budget. To benefit efficiently from these scaling strategies, PartialFormer is further enhanced with two cost-effective modifications: 1) a head scaling strategy for efficient width scaling and 2) a residual-like attention calculation for better depth scaling. Extensive experiments on 9 translation tasks validate the effectiveness of PartialFormer.
Submission Number: 2584
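The abstract describes the architecture only at a high level, so the following is a minimal, hedged sketch of one way to read it: a per-head feed-forward sub-network (modeling the "part" rather than the whole model dimension) and a residual-like reuse of attention scores across layers. The class name, dimensions, and the exact form of the score accumulation are assumptions for illustration, not the paper's definitive formulation.

```python
import torch
import torch.nn as nn


class PartialFFNAttentionHead(nn.Module):
    """Hypothetical sketch of one attention head with its own small FFN.

    Modeling the FFN per head keeps its hidden width tied to d_head rather
    than d_model, which is one plausible way to save parameters while
    keeping a relatively large hidden dimension per part.
    """

    def __init__(self, d_model: int, d_head: int, d_ffn_head: int):
        super().__init__()
        self.q = nn.Linear(d_model, d_head)
        self.k = nn.Linear(d_model, d_head)
        self.v = nn.Linear(d_model, d_head)
        # Per-head FFN: d_head x d_ffn_head instead of d_model x d_ffn.
        self.ffn = nn.Sequential(
            nn.Linear(d_head, d_ffn_head),
            nn.ReLU(),
            nn.Linear(d_ffn_head, d_head),
        )

    def forward(self, x: torch.Tensor, prev_scores: torch.Tensor | None = None):
        q, k, v = self.q(x), self.k(x), self.v(x)
        scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)
        # Residual-like attention: add the previous layer's raw scores
        # before the softmax (an assumed interpretation of the abstract's
        # "residual-like attention calculation").
        if prev_scores is not None:
            scores = scores + prev_scores
        attn = scores.softmax(dim=-1)
        head_out = attn @ v
        # Apply the head-local FFN with a residual connection.
        return head_out + self.ffn(head_out), scores


# Usage example: stack two such heads depth-wise, passing scores forward.
if __name__ == "__main__":
    x = torch.randn(2, 10, 512)  # (batch, sequence, d_model)
    layer1 = PartialFFNAttentionHead(d_model=512, d_head=64, d_ffn_head=256)
    layer2 = PartialFFNAttentionHead(d_model=512, d_head=64, d_ffn_head=256)
    out1, scores1 = layer1(x)
    out2, _ = layer2(x, prev_scores=scores1)
    print(out2.shape)  # torch.Size([2, 10, 64])
```

Under these assumptions, width scaling would correspond to adding more such heads (the abstract's "head scaling"), while depth scaling would benefit from the cross-layer score reuse; the actual PartialFormer design should be taken from the paper itself.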