Abstract: Non-autoregressive machine translation models achieve significantly faster inference than autoregressive translation models, but at the cost of degraded translation quality. Although numerous advanced methods have been proposed to improve the translation quality of non-autoregressive models, achieving the desired trade-off between quality and efficiency remains difficult. In this paper, a Filter Glanced Transformer, named Filter-GLAT, is proposed to tackle this problem. It first refines the glancing sampling learning strategy and then adopts a Filter learning strategy during training, substantially enhancing translation quality. As for inference speed, Filter-GLAT generates predictions in a single decoding pass, maintaining high speed. Moreover, the Filter learning strategy narrows the gap between the training and inference procedures by modifying the training process. Extensive experiments on translation benchmarks (WMT'14 EN-DE and WMT'16 EN-RO) demonstrate that Filter-GLAT strikes nearly the best balance between translation quality and speed.