AEQA-NAT : Adaptive End-to-end Quantization Alignment Training Framework for Non-autoregressive Machine Translation

Published: 01 May 2025 · Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY 4.0
Abstract: Non-autoregressive Transformers (NATs) have garnered significant attention due to their efficient decoding compared to autoregressive methods. However, existing conditional dependency modeling schemes based on masked language modeling introduce a *training-inference gap* in NATs: during training, NATs sample target words to enrich the decoder input, but no target is available at inference, and simply annealing the sampling rate to zero during training degrades model performance. We demonstrate that this *training-inference gap* prevents NATs from realizing their full potential. To address it, we propose an adaptive end-to-end quantization alignment training framework that introduces a semantic consistency space to adaptively align NAT training, eliminating the need for target information and thereby bridging the *training-inference gap*. Experimental results show that our method outperforms most existing fully NAT models and delivers performance on par with the Autoregressive Transformer (AT) while being 17.0 times more efficient at inference.
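The *training-inference gap* described above comes from how the decoder input is built. A minimal sketch of that construction is given below, assuming a glancing/masked-style sampling scheme; the names `MASK_ID`, `build_decoder_input`, and `sample_ratio` are illustrative placeholders, not identifiers from the paper.

```python
# Minimal sketch of the training-time decoder-input construction that creates
# the training-inference gap. All names here are illustrative assumptions.
import torch

MASK_ID = 0  # hypothetical id of the [MASK] token


def build_decoder_input(target_ids: torch.Tensor, sample_ratio: float) -> torch.Tensor:
    """Reveal a random fraction of target tokens and mask the rest.

    At training time `target_ids` (the reference translation) is available, so
    some ground-truth words can be mixed into the decoder input. At inference
    time no reference exists, so only sample_ratio = 0.0 (all positions masked)
    is possible -- the mismatch referred to as the training-inference gap.
    """
    reveal = torch.rand_like(target_ids, dtype=torch.float) < sample_ratio
    return torch.where(reveal, target_ids, torch.full_like(target_ids, MASK_ID))


# Training: part of the reference is visible to the decoder.
y = torch.tensor([[5, 8, 3, 9, 2]])
train_input = build_decoder_input(y, sample_ratio=0.5)

# Inference: no reference is available, so every position must be masked.
infer_input = build_decoder_input(torch.zeros_like(y), sample_ratio=0.0)
```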
Lay Summary: Conditional masked language modeling is a common way to train non-autoregressive machine translation models. We set out to explore what happens when the target-sequence information introduced during training is also retained during inference. Surprisingly, we found that keeping the same input paradigm at inference as at training significantly improves the translation quality of non-autoregressive models. To make the inputs consistent across the training and inference phases, we adopt vector quantization alignment so that decoding does not rely on target-sequence information. Our findings matter for understanding and exploring the text generation capabilities of non-autoregressive models, and they open a new avenue for further investigation.
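For readers unfamiliar with vector quantization, the sketch below shows the generic mechanism (nearest-codebook lookup with a straight-through estimator) that quantization-alignment methods build on. It is a simplified illustration under assumed sizes (`num_codes`, `dim`), not the paper's actual framework or loss.

```python
# Generic vector quantization with a straight-through estimator; an
# illustrative sketch only, not the paper's exact alignment framework.
import torch
import torch.nn as nn


class VectorQuantizer(nn.Module):
    def __init__(self, num_codes: int = 512, dim: int = 256):
        super().__init__()
        # Learnable codebook of discrete "semantic" entries (sizes assumed).
        self.codebook = nn.Embedding(num_codes, dim)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, length, dim) decoder states.
        # Squared distance from each state to every codebook entry.
        dists = ((h.unsqueeze(-2) - self.codebook.weight) ** 2).sum(-1)  # (B, L, N)
        codes = dists.argmin(dim=-1)                                     # (B, L)
        q = self.codebook(codes)                                         # (B, L, dim)
        # Straight-through estimator: quantized values in the forward pass,
        # identity gradients to h in the backward pass.
        return h + (q - h).detach()


# Usage: quantize decoder states without any reference to target tokens,
# so the same operation is valid at both training and inference time.
vq = VectorQuantizer()
states = torch.randn(2, 5, 256)
quantized = vq(states)
```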
Primary Area: Deep Learning->Sequential Models, Time series
Keywords: Machine Translation, Parallel decoding, Vector Quantization
Submission Number: 15652