Pruning Residual Networks in Multilingual Neural Machine Translation to Improve Zero-Shot Translation

Published: 01 Jan 2024 · Last Modified: 20 May 2025 · NLPCC (3) 2024 · CC BY-SA 4.0
Abstract: A promising advantage of multilingual neural machine translation is that it can directly translate language pairs not included in the supervised training data, i.e., zero-shot translation. However, the model often captures spurious correlations from the language pairs seen in training, which leads it into off-target translation and poor translation quality. In this study, we present a novel explanation for the off-target phenomenon and investigate the influence of the encoder component on zero-shot translation. Building on this, we systematically analyze the model from a decoupling perspective and reveal how it erroneously captures spurious correlations. Our results show that the encoder contains components that are redundant for zero-shot translation. Pruning the encoder structure significantly improves performance in the zero-shot directions while maintaining translation quality in the supervised directions. Extensive experiments on three challenging multilingual datasets demonstrate that our proposed model achieves performance comparable or even superior to a strong multilingual baseline in zero-shot directions.
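The core mechanism described in the abstract is the removal (pruning) of structural components, such as residual connections, from encoder layers. The paper's actual pruning criterion is not given here, so the following is only a minimal hypothetical sketch of the mechanism itself: a residual (skip) connection around an encoder sub-layer that can be kept or pruned. The names `sublayer` and `encoder_layer` are illustrative, not from the paper.

```python
def sublayer(x):
    # Stand-in for an encoder sub-layer (e.g., self-attention or feed-forward);
    # here a simple elementwise scaling for illustration only.
    return [0.5 * v for v in x]

def encoder_layer(x, keep_residual=True):
    # With the residual kept, the layer input is added back to the sub-layer
    # output (y + x); pruning the residual passes only the transformed output.
    y = sublayer(x)
    return [a + b for a, b in zip(y, x)] if keep_residual else y

print(encoder_layer([1.0, 2.0], keep_residual=True))   # residual kept: [1.5, 3.0]
print(encoder_layer([1.0, 2.0], keep_residual=False))  # residual pruned: [0.5, 1.0]
```

The intuition suggested by the abstract is that such pruned connections can carry source-language-specific signals that the decoder latches onto as spurious correlations, harming zero-shot directions.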