Is Translation Helpful? An Exploration of Cross-Lingual Transfer in Low-Resource Dialog Generation

Published: 01 Jan 2024 · Last Modified: 10 Apr 2025 · IJCNN 2024 · CC BY-SA 4.0
Abstract: Cross-lingual transfer is important for developing high-quality chatbots in multiple languages, given the imbalanced distribution of language resources. A typical approach to cross-lingual transfer, which has proven effective on classification tasks, is to leverage machine translation (MT) systems to utilize either the training corpora or the models from high-resource languages. In this work, we investigate whether MT is helpful for cross-lingual transfer in dialog generation tasks. We collect a benchmark dataset for the low-resource scenario, assuming access to limited Chinese dialog data in the movie domain and large amounts of English dialog data from multiple domains. Experiments show that leveraging English dialog corpora can improve the naturalness, relevance, and cross-domain transferability of generation in Chinese. However, directly using the English corpora in their original form works better than translating them into Chinese: because the topics and wording habits in dialogs are strongly culture-dependent, translating them can reinforce the bias of high-resource languages. To avoid this issue and also reduce the embedding mismatch, we propose embedding freezing and post-alignment, which align words across languages as translation would, but without introducing translation biases. Experiments show that embedding freezing and post-alignment further improve generation performance. We present our analysis together with the collected benchmark dataset to draw attention to this area and support future research.
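To make the two proposed ideas concrete, the following is a minimal sketch of how embedding freezing and Procrustes-style post-alignment could look in practice. It is an illustrative assumption, not the paper's released code: the toy model, the vocabulary split into English and Chinese id ranges (`en_ids`), and the seed-dictionary id pairs (`seed_en`, `seed_zh`) are all hypothetical.

```python
# Hedged sketch of the abstract's two ideas: (1) freeze a shared multilingual
# embedding table while fine-tuning on English dialog data, and (2) post-align
# English embeddings to Chinese ones with an orthogonal (Procrustes) map learned
# from a small seed dictionary. Model, vocab split, and ids are assumptions.
import torch
import torch.nn as nn

vocab_size, d_model = 32_000, 512

# Toy dialog model: a shared multilingual embedding table plus a small encoder.
embedding = nn.Embedding(vocab_size, d_model)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True),
    num_layers=2,
)

# --- Embedding freezing ------------------------------------------------------
# Keep the embedding table fixed while fine-tuning on large English corpora, so
# the shared embedding space is not pulled toward the high-resource language.
embedding.weight.requires_grad = False
optimizer = torch.optim.Adam(
    [p for p in encoder.parameters() if p.requires_grad], lr=1e-4
)

# --- Post-alignment ----------------------------------------------------------
# After training, learn an orthogonal map W from a small bilingual seed
# dictionary (Procrustes solution) and apply it to the English embeddings,
# pulling translation pairs together without translating any dialog text.
en_ids = torch.arange(0, 16_000)            # hypothetical: English subword ids
seed_en = torch.tensor([101, 502, 933])     # hypothetical seed-dictionary pairs
seed_zh = torch.tensor([18_031, 16_077, 20_410])

X = embedding.weight[seed_en]               # (n_pairs, d_model), English side
Y = embedding.weight[seed_zh]               # (n_pairs, d_model), Chinese side
U, _, Vt = torch.linalg.svd(X.T @ Y)        # SVD of the cross-covariance
W = U @ Vt                                  # orthogonal alignment matrix

with torch.no_grad():
    embedding.weight[en_ids] = embedding.weight[en_ids] @ W
```

The orthogonal constraint on W keeps distances within the English embedding space unchanged, which is one way to align the two languages "as translation would" without importing translation biases into the training data itself.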