- Abstract: Transferring representations from large-scale supervised tasks to downstream tasks has shown outstanding results in machine learning, in both computer vision and natural language processing (NLP). One particular example is sequence-to-sequence models for neural machine translation (NMT): once trained in a multilingual setup, NMT systems can translate between multiple languages and can also perform zero-shot translation between source-target pairs unseen during training. In this paper, we first investigate whether the zero-shot transfer capability of multilingual NMT systems can be extended to cross-lingual NLP tasks other than MT, e.g. sentiment classification and natural language inference. We demonstrate that a simple framework, a multilingual Encoder-Classifier that reuses the encoder from a multilingual NMT system, achieves remarkable zero-shot cross-lingual classification performance almost out of the box on three downstream benchmark tasks: Amazon Reviews, the Stanford Sentiment Treebank (SST), and Stanford Natural Language Inference (SNLI). To understand the underlying factors contributing to this finding, we conduct a series of analyses on how zero-shot performance is affected by the shared vocabulary, the training data type for NMT models, classifier complexity, encoder representation power, and model generalization. Our results provide strong evidence that the representations learned by multilingual NMT systems are widely applicable across languages and tasks, and that the high out-of-the-box classification performance is correlated with the generalization capability of such systems.
- Keywords: Multilingual Neural Machine Translation, Zero-shot Cross-lingual Classification
- TL;DR: Zero-shot cross-lingual transfer by using multilingual neural machine translation