Keywords: emergent communication, fine-tuning, unsupervised machine translation, multilingual language models, mBART, multimodal NLP, multimodal pretraining, machine translation
TL;DR: We describe an approach that uses emergent communication to fine-tune large pretrained language models, with suggestive pilot results for unsupervised translation.
Abstract: It has recently been argued that the currently dominant paradigm in NLP of pretraining on text-only corpora will not yield robust natural language understanding systems. One strand of this argument highlights the need for grounded, goal-oriented, and interactive language learning. In this position paper, we articulate how Emergent Communication (EC) can be used in conjunction with large pretrained language models as a 'Fine-Tuning' (FT) step (hence, EC-FT) in order to provide them with supervision from such learning scenarios. We discuss methodological issues and difficulties with making this work, and then illustrate the overall idea with a case study in unsupervised machine translation, before concluding with a discussion of the relation to multimodal pretraining.