- Keywords: word alignment, bert, alignment, multilingual
- TL;DR: We use representations trained without any parallel data for creating word alignments.
- Abstract: Word alignments are useful for tasks like statistical and neural machine translation (NMT) and annotation projection. Statistical word aligners perform well, as do methods that extract alignments jointly with translations in NMT. However, most approaches require parallel training data and quality decreases as less training data is available. We propose word alignment methods that require little or no parallel data. The key idea is to leverage multilingual word embeddings – both static and contextualized – for word alignment. Our multilingual embeddings are created from monolingual data only without relying on any parallel data or dictionaries. We find that traditional statistical aligners are outperformed by contextualized embeddings – even in scenarios with abundant parallel data. For example, for a set of 100k parallel sentences, contextualized embeddings achieve a word alignment F1 that is more than 5% higher (absolute) than eflomal.