Constrained Density Matching and Modeling for Effective Contextualized Alignment


Sep 29, 2021 (edited Oct 06, 2021) · ICLR 2022 Conference Blind Submission
  • Keywords: Cross-lingual Alignment, Word Embeddings, NLP
  • Abstract: Multilingual representations pre-trained with monolingual data exhibit mismatched task performances across languages. While this has been tackled through the lens of contextualized alignment, existing techniques require large parallel data, thereby leaving under-represented language communities behind. In this work, we analyze the limitations that render previous alignments resource-intensive, \emph{viz.,} (i) the inability to sufficiently leverage data and (ii) improperly trained alignments. To address them, we present density-based approaches to perform alignment, and we complement them with validation criteria that account for downstream task performance. Our experiments encompass 16 alignment techniques (including ours), evaluated across 6 language pairs, synthetic data, and 4 NLP tasks. We demonstrate that our solutions are particularly effective in scenarios with limited or no parallel data. More importantly, we show, both theoretically and empirically, the advantages of our bootstrapping procedures, by which unsupervised approaches rival their supervised counterparts.
  • One-sentence Summary: We present a systematic study of cross-lingual alignments based on multilingual representations for low-resource languages, and demonstrate that our solutions are particularly effective at mitigating data scarcity.