Investigating the Use of BERT Anchors for Bilingual Lexicon Induction with Minimal Supervision

Anonymous

16 Nov 2021 (modified: 05 May 2023) · ACL ARR 2021 November Blind Submission
Abstract: This paper investigates the use of static anchors from transformer architectures for the task of Bilingual Lexicon Induction. We revisit an existing approach built around the ELMo architecture and explore the use of the methodology with the BERT family of language models. Experiments are performed and analysed for three language pairs, combining English with three target languages from very different language families: Hindi, Dutch, and Russian. Although the contextualised approach does not outperform the state-of-the-art VecMap method, we find that it is easily adaptable to newer transformer models and can compete with the MUSE approach. An error analysis reveals interesting trends across languages and shows how the method could be further improved by building on the basic hypothesis that transformer embeddings can indeed be decomposed into a static anchor and a dynamic context component. We make the code, the extracted anchors (before and after alignment), and the modified train and test sets available for use.
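The abstract's central hypothesis is that a contextual embedding can be decomposed into a static anchor plus a dynamic context component, so a word's anchor can be approximated by averaging its contextual vectors over many occurrences. The sketch below illustrates this idea; it is not the authors' released code, and the choice of model (`bert-base-multilingual-cased`), the use of the last hidden layer, and the sub-token matching and averaging strategy are all illustrative assumptions.

```python
# Minimal sketch (assumptions noted above): estimate a static "anchor" for a word
# by averaging its contextual BERT embeddings across example sentences.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModel.from_pretrained("bert-base-multilingual-cased")
model.eval()

def word_anchor(word, sentences):
    """Average the contextual embeddings of `word` over its occurrences in `sentences`."""
    word_tokens = tokenizer(word, add_special_tokens=False)["input_ids"]
    vectors = []
    for sentence in sentences:
        encoding = tokenizer(sentence, return_tensors="pt")
        with torch.no_grad():
            hidden = model(**encoding).last_hidden_state[0]  # (seq_len, dim)
        ids = encoding["input_ids"][0].tolist()
        # Find the word's sub-token span in this sentence (first match only).
        for i in range(len(ids) - len(word_tokens) + 1):
            if ids[i:i + len(word_tokens)] == word_tokens:
                # Average over the word's sub-token vectors in this context.
                vectors.append(hidden[i:i + len(word_tokens)].mean(dim=0))
                break
    # The static anchor is the mean over all contextual occurrences found.
    return torch.stack(vectors).mean(dim=0) if vectors else None

anchor = word_anchor("house", ["The house is red.", "She bought a house."])
```

Anchors extracted this way for source and target vocabularies could then be aligned with any standard mapping method (for example, a Procrustes-style mapping as used in MUSE or VecMap) before nearest-neighbour lexicon induction.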