Finding next of kin: Cross-lingual embedding spaces for related languages
Abstract: Some languages have very few NLP resources, while many of them are closely related to better-resourced languages. This paper explores how the similarity between such languages can be exploited by porting resources from better- to lesser-resourced languages. The paper introduces a way of building a representation shared across related languages by combining cross-lingual embedding methods with a lexical similarity measure based on the Weighted Levenshtein Distance. One of the outcomes of the experiments is a Panslavonic embedding space for nine Balto-Slavonic languages. The paper demonstrates that the resulting embedding space helps in applications such as morphological prediction, Named Entity Recognition, and genre classification.
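The lexical similarity measure mentioned above rests on a weighted variant of the Levenshtein edit distance, in which substitutions between related characters can be made cheaper than arbitrary ones. The sketch below illustrates the general idea only; the function name, the default costs, and the toy character-pair weights are illustrative assumptions, not the weights used in the paper.

```python
def weighted_levenshtein(a, b, sub_cost=None, ins_cost=1.0, del_cost=1.0):
    """Edit distance where substitutions carry character-pair weights.

    Hypothetical sketch: the cost function and defaults are assumptions,
    not the paper's actual parameterisation.
    """
    if sub_cost is None:
        # Assumed default: unit cost for any mismatching pair.
        sub_cost = lambda x, y: 0.0 if x == y else 1.0
    m, n = len(a), len(b)
    # dp[i][j] = distance between the prefixes a[:i] and b[:j]
    dp = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dp[i][0] = dp[i - 1][0] + del_cost
    for j in range(1, n + 1):
        dp[0][j] = dp[0][j - 1] + ins_cost
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            dp[i][j] = min(
                dp[i - 1][j] + del_cost,                          # deletion
                dp[i][j - 1] + ins_cost,                          # insertion
                dp[i - 1][j - 1] + sub_cost(a[i - 1], b[j - 1]),  # substitution
            )
    return dp[m][n]

# Toy weight table: an o/a alternation is treated as a cheap substitution,
# the kind of discount one might give to regular sound correspondences
# between related languages.
pairs = {("o", "a"): 0.3, ("a", "o"): 0.3}
sub = lambda x, y: 0.0 if x == y else pairs.get((x, y), 1.0)
print(weighted_levenshtein("gora", "gara", sub_cost=sub))  # → 0.3
```

Lowering substitution costs for regular sound correspondences lets cognates across related languages score as more similar than the plain Levenshtein distance would suggest.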