Finding next of kin: Cross-lingual embedding spaces for related languages
Abstract: Some languages have very few NLP resources, while many of them are closely related to better-resourced languages. This paper explores how the similarity between such languages can be exploited by porting resources from better- to lesser-resourced languages. The paper introduces a way of building a representation shared across related languages by combining cross-lingual embedding methods with a lexical similarity measure based on the Weighted Levenshtein Distance. One of the outcomes of the experiments is a Panslavonic embedding space for nine Balto-Slavonic languages. The paper demonstrates that the resulting embedding space helps in applications such as morphological prediction, Named Entity Recognition, and genre classification.
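The lexical similarity measure mentioned above rests on a weighted variant of the Levenshtein edit distance, in which substitutions between related characters can be made cheaper than arbitrary ones. The sketch below illustrates the general idea only; the function name, the default costs, and the toy character-pair weights are illustrative assumptions, not the weights used in the paper.

```python
def weighted_levenshtein(a, b, sub_cost=None, ins_cost=1.0, del_cost=1.0):
    """Edit distance where substitutions carry character-pair weights.

    Hypothetical sketch: the cost function and defaults are assumptions,
    not the paper's actual parameterisation.
    """
    if sub_cost is None:
        # Assumed default: unit cost for any mismatching pair.
        sub_cost = lambda x, y: 0.0 if x == y else 1.0
    m, n = len(a), len(b)
    # dp[i][j] = distance between the prefixes a[:i] and b[:j]
    dp = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dp[i][0] = dp[i - 1][0] + del_cost
    for j in range(1, n + 1):
        dp[0][j] = dp[0][j - 1] + ins_cost
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            dp[i][j] = min(
                dp[i - 1][j] + del_cost,                          # deletion
                dp[i][j - 1] + ins_cost,                          # insertion
                dp[i - 1][j - 1] + sub_cost(a[i - 1], b[j - 1]),  # substitution
            )
    return dp[m][n]

# Toy weight table: an o/a alternation is treated as a cheap substitution,
# the kind of discount one might give to regular sound correspondences
# between related languages.
pairs = {("o", "a"): 0.3, ("a", "o"): 0.3}
sub = lambda x, y: 0.0 if x == y else pairs.get((x, y), 1.0)
print(weighted_levenshtein("gora", "gara", sub_cost=sub))  # → 0.3
```

Lowering substitution costs for regular sound correspondences lets cognates across related languages score as more similar than the plain Levenshtein distance would suggest.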