Keywords: Word Embeddings, Geometry, Multilingual, Language Model
Abstract: Understanding how multilingual language models represent different languages is essential for comprehending their cross-lingual properties, predicting their performance on downstream tasks, and identifying biases across languages. In our study, we analyze the geometry of three multilingual language models in Euclidean space and find that each language is represented by a unique geometry. Although languages tend to lie closer to others in the same linguistic family, they are almost separable from languages of other families. We also introduce a Cross-Lingual Similarity Index to study semantic similarity across languages. Our findings indicate that low-resource languages are represented less well than high-resource languages.