The Geometry of Multilingual Language Models: An Equality Lens

01 Mar 2023 (modified: 30 May 2023) · Submitted to Tiny Papers @ ICLR 2023 · Readers: Everyone
Keywords: Word Embeddings, Geometry, Multilingual, Language Model
Abstract: Understanding how different languages are represented in multilingual language models is essential for comprehending their cross-lingual properties, predicting their performance on downstream tasks, and identifying biases across languages. In our study, we analyze the geometry of three multilingual language models in Euclidean space and find that each language is represented by a distinct geometry. Although languages tend to lie closer to others in the same linguistic family, they are nearly separable from languages of other families. We also introduce a Cross-Lingual Similarity Index to study semantic similarity across languages. Our findings indicate that low-resource languages are less well represented than high-resource languages.
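The abstract does not define the Cross-Lingual Similarity Index. Purely as an illustration of the idea of measuring semantic similarity across languages, the sketch below assumes one already has embeddings for parallel (translation-aligned) sentences in two languages and scores them by the mean cosine similarity over aligned pairs. The function names `cosine` and `cross_lingual_similarity` are hypothetical, and this is not presented as the paper's actual index.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("a zero vector has no direction")
    return dot / (norm_u * norm_v)


def cross_lingual_similarity(src: Sequence[Vector], tgt: Sequence[Vector]) -> float:
    """Hypothetical similarity score: mean cosine similarity over parallel pairs.

    src[i] and tgt[i] are assumed to embed the same sentence in two languages.
    A value near 1 means aligned sentences point in similar directions;
    a value near 0 means they are roughly orthogonal.
    """
    if len(src) != len(tgt) or not src:
        raise ValueError("need two non-empty, equal-length parallel sets")
    return sum(cosine(u, v) for u, v in zip(src, tgt)) / len(src)


if __name__ == "__main__":
    # Toy 2-D "embeddings", not real model outputs.
    english = [[1.0, 0.0], [0.0, 1.0]]
    aligned = [[1.0, 0.0], [0.0, 1.0]]     # identical directions
    unaligned = [[0.0, 1.0], [1.0, 0.0]]   # swapped, orthogonal pairs
    print(cross_lingual_similarity(english, aligned))
    print(cross_lingual_similarity(english, unaligned))
```

In practice the embeddings would come from a multilingual model, and one might mean-center each language's embeddings first so that a language-specific offset (one aspect of the per-language geometry the abstract describes) does not dominate the score; that preprocessing choice is likewise an assumption here.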