The geometry of sentence embedding spaces is not indicative of their performance: A study of three variations of sentence representation
Abstract: Transformer models learn to encode and decode an input text, and produce contextual token embeddings as a side effect. The mapping from language into the embedding space places words expressing similar concepts at nearby points. In practice, the reverse implication is also assumed: words corresponding to nearby points in this space are similar or related.
Does this closeness in the embedding space extend to shared properties for sentence embeddings? We compute sentence embeddings in three ways: as the average of the token embeddings, as the embedding of the special [CLS] token, and as the embedding of a random token from the sentence. We explore whether sentence embedding variants that are close in this space also perform similarly on morphology, syntax, semantics, discourse, and reasoning tasks, or whether their relative positions offer no useful clues about their relative performance and the type of linguistic information they encode.
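The three embedding variants described above can be made concrete with a minimal sketch, assuming the HuggingFace transformers library; the model name bert-base-uncased and the handling of special tokens are illustrative assumptions, not the paper's exact setup.

```python
import random

import torch
from transformers import AutoModel, AutoTokenizer

# Illustrative model choice; the paper studies BERT, RoBERTa, DeBERTa, Electra.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

sentence = "The cat sat on the mat."
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    hidden = model(**inputs).last_hidden_state[0]  # (seq_len, hidden_dim)

# Variant 1: averaged token embeddings (averaging over all real tokens;
# whether to include special tokens is a design choice assumed here).
mask = inputs["attention_mask"][0].bool()
avg_embedding = hidden[mask].mean(dim=0)

# Variant 2: the embedding of the special [CLS] token (position 0).
cls_embedding = hidden[0]

# Variant 3: the embedding of a random non-special token from the sentence
# (skipping [CLS] at position 0 and [SEP] at the end).
positions = list(range(1, int(mask.sum().item()) - 1))
rand_embedding = hidden[random.choice(positions)]
```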
The results show that each of the four transformer models tested -- BERT, RoBERTa, DeBERTa, Electra -- has its own embedding profile, but shallow differences or commonalities between the three types of embeddings are not predictive of their performance on specific tasks. In an extreme case, Electra's [CLS] sentence embeddings and its averaged token embeddings are superficially almost orthogonal, yet both encode information about sentence chunk structure in the same way. Conversely, RoBERTa's three sentence embedding variants are geometrically very similar yet perform very differently on linguistic tasks. The embedding of a random token in a sentence works surprisingly well as a proxy for the sentence embedding.
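Continuing the sketch above, one hedged way to quantify "almost orthogonal" is the cosine similarity between two variants of the same sentence's embedding, where values near 0 indicate near-orthogonality; this is an illustrative metric, not necessarily the paper's exact geometric comparison.

```python
import torch.nn.functional as F

# Cosine similarity between the [CLS] and averaged-token variants computed
# in the previous sketch; a value near 0.0 means the two vectors are
# almost orthogonal despite representing the same sentence.
sim = F.cosine_similarity(cls_embedding, avg_embedding, dim=0)
print(f"cosine([CLS], averaged tokens): {sim.item():.3f}")
```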
Paper Type: Long
Research Area: Interpretability and Analysis of Models for NLP
Research Area Keywords: probing; robustness; feature attribution
Contribution Types: Model analysis & interpretability
Languages Studied: English, French, Italian, Romanian
Submission Number: 2171