Keywords: embedding model, text embedding, semantics, language model, entailment, contextual fingerprint, contextual similarity, cosine similarity
TL;DR: We derive analytic expressions for semantic entailment similarity and demonstrate that text embeddings represent mostly contextual as opposed to entailment similarity.
Abstract: Retrieval-augmented generation (RAG) has become a de facto standard for reducing factual inaccuracies in LLM-generated responses, and it is generally accepted that cosine similarity between two text embeddings is a state-of-the-art measure of semantic similarity. In practice, however, there is a disconnect between the expectation of retrieving semantically highly relevant text and the kinds of information text embeddings actually represent. The aim of this study is to unpack the generic term "semantic similarity" into empirically distinguishable components and to investigate how they factor into the cosine similarity of text embeddings. We derive analytic expressions for semantic entailment similarity on the concept, predicate, and proposition levels based on a previously proposed logical framework of conceptual semantics. This enables us to create a benchmark dataset of concepts and propositions with quantitatively characterized semantic entailment relationships. We train linear projections from the text embeddings of 15 state-of-the-art embedding models to semantic entailment space and assess the deviation of semantic entailment cosine similarity estimates from the ground truth. Next, we identify proposition entailment similarity categories that are relatively more difficult for low-performing models to handle than for high-performing ones. As a complementary approach, regression modeling is used to demonstrate the predictive value of symbolic similarity, contextual similarity, and entailment similarity on the cosine similarity of text embeddings. Both approaches converge on a small set of models that are significantly better at semantic entailment estimation than the rest.
We conclude that the majority of variation in cosine similarity of text embeddings is due to contextual similarity as opposed to entailment, and propose using the term "contextual similarity" instead of the ambiguous "semantic similarity" when referring to cosine similarity estimates from text embedding models. We also propose the term "contextual fingerprint" to capture the intuition behind text embeddings instead of the potentially misleading "semantic embedding".
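To make the central quantity concrete, the sketch below computes cosine similarity between two embedding vectors. The vectors are toy 4-dimensional examples, not outputs of any of the 15 models evaluated in the paper; real embedding models produce vectors with hundreds or thousands of dimensions, but the formula is the same.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy "embeddings" for illustration only.
u = np.array([1.0, 0.5, 0.0, 0.2])
v = np.array([0.9, 0.6, 0.1, 0.3])
print(round(cosine_similarity(u, v), 3))  # → 0.984
```

Note that this score alone cannot distinguish which component of "semantic similarity" (contextual, symbolic, or entailment) drives the closeness of the two vectors, which is the ambiguity the study addresses.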
Primary Area: interpretability and explainable AI
Supplementary Material: zip
Submission Number: 1725