Semantic Fragment Similarity Representation Learning for Information Retrieval

ICLR 2026 Conference Submission 25304 Authors

20 Sept 2025 (modified: 08 Oct 2025) · CC BY 4.0
Keywords: Information Retrieval, Representation Learning, Sentence Embeddings, Fragment Similarity
TL;DR: We propose Semantic Fragment Similarity, a representation learning method that partitions embeddings and applies fragment-level contrastive learning, yielding semantically specialized representations and improving relevance and retrieval performance.
Abstract: We introduce Semantic Fragment Similarity (SFS), a novel similarity metric designed to enhance representation quality by partitioning embeddings into non-overlapping fragments, computing fragment-level similarity, and aggregating these local scores. Conventional similarity metrics compute relevance using the global vector as a single unit. This process flattens and entangles multi-faceted semantic features and dilutes the fine-grained alignment signals crucial for accuracy. By inducing fragments to specialize in distinct semantic roles, SFS yields substantial gains in retrieval performance across a wide range of models, tasks, and architectures when applied in both training and inference. Further, we find that a single embedding fragment trained with SFS, comprising just 12\% of the total dimensions, outperforms the entire global embedding on specific classification tasks. Ultimately, SFS can be directly integrated as a replacement for conventional similarity metrics, without architectural modifications or significant computational overhead, and it opens up new avenues for building more structured and interpretable embedding models.
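The sketch below illustrates the idea described in the abstract: split each embedding into non-overlapping fragments, score each fragment pair locally, and aggregate the local scores. It is a minimal illustration only; the fragment count, the use of cosine similarity as the local score, mean aggregation, and the function name sfs_similarity are assumptions not specified in the submission page.

```python
import torch
import torch.nn.functional as F

def sfs_similarity(query_emb: torch.Tensor,
                   doc_emb: torch.Tensor,
                   num_fragments: int = 8) -> torch.Tensor:
    """Fragment-level similarity sketch (hypothetical parameters).

    query_emb, doc_emb: (batch, dim) tensors with dim divisible by num_fragments.
    Returns a (batch,) tensor of aggregated fragment similarities.
    """
    batch, dim = query_emb.shape
    assert dim % num_fragments == 0, "dim must be divisible by num_fragments"

    # Partition each embedding into non-overlapping fragments:
    # (batch, num_fragments, fragment_dim).
    q_frags = query_emb.view(batch, num_fragments, -1)
    d_frags = doc_emb.view(batch, num_fragments, -1)

    # Local score per fragment pair (cosine similarity assumed here).
    frag_scores = F.cosine_similarity(q_frags, d_frags, dim=-1)  # (batch, num_fragments)

    # Aggregate local scores; mean is an illustrative choice.
    return frag_scores.mean(dim=-1)
```

Under these assumptions, such a score could stand in for the dot product or cosine similarity inside a standard contrastive objective during training, and be reused unchanged at retrieval time.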
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 25304