A Single Swallow Does Not Make a Summer: Understanding Semantic Structures in Embedding Spaces

25 Sept 2024 (modified: 05 Feb 2025) · Submitted to ICLR 2025 · CC BY 4.0
Keywords: Embedding Space Understanding, Semantic Field, Semantic Field Subspace, Hierarchical Clustering
TL;DR: We introduce a mapping between embedding spaces and their underlying semantics, along with a novel method for understanding embedding spaces.
Abstract: Embedding spaces encapsulate rich information from deep learning models, with vector distances reflecting the semantic similarity between textual elements. However, their abstract nature and the computational complexity of analyzing them remain significant challenges. To address these challenges, we introduce the Semantic Field Subspace, a novel mapping that links embedding spaces with their underlying semantics. We propose SAFARI (SemAntic Field subspAce deteRmInation), a novel algorithm that uses hierarchical clustering to discover hierarchical semantic structures. SAFARI uses Semantic Shifts to capture semantic changes as clusters merge, which allows it to identify meaningful subspaces. To improve scalability, we extend Weyl's Theorem, enabling an efficient approximation of Semantic Shifts that significantly reduces computational costs. Extensive evaluations on five real-world datasets demonstrate the effectiveness of SAFARI in uncovering interpretable and hierarchical semantic structures. Additionally, our approximation method achieves a $15\sim30\times$ speedup with errors below 0.01, making it practical for large-scale applications. The source code is available at https://anonymous.4open.science/r/Safari-C803/.
Primary Area: interpretability and explainable AI
Submission Number: 4364
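
For context on the scalability claim in the abstract: the classical Weyl inequality bounds how far the eigenvalues of a symmetric matrix can move under a symmetric perturbation. The statement below is the textbook form only. The extension the authors use for Semantic Shifts is not given on this page, and the symbols $A$, $E$, and $\lambda_i$ are notation assumed here for illustration.

```latex
% Classical Weyl inequality (textbook form, not the paper's extension).
% Assumed notation: A, E are real symmetric n x n matrices, with eigenvalues
% sorted in descending order, \lambda_1(\cdot) \ge \dots \ge \lambda_n(\cdot).
\[
  \lambda_i(A) + \lambda_n(E) \;\le\; \lambda_i(A + E) \;\le\; \lambda_i(A) + \lambda_1(E),
  \qquad i = 1, \dots, n,
\]
% Consequently, each eigenvalue moves by at most the spectral norm of E:
\[
  \lvert \lambda_i(A + E) - \lambda_i(A) \rvert \;\le\; \lVert E \rVert_2 .
\]
```

A bound of this type lets one estimate how eigenvalues change after a small update without recomputing a full eigendecomposition. Whether SAFARI applies it in exactly this way is not stated here.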