Keywords: Embedding Space Understanding, Semantic Field, Semantic Field Subspace, Hierarchical Clustering
TL;DR: We introduce a mapping between embedding spaces and their underlying semantics, together with a novel method for understanding embedding spaces.
Abstract: Embedding spaces encapsulate rich information from deep learning models, with vector distances reflecting the semantic similarity between textual elements. However, their abstract nature and the computational complexity of analyzing them remain significant challenges. To address these challenges, we introduce the concept of a Semantic Field Subspace, a novel mapping that links embedding spaces with their underlying semantics. We propose \textsf{SAFARI}, a novel algorithm for \textsf{S}em\textsf{A}ntic \textsf{F}ield subsp\textsf{A}ce dete\textsf{R}m\textsf{I}nation. \textsf{SAFARI} leverages hierarchical clustering to discover hierarchical semantic structures and uses Semantic Shifts to capture semantic changes as clusters merge, which allows it to identify meaningful subspaces. To improve scalability, we extend Weyl's Theorem to obtain an efficient approximation of Semantic Shifts that significantly reduces computational costs. Extensive evaluations on five real-world datasets demonstrate the effectiveness of \textsf{SAFARI} in uncovering interpretable, hierarchical semantic structures. Additionally, our approximation method achieves a 15$\sim$30$\times$ speedup while keeping errors below 0.01, making it practical for large-scale applications. The source code is available at \url{https://anonymous.4open.science/r/Safari-C803/}.
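As background for the Weyl's Theorem step, the following is a minimal sketch of the classical inequality only, not of \textsf{SAFARI} or of the paper's extension, neither of which is specified in the abstract. It assumes, purely for illustration, that each cluster is summarized by its uncentered second-moment matrix, so that merging two clusters adds their matrices. The helper names `second_moment` and `weyl_bounds` and the random stand-in embeddings are hypothetical. Under these assumptions, the eigenvalues of the merged matrix can be bracketed from the eigenvalues of one cluster and the top eigenvalue of the other, without a full decomposition of the merged matrix.

```python
import numpy as np


def second_moment(x):
    """Uncentered second-moment matrix X^T X of a cluster (illustrative summary)."""
    return x.T @ x


def weyl_bounds(s_a, s_b):
    """Classical Weyl bounds on the eigenvalues of s_a + s_b.

    For symmetric s_a and symmetric positive semidefinite s_b, with
    eigenvalues sorted in ascending order:
        lambda_i(s_a) <= lambda_i(s_a + s_b) <= lambda_i(s_a) + lambda_max(s_b)
    """
    eig_a = np.linalg.eigvalsh(s_a)      # ascending eigenvalues of s_a
    top_b = np.linalg.eigvalsh(s_b)[-1]  # largest eigenvalue of s_b
    return eig_a, eig_a + top_b


# Random stand-ins for two clusters of 8-dimensional embeddings (hypothetical data).
rng = np.random.default_rng(0)
cluster_a = rng.normal(size=(50, 8))
cluster_b = rng.normal(size=(30, 8))

s_a, s_b = second_moment(cluster_a), second_moment(cluster_b)
lower, upper = weyl_bounds(s_a, s_b)

# Exact spectrum of the merged cluster, computed only to check the bounds.
exact = np.linalg.eigvalsh(s_a + s_b)
bounds_hold = bool(np.all(lower - 1e-9 <= exact) and np.all(exact <= upper + 1e-9))
print(bounds_hold)
```

The bounds are loose in general; the point of the sketch is only that they are cheap to evaluate compared with recomputing the spectrum after every merge.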
Primary Area: interpretability and explainable AI
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 4364