IsoScore: Measuring the Uniformity of Vector Space Utilization

29 Sept 2021 (modified: 22 Oct 2023), ICLR 2022 Conference Withdrawn Submission, Readers: Everyone
Keywords: Contextualized Word Embeddings, Isotropy, Natural Language Processing
Abstract: The recent success of distributed word representations has led to an increased interest in analyzing the properties of their spatial distribution. Current metrics suggest that contextualized word embedding models do not uniformly utilize all available dimensions when embedding tokens in vector space. Previous works argue that encouraging isotropy in embedding space corresponds to improved performance on downstream tasks. However, existing metrics, such as average random cosine similarity, do not properly measure isotropy and tend to obscure the true spatial distribution of point clouds. To address this issue, we propose IsoScore: a novel metric that quantifies the degree to which a point cloud uniformly utilizes the ambient vector space. We demonstrate that IsoScore has several desirable properties, such as mean invariance and direct correspondence to the number of dimensions used, that existing scores do not possess. Furthermore, IsoScore is conceptually intuitive, making it well suited for analyzing the distribution of arbitrary point clouds in vector space, not only point clouds of word embeddings. We conclude by using IsoScore to demonstrate that a number of recent conclusions in the NLP literature, derived using brittle metrics of spatial distribution, may be incomplete or altogether inaccurate.
One-sentence Summary: We propose IsoScore: a novel metric that quantifies the degree to which a point cloud uniformly utilizes the ambient vector space.
Community Implementations: [1 code implementation](https://www.catalyzex.com/paper/arxiv:2108.07344/code)
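
The abstract's concern about average random cosine similarity can be illustrated with a small numerical sketch. The code below is not the authors' implementation and does not compute IsoScore; it only estimates the average cosine similarity between randomly sampled pairs of points and shows how strongly that value depends on the mean of the point cloud. The function name `average_random_cosine_similarity`, the sample sizes, and the constant offset are all assumptions chosen for illustration.

```python
import numpy as np


def average_random_cosine_similarity(points, num_pairs=10_000, seed=0):
    """Estimate the mean cosine similarity over randomly sampled point pairs.

    Hypothetical helper for illustration; not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    n = points.shape[0]
    # Sample index pairs uniformly and drop pairs that compare a point to itself.
    i = rng.integers(0, n, size=num_pairs)
    j = rng.integers(0, n, size=num_pairs)
    keep = i != j
    a, b = points[i[keep]], points[j[keep]]
    cos = np.sum(a * b, axis=1) / (
        np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    )
    return float(cos.mean())


rng = np.random.default_rng(42)

# An isotropic Gaussian cloud centered at the origin: every dimension is
# used equally, so this is a uniformly utilized space by construction.
cloud = rng.standard_normal((2000, 50))

# The same cloud translated by a constant vector. Its shape is unchanged;
# only the mean differs.
shifted = cloud + 10.0

centered_score = average_random_cosine_similarity(cloud)
shifted_score = average_random_cosine_similarity(shifted)

print(f"centered cloud: {centered_score:.3f}")
print(f"shifted cloud:  {shifted_score:.3f}")
```

Under these assumptions the centered cloud scores close to 0 and the shifted cloud scores close to 1, even though the two clouds differ only by a translation. A mean-invariant metric, which is the property the abstract claims for IsoScore, would assign both clouds the same value.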