Exploring the Small World of Word Embeddings: A Comparative Study on Conceptual Spaces from LLMs of Different Scales
Abstract: A conceptual space represents concepts as nodes and semantic relatedness as edges. Word embeddings, paired with a similarity metric, offer an efficient way to construct such a space. Typically, these embeddings come from traditional distributional models or encoder-only pretrained models, as their objectives directly capture the current token’s meaning. In contrast, decoder-only models, including large language models (LLMs), predict the next token, making their embeddings less directly tied to the current token’s semantics. This paper constructs a conceptual space using word embeddings from LLMs and explores its properties. We build a network based on a linguistic typology-inspired connectivity hypothesis, analyze global statistics, and compare LLMs of different scales. Locally, we examine conceptual pairs, WordNet relations, and a cross-lingual semantic network for qualitative words. Our results show that the space exhibits small-world properties, with a high clustering coefficient and short path lengths. Larger LLMs produce more complex spaces, characterized by longer paths and richer relational structures. Additionally, the network serves as an efficient basis for constructing cross-lingual semantic maps.
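To make the pipeline described in the abstract concrete, the following is a minimal sketch (not the authors' code) of how a conceptual network can be built from word embeddings with a similarity metric and then checked for small-world statistics. The similarity threshold `tau` and the dictionary `embeddings` are illustrative assumptions standing in for the paper's connectivity hypothesis and its LLM-derived embeddings.

```python
# Minimal sketch: build a concept graph from embeddings and measure
# small-world statistics (clustering coefficient, average path length).
# `embeddings` and `tau` are hypothetical placeholders, not the paper's setup.
import numpy as np
import networkx as nx


def build_conceptual_graph(embeddings: dict, tau: float = 0.6) -> nx.Graph:
    """Connect two concepts when their cosine similarity is at least tau."""
    words = list(embeddings)
    vecs = np.stack([embeddings[w] for w in words])
    vecs = vecs / np.linalg.norm(vecs, axis=1, keepdims=True)  # unit-normalize rows
    sims = vecs @ vecs.T                                       # pairwise cosine similarities
    graph = nx.Graph()
    graph.add_nodes_from(words)
    for i in range(len(words)):
        for j in range(i + 1, len(words)):
            if sims[i, j] >= tau:
                graph.add_edge(words[i], words[j], weight=float(sims[i, j]))
    return graph


def small_world_stats(graph: nx.Graph) -> tuple:
    """Return (average clustering coefficient, average shortest path length),
    computed on the largest connected component to keep paths well-defined."""
    giant_nodes = max(nx.connected_components(graph), key=len)
    giant = graph.subgraph(giant_nodes)
    return nx.average_clustering(giant), nx.average_shortest_path_length(giant)
```

A high clustering coefficient together with a short average path length, relative to a comparable random graph, is the usual operational signature of the small-world property reported in the abstract.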
Paper Type: Long
Research Area: Linguistic theories, Cognitive Modeling and Psycholinguistics
Research Area Keywords: conceptual spaces, large language models, word embeddings, small world
Contribution Types: Model analysis & interpretability
Languages Studied: mainly English; multiple other languages are also included
Submission Number: 5264