Abstract: This paper shifts focus to the often-overlooked input embeddings, the initial representations fed into transformer blocks. Using fuzzy-graph construction, k-nearest neighbor (k-NN) analysis, and community detection, we analyze embeddings from diverse LLMs and find significant categorical community structure aligned with human-defined concepts and categories. We observe that these groupings exhibit within-cluster organization, such as hierarchies and topological ordering, and hypothesize a fundamental structure that precedes contextual processing. To further probe the conceptual nature of these groupings, we explore cross-model alignment across different categories of LLMs within their input embeddings, observing a medium to high degree of alignment. Furthermore, we provide evidence that manipulating these groupings can play a functional role in mitigating ethnicity bias in LLM tasks.
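A minimal sketch of the kind of pipeline the abstract describes, assuming cosine similarity, a k-NN graph, and modularity-based community detection; the embedding matrix, the value of k, and the algorithm choices are illustrative stand-ins, not the paper's exact configuration.

```python
import numpy as np
import networkx as nx
from sklearn.neighbors import NearestNeighbors
from networkx.algorithms.community import greedy_modularity_communities


def embedding_communities(embeddings: np.ndarray, k: int = 10):
    """Build a cosine k-NN graph over input-embedding vectors and return communities."""
    # k + 1 neighbors because each point's nearest neighbor is itself.
    nn = NearestNeighbors(n_neighbors=k + 1, metric="cosine").fit(embeddings)
    distances, indices = nn.kneighbors(embeddings)

    graph = nx.Graph()
    graph.add_nodes_from(range(len(embeddings)))
    for i, (dists, nbrs) in enumerate(zip(distances, indices)):
        for d, j in zip(dists[1:], nbrs[1:]):          # skip the self-neighbor
            graph.add_edge(i, int(j), weight=1.0 - d)  # use similarity as edge weight

    # Modularity-based communities; a fuzzy-graph weighting or soft-membership
    # method could be substituted here.
    return list(greedy_modularity_communities(graph, weight="weight"))


if __name__ == "__main__":
    # Random vectors standing in for an LLM's input-embedding matrix.
    rng = np.random.default_rng(0)
    toy_embeddings = rng.normal(size=(200, 64))
    communities = embedding_communities(toy_embeddings, k=10)
    print(f"found {len(communities)} communities")
```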
Paper Type: Long
Research Area: Interpretability and Analysis of Models for NLP
Research Area Keywords: Large Language Models (LLMs), Embedding Representations, Concept Formation, Human-LM Alignment
Contribution Types: Model analysis & interpretability, NLP engineering experiment
Languages Studied: English
Submission Number: 5463