Intricacies of Feature Geometry in Large Language Models

Satvik Golechha; Lucius Bushnaq; Euan Ong; Neeraj Kayal; Nandi Schoots

Published: 23 Jan 2025. Last Modified: 23 Mar 2025. Venue: ICLR 2025 Blogpost Track. License: CC BY 4.0

Blogpost Url: https://d2jud02ci9yv69.cloudfront.net/2025-04-28-feature-geometry-65/blog/feature-geometry/

Abstract: Studying the geometry of a language model's embedding space is an important and challenging task because concepts can be represented, extracted, and used in many different ways. Ideally, we want a framework that unifies measurement (how well a latent explains a feature or concept) and causal intervention (how well a latent can be used to control or steer the model). We discuss several challenges in using recent approaches to study the geometry of categorical and hierarchical concepts in large language models (LLMs). We justify our main takeaway both theoretically and empirically: the orthogonality and polytope results of these approaches are trivially true in high-dimensional spaces and can be observed even in settings where they should not occur.
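The high-dimensional intuition behind the orthogonality claim can be illustrated with a minimal sketch. It is not the authors' experiment, only a standard concentration-of-measure demonstration: independent random directions in R^d become nearly orthogonal as d grows, so finding near-orthogonal vectors in an embedding space is not by itself strong evidence of structure. The helper names (`random_unit_vector`, `mean_abs_cosine`), the Gaussian sampling scheme, and the chosen dimensions are illustrative assumptions.

```python
import math
import random


def random_unit_vector(d, rng):
    """Sample a direction uniformly on the unit sphere in R^d by normalizing a Gaussian vector."""
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def mean_abs_cosine(d, n_pairs=200, seed=0):
    """Average |cosine similarity| between pairs of independent random unit vectors in R^d."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_pairs):
        u = random_unit_vector(d, rng)
        w = random_unit_vector(d, rng)
        # Both vectors have unit norm, so their dot product is the cosine similarity.
        total += abs(sum(a * b for a, b in zip(u, w)))
    return total / n_pairs


if __name__ == "__main__":
    # Low dimension versus dimensions closer to those of LLM residual streams.
    for d in (3, 64, 4096):
        print(d, round(mean_abs_cosine(d), 4))
```

For large d the cosine between two random directions is approximately normal with variance 1/d, so the average |cos| shrinks roughly like sqrt(2/(pi d)), reaching about 0.01 at d = 4096.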

Conflict Of Interest: We have no conflict of interest.

Submission Number: 90
