Keywords: mechanistic interpretability, concept cones, representation of facts, large language models
TL;DR: We explore concept cone representations of propositional facts in LLMs.
Abstract: Large Language Models (LLMs) exhibit strong conversational abilities but often generate falsehoods. Prior work suggests that the truthfulness of simple propositions can be represented as a single linear direction in a model's internal activations, but a single direction may not fully capture the underlying geometry of truth representations. In this work, we extend the concept cone framework, recently introduced for modeling refusal, to the domain of truth. We identify multidimensional cones whose directions reliably steer model behavior in response to simple factual statements. Our results are supported by three lines of evidence: (i) causal interventions reliably flip model responses to factual statements; (ii) learned cones generalize across small model architectures; and (iii) cone-based interventions preserve unrelated model behavior. These findings reveal the richer, multidirectional structure governing simple true/false propositions in LLMs and highlight concept cones as a promising tool for probing abstract behaviors.
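The abstract describes causal interventions that steer responses along directions drawn from a learned concept cone. Below is a minimal illustrative sketch, not the authors' code, of how such a cone-based steering intervention might be applied to a PyTorch transformer; `cone_basis`, `LAYER`, `k`, and `alpha` are all hypothetical names, and the cone basis is assumed to have been learned separately.

```python
# Minimal sketch (assumptions labeled): sample a direction from inside a
# learned concept cone and add it to a decoder layer's residual-stream
# activations via a forward hook.
import torch

def sample_cone_direction(basis: torch.Tensor, weights: torch.Tensor) -> torch.Tensor:
    """Combine cone basis vectors (k x d) with non-negative weights so the
    result stays inside the cone, then normalize to unit length."""
    direction = weights.clamp(min=0.0) @ basis  # non-negative combination
    return direction / direction.norm()

def make_steering_hook(direction: torch.Tensor, alpha: float):
    """Return a forward hook that shifts hidden states along `direction`.
    The sign of alpha (hypothetically) pushes toward 'true' or 'false'."""
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        hidden = hidden + alpha * direction.to(hidden.dtype)
        return (hidden, *output[1:]) if isinstance(output, tuple) else hidden
    return hook

# Hypothetical usage with a Hugging Face-style decoder stack:
# direction = sample_cone_direction(cone_basis, torch.rand(k))
# handle = model.model.layers[LAYER].register_forward_hook(
#     make_steering_hook(direction, alpha=8.0))
# ... generate a response to a factual statement, then handle.remove()
```

The non-negative combination is what distinguishes a cone from a full subspace: any such mixture of basis directions is itself assumed to lie inside the cone and to carry the same steering effect.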
Archival Status: Non-archival
ACL Copyright Transfer: pdf
Paper Length: Long Paper (up to 8 pages of content)
Submission Number: 282