Analogical Reasoning Inside Large Language Models: Concept Vectors and the Limits of Abstraction

ACL ARR 2025 February Submission 7887 Authors

16 Feb 2025 (modified: 09 May 2025) · ACL ARR 2025 February Submission · CC BY 4.0
Abstract: Analogical reasoning relies on conceptual abstractions, but it is unclear whether LLMs harbor such internal representations. We explore distilled representations from LLM activations and find that function vectors (FVs; Todd et al., 2024)—compact representations for in-context learning (ICL) tasks—are not invariant to simple input changes (e.g., open-ended vs. multiple-choice), suggesting they capture more than pure concepts. Using representational similarity analysis (RSA), we localize a small set of attention heads that encode invariant concept vectors (CVs) for verbal concepts like "antonym". These CVs function as feature detectors that operate independently of the final output—meaning that a model may form a correct internal representation yet still produce an incorrect output. Furthermore, CVs can be used to causally guide model behaviour. However, for more abstract concepts like "previous" and "next", we do not observe invariant linear representations, a finding we link to generalizability issues LLMs display within these domains.
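For orientation, the sketch below illustrates the kind of representational similarity analysis (RSA) computation the abstract refers to when localizing concept-encoding attention heads: a per-head activation similarity matrix is compared against an idealized "same concept vs. different concept" reference matrix. This is a minimal sketch under stated assumptions; the function names (rsa_score), the cosine/Spearman choices, and the toy data are illustrative and not the authors' implementation.

```python
# Minimal RSA sketch (illustrative assumption, not the paper's code):
# score how strongly one attention head's activations mirror an
# idealized concept-membership similarity structure.
import numpy as np
from scipy.stats import spearmanr
from scipy.spatial.distance import pdist, squareform


def rsa_score(head_acts: np.ndarray, concept_labels: list[str]) -> float:
    """head_acts: (n_prompts, d) activations of one head; concept_labels: concept per prompt."""
    # Neural similarity matrix: cosine similarity between prompt activations.
    neural_sim = 1.0 - squareform(pdist(head_acts, metric="cosine"))
    # Reference similarity matrix: 1 if two prompts instantiate the same concept, else 0.
    labels = np.array(concept_labels)
    ref_sim = (labels[:, None] == labels[None, :]).astype(float)
    # Correlate the two similarity structures over the upper triangle (excluding the diagonal).
    iu = np.triu_indices(len(labels), k=1)
    rho, _ = spearmanr(neural_sim[iu], ref_sim[iu])
    return rho


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy data: 8 prompts, two hypothetical concepts, with a weak concept signal injected.
    labels = ["antonym"] * 4 + ["synonym"] * 4
    base = np.repeat(rng.normal(size=(2, 16)), 4, axis=0)
    acts = base + 0.5 * rng.normal(size=(8, 16))
    print(f"RSA score: {rsa_score(acts, labels):.3f}")  # higher = more concept-like head
```

In this toy setup, a head whose activations cluster by concept yields a higher Spearman correlation between the two similarity matrices; how the paper aggregates such scores across heads and prompts is not reproduced here.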
Paper Type: Long
Research Area: Interpretability and Analysis of Models for NLP
Research Area Keywords: Interpretability and Analysis of Models for NLP, Semantics: Lexical and Sentence-Level
Contribution Types: Model analysis & interpretability
Languages Studied: English, French, German, Spanish
Submission Number: 7887