States Hidden in Hidden States: LLMs Emerge Discrete State Representations Implicitly

ACL ARR 2024 June Submission4576 Authors

16 Jun 2024 (modified: 02 Jul 2024) · ACL ARR 2024 June Submission · CC BY 4.0

Abstract: Large Language Models (LLMs) exhibit various emergent abilities, some of which may reveal the models' internal working mechanisms. In this paper, we uncover a novel emergent capability: the intrinsic ability to perform extended sequences of calculations without relying on chain-of-thought, step-by-step solutions. Remarkably, the most advanced models can directly output the result of adding up to 15 two-digit numbers. We hypothesize that the model develops discrete representations of symbols within its hidden states and performs symbolic calculations internally. To test this hypothesis, we design a sequence of experiments that examine the hidden states. Specifically, we first confirm that Implicit Discrete State Representations (IDSRs) exist. We then report observations about how IDSRs form from the layer, digit, and sequence perspectives. Finally, we confirm that models indeed use IDSRs to produce their final answers. However, we also find that the state representations are far from lossless in current open-source models, which leads to inaccuracies in the final answers. Our work presents a novel exploration of LLMs' symbolic calculation abilities and the underlying mechanisms.
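As a rough illustration of the task described in the abstract, the sketch below builds a chained addition of two-digit numbers and lists the digits of the running sum after each addend. It is not the authors' code. The function names, the prompt format (`a+b+...=`), and the choice of running-sum digits as the discrete states a probe might target are all assumptions made for illustration.

```python
import random


def make_addition_problem(num_addends, seed=0):
    """Build a chained addition of two-digit addends (hypothetical prompt format)."""
    rng = random.Random(seed)
    addends = [rng.randint(10, 99) for _ in range(num_addends)]
    prompt = "+".join(str(a) for a in addends) + "="
    return addends, prompt, sum(addends)


def running_sum_digits(addends):
    """Digits of the running sum after each addend, least significant digit first.

    Assumption: these per-step digits stand in for the kind of discrete
    intermediate state a probe on hidden states could be trained to predict.
    """
    labels = []
    total = 0
    for a in addends:
        total += a
        labels.append([int(d) for d in reversed(str(total))])
    return labels


if __name__ == "__main__":
    addends, prompt, answer = make_addition_problem(15)
    print(prompt, answer)
    print(running_sum_digits(addends))
```

The abstract reports direct answers for up to 15 addends, so `num_addends=15` mirrors the longest case. Everything else here is a placeholder for whatever data pipeline the paper actually uses.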

Paper Type: Long

Research Area: Interpretability and Analysis of Models for NLP

Research Area Keywords: Interpretability and Analysis of Models for NLP

Contribution Types: Model analysis & interpretability

Languages Studied: English

Submission Number: 4576
