How Chain-of-Thought Works? Tracing Information Flow from Decoding, Projection, and Activation

ACL ARR 2025 May Submission2408 Authors

19 May 2025 (modified: 03 Jul 2025) | ACL ARR 2025 May Submission | CC BY 4.0

Abstract: Chain-of-Thought (CoT) prompting significantly enhances model reasoning, yet its internal mechanisms remain poorly understood. We analyze CoT’s operational principles by tracing information flow in reverse across the decoding, projection, and activation phases. Our quantitative analysis suggests that CoT may serve as a decoding space pruner, leveraging answer templates to guide output generation, with higher template adherence strongly correlating with improved performance. Surprisingly, we also find that CoT modulates neuron engagement in a task-dependent manner: it reduces neuron activation in open-domain tasks yet increases it in closed-domain scenarios. These findings offer a novel mechanistic interpretability framework and critical insights for the NLP community, enabling targeted CoT interventions for designing more efficient and robust prompts.

Paper Type: Long

Research Area: Interpretability and Analysis of Models for NLP

Research Area Keywords: knowledge tracing/discovering/inducing; free-text/natural language explanations

Contribution Types: Model analysis & interpretability

Languages Studied: English

Keywords: Large Language Models; Chain-of-Thought; Decoding; Projection; Activation

Submission Number: 2408
