Layer-Wise Cognitive Specialization in Large Language Models: A Cross-Architecture Analysis of Concept Emergence

TMLR Paper7919 Authors

13 Mar 2026 (modified: 25 May 2026), Rejected by TMLR, CC BY 4.0
Abstract: This paper studies how internal representations change layer by layer in four language models: DeepSeek-R1-Distill-Qwen-1.5B, Qwen3-4B-Thinking, Llama-3.1-8B-Instruct, and Mistral-7B-Instruct-v0.2. We use 128 linear probes and activations from 215 questions across 16 cognitive categories to track when each category becomes easy to decode from model states. We find three main results. First, the same broad ordering appears across models: spatial navigation and logical reasoning become separable early, while pattern recognition and executive function appear later. Second, most gains happen in the first third of layers in all models, with clear differences in later layers. For example, Mistral-7B loses separability in late layers (−1.4%), while Llama-8B shows the largest confidence increase (0.41-bit entropy reduction). Third, fresh paraphrase-based replication shows that late-layer category decoding transfers across models (mean best accuracy 0.641 on 62 paraphrased prompts), but exact emergence ordering does not replicate cleanly (mean rank correlation 0.016). We validate results with bootstrap confidence intervals, confusion analysis, robust metrics, significance tests, paraphrase replication, intervention tests, and sanity controls. These findings offer a practical map of where cognitive information appears and changes inside language models while also clarifying which parts of that map are robust to prompt reformulation.
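As an illustration of the "emergence ordering" and "rank correlation" quantities mentioned in the abstract, the following is a minimal sketch, not the authors' pipeline. It assumes that a category "emerges" at the first layer where probe accuracy reaches a fixed threshold, and that orderings are compared with a Spearman rank correlation; the abstract does not state either choice. The function names, the threshold of 0.5, and all accuracy values are hypothetical and are not results from the paper.

```python
from typing import Dict, List, Optional


def emergence_layer(acc_by_layer: List[float], threshold: float) -> Optional[int]:
    """Return the first layer index whose probe accuracy reaches the threshold.

    Assumption: "emergence" means first threshold crossing. Returns None if
    the accuracy never reaches the threshold.
    """
    for layer, acc in enumerate(acc_by_layer):
        if acc >= threshold:
            return layer
    return None


def ranks(values: List[float]) -> List[float]:
    """Return 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x: List[float], y: List[float]) -> float:
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return float("nan")  # undefined when one ordering is constant
    return cov / (vx * vy) ** 0.5


# Hypothetical per-layer probe accuracies (made up for illustration only).
model_a: Dict[str, List[float]] = {
    "spatial_navigation": [0.2, 0.6, 0.7, 0.7],
    "logical_reasoning": [0.1, 0.3, 0.6, 0.7],
    "pattern_recognition": [0.1, 0.2, 0.3, 0.6],
    "executive_function": [0.1, 0.2, 0.4, 0.5],
}
model_b: Dict[str, List[float]] = {
    "spatial_navigation": [0.3, 0.5, 0.6, 0.7],
    "logical_reasoning": [0.5, 0.6, 0.6, 0.7],
    "pattern_recognition": [0.1, 0.2, 0.5, 0.6],
    "executive_function": [0.1, 0.1, 0.2, 0.6],
}

THRESHOLD = 0.5  # hypothetical decoding threshold
categories = sorted(model_a)
layers_a = [emergence_layer(model_a[c], THRESHOLD) for c in categories]
layers_b = [emergence_layer(model_b[c], THRESHOLD) for c in categories]
rho = spearman(layers_a, layers_b)
print(dict(zip(categories, layers_a)), dict(zip(categories, layers_b)), rho)
```

For models with different depths, one would likely divide each emergence layer by the model's layer count before comparing. Ranks within a model are unchanged by that rescaling, so the Spearman value is unaffected.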
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Xuanjing_Huang1
Submission Number: 7919