Layer-Wise Cognitive Specialization in Large Language Models: A Cross-Architecture Analysis of Concept Emergence
Abstract: This paper studies how internal representations change layer by layer in four language models: DeepSeek-R1-Distill-Qwen-1.5B, Qwen3-4B-Thinking, Llama-3.1-8B-Instruct, and Mistral-7B-Instruct-v0.2. We apply 128 linear probes to activations from 215 questions spanning 16 cognitive categories and track the layer at which each category becomes linearly decodable from the model's hidden states. We report three main findings. First, the same broad ordering appears across models: spatial navigation and logical reasoning become separable early, whereas pattern recognition and executive function emerge later. Second, in every model most of the gain in separability occurs within the first third of the layers, while later layers differ clearly across models. For example, Mistral-7B loses separability in its late layers (−1.4%), whereas Llama-8B shows the largest increase in confidence (a 0.41-bit entropy reduction). Third, a replication on new paraphrased prompts shows that late-layer category decoding transfers across models (mean best accuracy 0.641 on 62 paraphrased prompts), but the exact emergence ordering does not replicate cleanly (mean rank correlation 0.016). We validate these results with bootstrap confidence intervals, confusion analysis, robust metrics, significance tests, paraphrase replication, intervention tests, and sanity controls. Together, these findings offer a practical map of where cognitive information appears and changes inside language models, while also clarifying which parts of that map are robust to prompt reformulation.
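The abstract summarizes results with three quantities: the layer at which a category "emerges", a rank correlation between emergence orderings, and an entropy reduction measured in bits. The sketch below shows one plausible way to compute each from per-layer probe outputs. It is not the authors' code. The helper names (`emergence_layer`, `average_ranks`, `spearman`, `mean_entropy_bits`) are hypothetical. The 90%-of-peak threshold is an illustrative assumption, and the use of Spearman (rather than another rank correlation) is also an assumption. All numbers in the usage section are toy values, not results from the paper.

```python
import math


def emergence_layer(acc_by_layer, frac=0.9):
    """Hypothetical definition: the first layer whose probe accuracy reaches
    `frac` of that category's peak accuracy (frac=0.9 is illustrative)."""
    peak = max(acc_by_layer)
    # The peak layer always satisfies the condition, so next() never exhausts.
    return next(i for i, acc in enumerate(acc_by_layer) if acc >= frac * peak)


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the tie group while the next sorted value is equal.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined when one ordering is all ties
    return cov / math.sqrt(sxx * syy)


def mean_entropy_bits(prob_rows):
    """Average Shannon entropy (in bits) of per-example probe output distributions."""
    total = 0.0
    for row in prob_rows:
        total -= sum(p * math.log2(p) for p in row if p > 0)
    return total / len(prob_rows)


# Toy numbers for illustration only; these are not results from the paper.
curves = {
    "spatial": [0.50, 0.90, 0.95, 0.96],
    "logical": [0.40, 0.80, 0.90, 0.92],
    "pattern": [0.30, 0.40, 0.60, 0.90],
}
original = [emergence_layer(c) for c in curves.values()]  # [1, 2, 3]
paraphrased = [2, 1, 3]  # hypothetical emergence layers on paraphrased prompts
rho = spearman(original, paraphrased)  # 0.5

early = [[0.25, 0.25, 0.25, 0.25]] * 2  # 2.0 bits per example
late = [[0.5, 0.5, 0.0, 0.0]] * 2       # 1.0 bit per example
reduction = mean_entropy_bits(early) - mean_entropy_bits(late)  # 1.0 bit
```

A rank correlation near zero, like the 0.016 reported above, would mean that the per-category emergence layers on paraphrased prompts carry almost no information about the original ordering. This can happen even when late-layer accuracy transfers well, because the two metrics look at different parts of the accuracy curve.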
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Xuanjing_Huang1
Submission Number: 7919