Keywords: LLM Reasoning, Cognitively Inspired Reasoning, Test-Time Guidance, Cognitive Elements
TL;DR: We propose a cognitive taxonomy to analyze LLM reasoning traces, find that models narrow their cognitive repertoire precisely where success demands diversity, and show that scaffolding successful reasoning structures improves performance by up to 26.7%.
Abstract: Large language models (LLMs) solve complex problems yet fail on simpler variants, suggesting they achieve correct outputs through mechanisms fundamentally different from human reasoning. To understand this gap, we synthesize cognitive science research into a taxonomy of 28 cognitive elements spanning reasoning invariants, meta-cognitive controls, knowledge representations, and transformation operations. We conduct the first large-scale empirical analysis of 192K reasoning traces from 18 models across text, vision, and audio modalities, complemented by 54 human think-aloud traces. Our analysis reveals a fundamental misalignment: models narrow to rigid sequential processing on ill-structured problems precisely where diverse representations and meta-cognitive monitoring correlate most strongly with success. Human traces show more abstraction and conceptual processing, while models default to surface-level enumeration. Leveraging these behavioral patterns, we develop test-time reasoning guidance that scaffolds successful cognitive structures, improving performance by up to 26.7% on complex problems. This confirms that models possess latent reasoning capabilities but fail to deploy them spontaneously. Our framework establishes shared vocabulary between cognitive science and LLM research, enabling systematic diagnosis of reasoning failures and principled development of models that reason through robust cognitive mechanisms rather than spurious shortcuts.
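To make the "test-time reasoning guidance" concrete, here is a minimal sketch of what prompt-level scaffolding of a successful cognitive structure might look like. The abstract does not specify the paper's prompt format, so the scaffold text, the element names, and the `guide_reasoning` wrapper below are hypothetical illustrations, not the authors' actual method.

```python
# Hypothetical sketch: inject a scaffold describing a successful cognitive
# structure before querying the model, rather than fine-tuning it.

# Illustrative steps loosely echoing the taxonomy's four families
# (representations, decomposition, meta-cognitive monitoring, backtracking);
# the paper's actual 28-element inventory is not reproduced here.
COGNITIVE_SCAFFOLD = (
    "Before answering, work through the problem as follows:\n"
    "1. Build more than one representation of the problem (verbal, symbolic, diagrammatic).\n"
    "2. Decompose it into explicit subgoals.\n"
    "3. After each step, check whether the partial result still fits the goal.\n"
    "4. If stuck, switch representation or backtrack instead of enumerating cases.\n"
)

def guide_reasoning(model_call, problem: str) -> str:
    """Wrap any prompt-to-completion function with the scaffold (hypothetical interface)."""
    prompt = COGNITIVE_SCAFFOLD + "\nProblem:\n" + problem
    return model_call(prompt)

if __name__ == "__main__":
    # Stub model for demonstration; replace with a real LLM API wrapper.
    echo_model = lambda p: f"[model received a {len(p)}-character prompt]"
    print(guide_reasoning(echo_model, "Place 4 queens on a 4x4 board with no attacks."))
```

The key design point this sketch illustrates is that the intervention happens entirely at inference time: the scaffold elicits cognitive structures the model already possesses latently, consistent with the abstract's claim that models fail to deploy such structures spontaneously.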
Paper Type: New Full Paper
Submission Number: 72