Keywords: Metacognition; Large Reasoning Models
Abstract: Large Reasoning Models (LRMs) have achieved strong performance on complex multi-step tasks. However, they often fail to assess task difficulty, monitor their own uncertainty, or revise incorrect reasoning, which poses fundamental challenges to their reliability. We argue that these limitations reflect the absence of metacognition, the capacity to monitor and control one’s own cognitive processes. Building on insights from cognitive science, we present a structured study of metacognition in LRMs, focusing on both internal signals and observable behaviors. We first show that internal activations, attention patterns, and token-level confidences carry rich information predictive of reasoning correctness. We then design a set of evaluation tasks that probe functional abilities such as difficulty awareness, confidence adjustment, task decomposition, and strategy revision across five widely used LRMs. Our results suggest that while LRMs exhibit partial signs of metacognitive behavior, these abilities are inconsistent and easily disrupted. We further explore two complementary approaches for strengthening metacognition: prompt-driven control and supervised training on structured metacognitive traces. Together, our findings highlight metacognition as a critical lens for diagnosing and improving reasoning in large models.
Primary Area: foundation or frontier models, including LLMs
Submission Number: 18006