Keywords: Metacognition; Large Reasoning Models
Abstract: Large Reasoning Models (LRMs) have achieved strong performance on complex multi-step tasks. However, they often fail to assess task difficulty, monitor their own uncertainty, or revise incorrect reasoning, which poses fundamental challenges to their reliability. We argue that these limitations reflect the absence of metacognition, the capacity to monitor and control one’s own cognitive processes. Building on insights from cognitive science, we present a structured study of metacognition in LRMs, focusing on both internal signals and observable behaviors. We first show that internal activations, attention patterns, and token-level confidences carry rich information predictive of reasoning correctness. We then design a set of evaluation tasks that probe functional abilities such as difficulty awareness, confidence adjustment, task decomposition, and strategy revision across five widely used LRMs. Our results suggest that while LRMs exhibit partial signs of metacognitive behavior, these abilities are inconsistent and easily disrupted. We further explore two complementary approaches for strengthening metacognition: prompt-driven control and supervised training on structured metacognitive traces. Together, our findings highlight metacognition as a critical lens for diagnosing and improving reasoning in large models.
Primary Area: foundation or frontier models, including LLMs
Submission Number: 18006