The Source of Competence Shapes Metacognition in Language Models

Published: 04 Jun 2026, Last Modified: 04 Jun 2026
Venue: ICML MemFM 2026 Workshop Poster
Readers: Everyone
License: CC BY 4.0
Keywords: memorization, metacognitive, competence
Abstract: Large language models are increasingly expected not only to produce correct answers, but also to reliably estimate when they are likely to be wrong. Existing work largely assumes that metacognitive reliability degrades proportionally with capability: weaker models are expected to be correspondingly less confident or less calibrated. In this work, we challenge this assumption and show that metacognition depends strongly on the source and structure of competence rather than on raw performance alone. We systematically study confidence behavior across multiple capability regimes, including scale reduction, partial memory degradation, reasoning truncation, quantization, and evidence-grounded inference. Across these settings, we observe distinct metacognitive regimes and identify narrow overconfident failure bands in which models retain high confidence despite substantial capability loss. Surprisingly, models with comparable task accuracy often exhibit dramatically different calibration and overconfidence profiles depending on how their competence is obtained. In particular, partial or stale parametric memory induces substantially stronger overconfidence than either complete ignorance or evidence-grounded reasoning. These findings suggest that metacognitive reliability is partially separable from capability and is closely tied to the accessibility and stability of internal knowledge representations. Our results provide a new perspective on hallucination, calibration, and memory reliability in foundation models.
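Illustrative sketch (not taken from the paper): the abstract's claim that models with comparable accuracy can differ sharply in calibration can be made concrete with two standard quantities, expected calibration error (ECE) and the gap between mean confidence and accuracy. The abstract does not say which metrics the authors use, so the function names, the equal-width binning, and the toy confidence values below are all assumptions chosen for illustration only.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[int],
    n_bins: int = 10,
) -> float:
    """Equal-width-bin ECE: weighted mean of |avg confidence - accuracy| per bin."""
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, is_correct in zip(confidences, correct):
        # Clamp so that a confidence of exactly 1.0 falls in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, is_correct))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(y for _, y in members) / len(members)
        ece += (len(members) / n) * abs(avg_conf - accuracy)
    return ece


def overconfidence_gap(confidences: Sequence[float], correct: Sequence[int]) -> float:
    """Mean confidence minus accuracy; positive values indicate overconfidence."""
    return sum(confidences) / len(confidences) - sum(correct) / len(correct)


# Hypothetical toy data: both "models" answer the same 4 items with 50% accuracy.
correct = [1, 1, 0, 0]
conf_discriminating = [0.95, 0.95, 0.05, 0.05]  # confidence tracks correctness
conf_uniformly_high = [0.95, 0.95, 0.95, 0.95]  # confident regardless of outcome

ece_a = expected_calibration_error(conf_discriminating, correct)
ece_b = expected_calibration_error(conf_uniformly_high, correct)
gap_a = overconfidence_gap(conf_discriminating, correct)
gap_b = overconfidence_gap(conf_uniformly_high, correct)
```

In this toy setup accuracy is identical, yet the second confidence profile has a much larger ECE and a positive overconfidence gap, which is the kind of separation between capability and metacognitive reliability the abstract describes.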
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 73