Do Large Language Models Know What They Are Capable Of?

ICLR 2026 Conference Submission 15807 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: LLM Calibration, Decision Making, Overconfidence, In-context learning, LLM Agents, LLM self-knowledge, AI Safety
TL;DR: We find that LLMs are overconfident in predicting their success on tasks, but some learn from in-context experience to make more risk-averse decisions about which tasks to attempt.
Abstract: We investigate whether large language models (LLMs) can predict whether they will succeed on a given task, and whether their predictions improve as they progress through multi-step tasks. We also investigate whether LLMs can learn from in-context experiences to make better decisions about whether to pursue a task in scenarios where failure is costly. All LLMs we tested are overconfident, but most predict their success with better-than-random discriminatory power. We find that newer and larger LLMs generally do not have greater discriminatory power. On multi-step agentic tasks, the overconfidence of several frontier LLMs worsens as they progress through the tasks, and reasoning LLMs perform comparably to or worse than non-reasoning LLMs. With in-context experiences of failure, most LLMs only slightly reduce their overconfidence, though in a resource acquisition scenario several LLMs (Claude Sonnet models and GPT-4.5) improve their performance by increasing their risk aversion. These results suggest that current LLM agents are hindered by their lack of awareness of their own capabilities. We discuss the implications of LLMs' awareness of their capabilities for AI misuse and misalignment risks.
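The abstract distinguishes two notions that are easy to conflate: overconfidence (a calibration gap between predicted and realized success) and discriminatory power (whether higher stated confidence tracks actual success). The sketch below is illustrative only, not the authors' evaluation code, and uses hypothetical data; it shows one common way these two quantities are computed, with the calibration gap as mean predicted success minus observed success rate and discrimination as the AUROC of the predictions against outcomes.

```python
# Illustrative sketch (not the paper's method). Hypothetical per-task records:
# the model's stated probability that it will succeed on a task, and whether
# it actually succeeded (1) or failed (0).
import numpy as np
from sklearn.metrics import roc_auc_score

predicted_success = np.array([0.9, 0.8, 0.95, 0.7, 0.85, 0.6, 0.9, 0.75])
actual_success = np.array([1, 0, 1, 0, 1, 0, 0, 1])

# Overconfidence: positive values mean the model claims more success than it delivers.
overconfidence = predicted_success.mean() - actual_success.mean()

# Discriminatory power: AUROC of predicted probabilities against outcomes
# (0.5 = random, 1.0 = perfect ranking of successes above failures).
discrimination = roc_auc_score(actual_success, predicted_success)

print(f"Overconfidence (mean confidence - success rate): {overconfidence:+.2f}")
print(f"Discriminatory power (AUROC): {discrimination:.2f}")
```

Under this framing, a model can be badly miscalibrated yet still better than random at ranking which tasks it will complete, which matches the abstract's claim that tested LLMs are overconfident but mostly retain better-than-random discriminatory power.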
Supplementary Material: zip
Primary Area: foundation or frontier models, including LLMs
Submission Number: 15807