Uncertainty-Aware Self-Correction for Coding Agents
Keywords: Uncertainty quantification, LLM coding agents
TL;DR: We adapt SAUP-style uncertainty propagation to coding agents with API-derived signals and heuristic action weights, and use step-level uncertainty to trigger selective self-correction, improving pass@1 on SWE-Rebench.
Abstract: Recent advances in large language models (LLMs) have enabled agentic systems that perform complex, multi-step tasks in realistic environments, particularly in software engineering settings where agents must navigate code bases, plan actions, execute code, and iteratively adapt based on environmental feedback.
Despite their capabilities, agent reliability remains a critical challenge: errors made early in an agent's trajectory can propagate, leading to incorrect patches, wasted computation, or misleading confidence.
Most existing uncertainty estimation methods focus on single-turn outputs and do not account for uncertainty accumulation across multi-step reasoning.
In this work, we adapt and extend Situational Awareness Uncertainty Propagation (SAUP) to coding agents operating on SWE-Rebench. We propose a simplified variant of SAUP that replaces learned semantic distance metrics with API-derived signals and heuristic action weights, making it practical for black-box API settings. We demonstrate how step-level uncertainty estimates can be propagated across an agent's trajectory and used to trigger self-correction when confidence is low.
By intervening selectively at high-uncertainty steps, our approach improves final task success while avoiding unnecessary computation.
Across three frontier models (GPT-5, Claude Opus 4.5, and DeepSeek V3.2), uncertainty-aware resampling reduces mean trajectory uncertainty by 6-20\% relative and improves pass@1 by up to 15.6 absolute percentage points, with latency overhead of 1.2-3.0$\times$ depending on the model.
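To make the described procedure concrete, the following is a minimal illustrative sketch, not the submission's actual implementation: it assumes token log-probabilities as the API-derived signal, a hypothetical action-weight table, a weighted running average as the propagation rule, and a fixed resampling threshold.

```python
# Illustrative sketch only: simplified SAUP-style running uncertainty with
# threshold-triggered resampling. Names, weights, and the propagation rule
# are assumptions for exposition, not the paper's exact formulation.
import math
from dataclasses import dataclass

# Assumed heuristic weights per action type: riskier actions count more.
ACTION_WEIGHTS = {"edit": 1.0, "run_tests": 0.8, "read_file": 0.4, "plan": 0.6}

def step_uncertainty(token_logprobs: list[float]) -> float:
    """API-derived signal: mean negative log-probability of the step's tokens,
    squashed into [0, 1). Higher values mean lower model confidence."""
    if not token_logprobs:
        return 0.0
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return 1.0 - math.exp(-mean_nll)

@dataclass
class TrajectoryUncertainty:
    """Weighted running average of step uncertainties across a trajectory."""
    total: float = 0.0
    weight_sum: float = 0.0

    def update(self, u_step: float, action_type: str) -> float:
        w = ACTION_WEIGHTS.get(action_type, 0.5)
        self.total += w * u_step
        self.weight_sum += w
        return self.value

    @property
    def value(self) -> float:
        return self.total / self.weight_sum if self.weight_sum else 0.0

def act_with_self_correction(propose_step, threshold: float = 0.55,
                             max_retries: int = 2):
    """Resample a step only when its uncertainty exceeds the threshold,
    keeping the lowest-uncertainty candidate (selective intervention)."""
    best_step, best_u = None, float("inf")
    for _ in range(max_retries + 1):
        step, logprobs = propose_step()   # hypothetical call into the agent
        u = step_uncertainty(logprobs)
        if u < best_u:
            best_step, best_u = step, u
        if u <= threshold:
            break                         # confident enough; no resample
    return best_step, best_u
```

In this sketch, resampling is local to a single step and the trajectory-level average is only monitored, which keeps the extra computation confined to high-uncertainty steps, consistent with the selective-intervention idea above.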
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 221