Uncertainty-Aware Self-Correction for Coding Agents
Keywords: Uncertainty quantification, LLM coding agents
TL;DR: We adapt SAUP-style uncertainty propagation to coding agents with API-derived signals and heuristic action weights, and use step-level uncertainty to trigger selective self-correction, improving pass@1 on SWE-Rebench.
Abstract: Recent advances in large language models (LLMs) have enabled agentic systems that perform complex, multi-step tasks in realistic environments, particularly in software engineering settings where agents must navigate code bases, plan actions, execute code, and iteratively adapt based on environmental feedback.
Despite their capabilities, agent reliability remains a critical challenge: errors made early in an agent's trajectory can propagate, leading to incorrect patches, wasted computation, or misleading confidence.
Most existing uncertainty estimation methods focus on single-turn outputs and do not account for uncertainty accumulation across multi-step reasoning.
In this work, we adapt and extend Situational Awareness Uncertainty Propagation (SAUP) to coding agents operating on SWE-Rebench. We propose a simplified variant of SAUP that replaces learned semantic distance metrics with API-derived signals and heuristic action weights, making it practical for black-box API settings. We demonstrate how step-level uncertainty estimates can be propagated across an agent's trajectory and used to trigger self-correction when confidence is low.
By intervening selectively at high-uncertainty steps, our approach improves final task success while avoiding unnecessary computation.
Across three frontier models (GPT-5, Claude Opus 4.5, and DeepSeek V3.2), uncertainty-aware resampling reduces mean trajectory uncertainty by 6-20\% relative and improves pass@1 by up to 15.6 absolute percentage points, with latency overhead of 1.2-3.0$\times$ depending on the model.
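To make the described procedure concrete, the following is a minimal illustrative sketch, not the submission's actual implementation: it assumes token log-probabilities as the API-derived signal, a hypothetical action-weight table, a weighted running average as the propagation rule, and a fixed resampling threshold.

```python
# Illustrative sketch only: simplified SAUP-style running uncertainty with
# threshold-triggered resampling. Names, weights, and the propagation rule
# are assumptions for exposition, not the paper's exact formulation.
import math
from dataclasses import dataclass

# Assumed heuristic weights per action type: riskier actions count more.
ACTION_WEIGHTS = {"edit": 1.0, "run_tests": 0.8, "read_file": 0.4, "plan": 0.6}

def step_uncertainty(token_logprobs: list[float]) -> float:
    """API-derived signal: mean negative log-probability of the step's tokens,
    squashed into [0, 1). Higher values mean lower model confidence."""
    if not token_logprobs:
        return 0.0
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return 1.0 - math.exp(-mean_nll)

@dataclass
class TrajectoryUncertainty:
    """Weighted running average of step uncertainties across a trajectory."""
    total: float = 0.0
    weight_sum: float = 0.0

    def update(self, u_step: float, action_type: str) -> float:
        w = ACTION_WEIGHTS.get(action_type, 0.5)
        self.total += w * u_step
        self.weight_sum += w
        return self.value

    @property
    def value(self) -> float:
        return self.total / self.weight_sum if self.weight_sum else 0.0

def act_with_self_correction(propose_step, threshold: float = 0.55,
                             max_retries: int = 2):
    """Resample a step only when its uncertainty exceeds the threshold,
    keeping the lowest-uncertainty candidate (selective intervention)."""
    best_step, best_u = None, float("inf")
    for _ in range(max_retries + 1):
        step, logprobs = propose_step()   # hypothetical call into the agent
        u = step_uncertainty(logprobs)
        if u < best_u:
            best_step, best_u = step, u
        if u <= threshold:
            break                         # confident enough; no resample
    return best_step, best_u
```

In this sketch, resampling is local to a single step and the trajectory-level average is only monitored, which keeps the extra computation confined to high-uncertainty steps, consistent with the selective-intervention idea above.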
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 221