Abstract: Large Language Models (LLMs) often generate code with subtle but
critical bugs, especially for complex tasks. Existing automated repair
methods typically rely on superficial pass/fail signals, offering
limited visibility into program behavior and hindering precise error
localization. In addition, without a way to learn from prior failures,
repair processes often fall into repetitive and inefficient cycles. To
overcome these challenges, we present TraceCoder, a collaborative
multi-agent framework that emulates the observe-analyze-repair
process of human experts. The framework first instruments the
code with diagnostic probes to capture fine-grained runtime traces,
enabling deep insight into its internal execution. It then conducts
causal analysis on these traces to accurately identify the root cause
of the failure. This process is further enhanced by a novel Historical
Lesson Learning Mechanism (HLLM), which distills insights from
prior failed repair attempts to inform subsequent correction strategies
and prevent recurrence of similar mistakes. To ensure stable
convergence, a Rollback Mechanism enforces that each repair iteration
constitutes a strict improvement toward the correct solution.
Comprehensive experiments across multiple benchmarks show that
TraceCoder achieves up to a 34.43% relative improvement in Pass@1
accuracy over existing advanced baselines. Ablation studies verify
the significance of each system component, with the iterative repair
process alone contributing a 65.61% relative gain in accuracy.
Furthermore, TraceCoder significantly outperforms leading iterative
methods in terms of both accuracy and cost-efficiency.
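
To make the interplay between the Rollback Mechanism and the lesson history concrete, the following is a minimal sketch, not the authors' implementation. It assumes that "strict improvement" means a candidate passes strictly more tests than the best version so far, and it collapses the instrumentation and causal-analysis agents into a single hypothetical callback, `propose_fix`. All names here (`count_passed`, `repair_loop`, `propose_fix`, `solve`) are illustrative assumptions.

```python
from typing import Callable, List, Tuple

TestCase = Tuple[tuple, object]  # (positional args, expected result)


def count_passed(code: str, tests: List[TestCase]) -> int:
    """Run `code` and count the tests its `solve` function passes.

    Assumption: the program under repair defines a function named `solve`.
    A real system should execute untrusted code in a sandbox, not via exec.
    """
    namespace: dict = {}
    try:
        exec(code, namespace)
        solve = namespace["solve"]
    except Exception:
        return 0
    passed = 0
    for args, expected in tests:
        try:
            if solve(*args) == expected:
                passed += 1
        except Exception:
            pass  # a crashing test simply counts as a failure
    return passed


def repair_loop(
    initial_code: str,
    tests: List[TestCase],
    propose_fix: Callable[[str, List[str]], str],
    max_iters: int = 5,
) -> Tuple[str, int, List[str]]:
    """Iteratively repair code, keeping a candidate only if it strictly improves."""
    best_code = initial_code
    best_passed = count_passed(best_code, tests)
    lessons: List[str] = []  # stand-in for distilled lessons from failed attempts
    for _ in range(max_iters):
        if best_passed == len(tests):
            break
        # Hypothetical stand-in for the trace, analyze, and repair agents.
        candidate = propose_fix(best_code, lessons)
        candidate_passed = count_passed(candidate, tests)
        if candidate_passed > best_passed:
            best_code, best_passed = candidate, candidate_passed
        else:
            # Rollback: discard the candidate and record why it was rejected.
            lessons.append(
                f"Rejected a fix passing {candidate_passed}/{len(tests)} tests "
                f"(best so far: {best_passed})"
            )
    return best_code, best_passed, lessons


# Usage with a scripted fixer in place of an LLM: one bad fix, then a good one.
_candidates = iter([
    "def solve(a, b):\n    return a - b\n",  # regression, should be rolled back
    "def solve(a, b):\n    return a + b\n",  # correct
])
tests = [((1, 2), 3), ((2, 2), 4), ((0, 5), 5)]
buggy = "def solve(a, b):\n    return a * b\n"

final_code, final_passed, lessons = repair_loop(
    buggy, tests, lambda code, history: next(_candidates)
)
```

In this run the first candidate passes fewer tests than the starting program, so it is rolled back and a lesson is recorded; the second candidate is accepted. Passing the `lessons` list back into `propose_fix` is how the sketch mimics informing later attempts with earlier failures.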