Tracing and Correcting Programs: Critic-Guided Synthesis for Visual Reasoning

Marha Midhatiey Rusli; Donghyeon Shin; Sejin Kim; Sundong Kim

Published: 28 Dec 2025, Last Modified: 08 Mar 2026. Venue: AAAI 2026 Bridge LMReasoning (Oral). License: CC BY 4.0

Keywords: Program Synthesis, Abstraction and Reasoning Corpus, Large Language Models, Debugging, Code Repair, Critic-Guided, Adaptive Feedback and Sampling

TL;DR: Tracing and Correcting Programs (TCP) is a critic-guided framework that iteratively repairs synthesized programs for visual reasoning using adaptive feedback and strategy selection, enabling efficient solutions without unguided large-scale generation.

Abstract: Program synthesis for complex reasoning tasks faces a fundamental challenge: initial attempts often generate flawed programs that fail to capture the underlying problem logic. We introduce Tracing and Correcting Programs (TCP), a critic-guided framework that shifts the paradigm from generate-and-test to critic-guided repair through iterative refinement. Instead of discarding failed programs, TCP analyzes each task, traces the execution errors of its failed programs, and generates structured diagnostic feedback through a critic module. Corrected programs are then refined and tested in an iterative validation process until a solution emerges. Our key contributions include: (1) a systematic approach that transforms failed synthesis attempts into improved and correct programs, (2) an adaptive sampling strategy that allocates computational resources based on task complexity, requiring only 7-8 samples per task for complete solutions, and (3) a zero-shot methodology that requires no task-specific training. We evaluate TCP on all 800 tasks of the challenging Abstraction and Reasoning Corpus (ARC), where TCP solves 159 tasks and improves the majority of tasks by up to 68.1%. Unlike evolutionary or multi-agent methods, which require evaluating hundreds or thousands of candidates, often with significant training overhead, TCP achieves systematic improvements with two orders of magnitude fewer samples (a 300-400x reduction). These results highlight the importance of feedback-driven refinement and establish a new paradigm for efficient program synthesis in complex reasoning domains.
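
The trace, critique, and repair cycle described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `trace`, `repair_loop`, and `make_toy_repair` are hypothetical, the "critic" is reduced to a plain list of mismatch messages, and the repair step is a toy stand-in that cycles through fixed candidates where a real system would presumably query a language model with the feedback. The default budget of 8 samples only loosely echoes the 7-8 samples per task mentioned above.

```python
from typing import Callable, List, Optional, Tuple

Grid = List[List[int]]
Program = Callable[[Grid], Grid]
Example = Tuple[Grid, Grid]  # (input grid, expected output grid)


def trace(program: Program, examples: List[Example]) -> List[str]:
    """Run the program on every example and collect diagnostic messages."""
    feedback = []
    for i, (inp, expected) in enumerate(examples):
        try:
            got = program(inp)
        except Exception as exc:  # an execution error becomes feedback
            feedback.append(f"example {i}: raised {type(exc).__name__}")
            continue
        if got != expected:
            feedback.append(f"example {i}: expected {expected}, got {got}")
    return feedback


def repair_loop(
    initial: Program,
    repair: Callable[[Program, List[str]], Program],
    examples: List[Example],
    max_samples: int = 8,  # assumed budget, not a value fixed by the paper
) -> Tuple[Optional[Program], int]:
    """Refine a program until it passes all examples or the budget runs out."""
    program = initial
    for used in range(1, max_samples + 1):
        feedback = trace(program, examples)
        if not feedback:  # empty feedback means every example passed
            return program, used
        if used < max_samples:
            program = repair(program, feedback)
    return None, max_samples


def flip_horizontal(grid: Grid) -> Grid:
    return [row[::-1] for row in grid]


def transpose(grid: Grid) -> Grid:
    return [list(col) for col in zip(*grid)]


def make_toy_repair(candidates: List[Program]):
    """Hypothetical repair step: ignores the feedback and tries the next candidate."""
    remaining = iter(candidates)

    def repair(program: Program, feedback: List[str]) -> Program:
        return next(remaining)

    return repair


examples = [([[1, 2], [3, 4]], [[1, 3], [2, 4]])]
solved, used = repair_loop(
    initial=lambda grid: grid,  # flawed first attempt: identity
    repair=make_toy_repair([flip_horizontal, transpose]),
    examples=examples,
)
```

In this toy run the identity program and the horizontal flip both fail, and the third sample (the transpose) passes, so the loop stops well short of its budget. The point of the sketch is only the control flow: failed programs are traced and fed back into repair instead of being discarded.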

Submission Number: 108
