Keywords: Code Generation, Divide-and-conquer
TL;DR: We propose Planning-after-Trial (PaT), a policy that attempts a direct solution and invokes an expensive planner only upon failure, significantly improving the cost-performance trade-off for code generation.
Abstract: Large language models (LLMs) have demonstrated increasingly sophisticated capabilities for code generation. To extend the problem-solving reach of cost-efficient models to complex problems, strategic planning via problem decomposition has emerged as a key paradigm. However, most existing pipelines adopt a rigid Planning-before-Trial (PbT) policy, which allocates test-time compute inefficiently by incurring planning overhead even on directly solvable problems. We propose an adaptive Planning-after-Trial (PaT) policy that uses the outcome of a direct attempt as a feedback signal, invoking a planner only upon verification failure. This adaptive policy naturally enables a heterogeneous model configuration: a cost-efficient model handles generation attempts, while a powerful model is reserved for targeted planning interventions. Empirically, across multiple benchmarks and model families, our approach significantly advances the cost-accuracy Pareto frontier by avoiding indiscriminate planning on simple problems and concentrating test-time compute where it is needed most.
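The adaptive policy described in the abstract can be sketched as a simple control loop: try a cheap direct generation, verify it, and only escalate to the expensive planner on failure. This is a minimal illustration of the control flow only; the function names (`cheap_generate`, `verify`, `plan`, `generate_with_plan`) are hypothetical stand-ins, not the authors' actual interface.

```python
def pat_solve(problem, cheap_generate, verify, plan, generate_with_plan):
    """Planning-after-Trial (PaT) control flow, as described in the abstract.

    Attempt a direct solution with a cost-efficient model first; invoke the
    expensive planner only if verification of the direct attempt fails.
    """
    candidate = cheap_generate(problem)      # trial: cost-efficient model
    if verify(problem, candidate):           # feedback signal, e.g. unit tests
        return candidate                     # simple problem: no planning cost
    decomposition = plan(problem)            # powerful model, failure-triggered
    return generate_with_plan(problem, decomposition)


# Toy demonstration with stub callables: an "easy" problem passes
# verification directly, so the planner is never invoked.
if __name__ == "__main__":
    result = pat_solve(
        "easy",
        cheap_generate=lambda p: f"direct({p})",
        verify=lambda p, c: p == "easy",
        plan=lambda p: f"plan({p})",
        generate_with_plan=lambda p, d: f"guided({p},{d})",
    )
    print(result)  # → direct(easy)
```

With a "hard" problem (one the verifier rejects), the same call falls through to the planning path, which is exactly how the policy concentrates expensive compute on failures.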
Supplementary Material: zip
Primary Area: applications to robotics, autonomy, planning
Submission Number: 17741