Review, Refine, Repeat: Iterative Decoding of AI Agents with Dynamic Evaluation and Selection

ACL ARR 2025 February Submission4179 Authors

15 Feb 2025 (modified: 09 May 2025) | ACL ARR 2025 February Submission | Readers: Everyone | License: CC BY 4.0

Abstract: While AI agents have started excelling at various tasks, they may still struggle with complex structured generation and strategic planning. Improvement via standard fine-tuning is often impractical, as solving agentic tasks typically relies on black-box API access without control over model parameters. Inference-time methods offer a viable alternative, but existing approaches require white- or gray-box access, limiting their applicability to black-box settings. A natural black-box solution is Best-of-N (BoN) sampling, a simple yet effective inference-time technique that operates without access to model weights or logits. However, BoN is inherently static: it does not integrate feedback across iterations, which reduces its effectiveness on complex tasks. To address this, we propose IAD, an iterative decoding approach that improves upon BoN by combining iterative refinement with dynamic candidate evaluation and selection guided by a verifier. IAD is flexible, model-agnostic, and integrates seamlessly with API-based models, making it broadly applicable to agentic tasks. We evaluate IAD on diverse agentic tasks, including Sketch2Code, Text2SQL, and Webshop, where it consistently outperforms baselines by over 15% across multiple metrics and setups.
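The contrast between static BoN sampling and verifier-guided iterative decoding can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify the IAD algorithm, so the loop structure, the function names (`best_of_n`, `iterative_decode`), and the toy generator and verifier below are all hypothetical stand-ins, assumed only for illustration. In a real black-box setting, the generator would be an API call to a model and the verifier would be a task-specific scorer.

```python
import random
from typing import Callable, Optional, Tuple

# Hypothetical interfaces (assumptions, not from the paper):
# a generator maps (prompt, previous best or None, rng) -> candidate,
# a verifier maps (prompt, candidate) -> score, higher is better.
Generator = Callable[[str, Optional[str], random.Random], str]
Verifier = Callable[[str, str], float]


def best_of_n(prompt: str, generate: Generator, verify: Verifier,
              n: int, rng: random.Random) -> Tuple[str, float]:
    """Static BoN: sample n independent candidates, keep the top-scoring one."""
    candidates = [generate(prompt, None, rng) for _ in range(n)]
    best = max(candidates, key=lambda c: verify(prompt, c))
    return best, verify(prompt, best)


def iterative_decode(prompt: str, generate: Generator, verify: Verifier,
                     n: int, rounds: int,
                     rng: random.Random) -> Tuple[str, float]:
    """Iterative sketch: each round conditions generation on the best
    candidate so far and accepts a new one only if the verifier prefers it."""
    best: Optional[str] = None
    best_score = float("-inf")
    for _ in range(rounds):
        candidates = [generate(prompt, best, rng) for _ in range(n)]
        top = max(candidates, key=lambda c: verify(prompt, c))
        top_score = verify(prompt, top)
        if top_score > best_score:  # dynamic selection: never regress
            best, best_score = top, top_score
    assert best is not None  # holds whenever rounds >= 1 and n >= 1
    return best, best_score


# Toy stand-ins so the sketch runs without any model API (illustrative only).
TARGET = "SELECT name FROM users"
ALPHABET = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ "


def toy_generate(prompt: str, previous: Optional[str],
                 rng: random.Random) -> str:
    """Random string on the first round; afterwards, a one-character
    edit of the previous best (a crude model of 'refinement')."""
    if previous is None:
        return "".join(rng.choice(ALPHABET) for _ in range(len(TARGET)))
    i = rng.randrange(len(previous))
    return previous[:i] + rng.choice(ALPHABET) + previous[i + 1:]


def toy_verify(prompt: str, candidate: str) -> float:
    """Fraction of positions that match the hidden target string."""
    return sum(a == b for a, b in zip(candidate, TARGET)) / len(TARGET)
```

With `rounds=1` the iterative loop in this sketch reduces to plain BoN; additional rounds can only keep or raise the verifier score, because a refined candidate replaces the incumbent only when it scores strictly higher. Whether IAD uses this exact acceptance rule is not stated in the abstract.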

Paper Type: Long

Research Area: NLP Applications

Research Area Keywords: Function-calling, Agent Inference, LLM, Alignment

Languages Studied: English

Submission Number: 4179
