Abstract: Inference-time alignment through scaling test-time compute is a promising approach for improving the performance of AI agents. Such approaches typically involve three key components: sampling, evaluation, and feedback. While the roles of sampling and evaluation are well studied in the literature, the role of feedback in inference-time alignment is relatively under-explored. We address this gap by introducing Iterative Agent Decoding (IAD), a general sequential framework that enables the integration of different forms of feedback to improve performance. We analyze how feedback impacts agent performance across four dimensions: (1) accuracy vs. compute under budget-controlled scaling, (2) impact of adaptive feedback beyond sampling diversity, (3) impact of feedback modalities, and (4) sensitivity to feedback quality. Our evaluations on Sketch2Code, Text2SQL, Intercode, and Webshop demonstrate that feedback plays a crucial role in inference-time alignment, yielding performance gains of up to 10% over strong baselines. Our findings provide a unified understanding of the role of feedback mechanisms in inference-time alignment.
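The sample–evaluate–feedback loop described in the abstract can be sketched as follows. This is a minimal illustrative sketch, not the paper's actual implementation: the function names (`iad_loop`, `toy_generate`, etc.) and the toy number-guessing task are assumptions used purely to show how feedback can guide iterative sampling under a fixed compute budget.

```python
def iad_loop(generate, evaluate, give_feedback, prompt, budget):
    """Iterative sampling under a compute budget, keeping the best candidate.

    Each round: sample a candidate (conditioned on prior feedback), score it,
    and derive feedback to steer the next sample. All components are pluggable,
    mirroring the sampling/evaluation/feedback decomposition in the abstract.
    """
    best, best_score = None, float("-inf")
    feedback = None
    for _ in range(budget):
        candidate = generate(prompt, feedback)
        score = evaluate(candidate)
        if score > best_score:
            best, best_score = candidate, score
        feedback = give_feedback(candidate, score)
    return best, best_score


# Toy instantiation (hypothetical): the "agent" searches for a hidden integer,
# the evaluator scores negative distance to the target, and feedback tells the
# agent which direction to move, so it converges far faster than blind sampling.
TARGET = 37

def toy_generate(prompt, feedback, state={"lo": 0, "hi": 100, "last": None}):
    # Directional feedback shrinks the search interval (binary-search style).
    if feedback == "higher":
        state["lo"] = state["last"] + 1
    elif feedback == "lower":
        state["hi"] = state["last"] - 1
    state["last"] = (state["lo"] + state["hi"]) // 2
    return state["last"]

def toy_evaluate(candidate):
    return -abs(candidate - TARGET)

def toy_feedback(candidate, score):
    if score == 0:
        return None  # exact match; no correction needed
    return "higher" if candidate < TARGET else "lower"

best, best_score = iad_loop(toy_generate, toy_evaluate, toy_feedback,
                            "find x", budget=10)
print(best, best_score)  # → 37 0
```

With directional feedback the toy agent finds the target in a handful of rounds, whereas feedback-free resampling would only benefit from diversity; this contrast is the kind of effect the paper's budget-controlled comparisons measure.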
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: feedback, black-box agents, inference-time alignment
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Data analysis
Languages Studied: English
Submission Number: 5791