QLASS: Boosting Language Agent Inference via Q-Guided Stepwise Search

Published: 01 May 2025 · Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY 4.0
Abstract: Language agents have become a promising solution to complex interactive tasks. A key ingredient in their success is the reward model over the trajectory of the agentic workflow, which provides valuable guidance during training or inference. However, because annotations for intermediate interactions are scarce, most existing works use an outcome reward model to optimize policies across entire trajectories, which may yield sub-optimal policies and hinder overall performance. To address this, we propose QLASS (Q-guided Language Agent Stepwise Search), which automatically generates annotations for open language agents by estimating Q-values in a stepwise manner. By introducing a reasoning tree and performing process reward modeling, QLASS provides effective intermediate guidance for each step. Building on this stepwise guidance, we propose a Q-guided generation strategy that enables language agents to better account for long-term value, resulting in significant performance improvements during inference on complex interactive agent tasks. Notably, even with almost half the annotated data, QLASS retains strong performance, demonstrating its efficiency under limited supervision. We also show empirically, through qualitative analysis, that QLASS leads to more effective decision making.
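To make the mechanism concrete, below is a minimal Python sketch of the two core ideas described in the abstract: estimating stepwise Q-values over a reasoning tree, and using those estimates to guide action selection at inference time. It is illustrative only, not the authors' implementation; `TreeNode`, `backup_q`, `propose_actions`, and `q_model` are hypothetical names standing in for the paper's exploration tree, value backup, policy sampler, and trained Q-value model.

```python
# Minimal sketch (illustrative, not the authors' code) of:
#   (a) stepwise Q-value estimation on a reasoning tree, and
#   (b) Q-guided action selection at inference time.
from dataclasses import dataclass, field
from typing import Callable, List, Optional

@dataclass
class TreeNode:
    state: str
    action: Optional[str] = None   # action that led to this node
    reward: float = 0.0            # reward observed at this node (often 0 until the end)
    q: float = 0.0                 # estimated Q-value, filled in by backup_q
    children: List["TreeNode"] = field(default_factory=list)

def backup_q(node: TreeNode, gamma: float = 1.0) -> float:
    """Back up Q-values bottom-up: a leaf's Q is its outcome reward; an
    internal node's Q adds the (discounted) best child Q, mirroring the
    Bellman optimality backup Q(s, a) = r + gamma * max_a' Q(s', a')."""
    if not node.children:
        node.q = node.reward
    else:
        node.q = node.reward + gamma * max(backup_q(c, gamma) for c in node.children)
    return node.q

def q_guided_step(
    state: str,
    propose_actions: Callable[[str, int], List[str]],  # policy LLM sampler (assumed interface)
    q_model: Callable[[str, str], float],              # trained Q-value model (assumed interface)
    num_candidates: int = 4,
) -> str:
    """At each step, sample candidate actions from the policy and commit
    to the one with the highest predicted Q-value."""
    candidates = propose_actions(state, num_candidates)
    return max(candidates, key=lambda a: q_model(state, a))
```

The max over children in the backup favors the best continuation discovered during tree exploration; in practice, the backed-up Q-values would serve as the stepwise training targets for the process reward model that `q_model` stands in for here.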
Lay Summary: Imagine teaching a language agent not just by praising its final answer but by cheering it on at every step of a complex task. That is what QLASS does: it automatically assigns helpful "hint scores" to each individual action, using a reasoning tree to estimate the long-term value of each choice. By offering real-time guidance on every move, QLASS helps the assistant spot promising directions, course-correct before problems escalate, and focus its effort where it matters. This leads to faster learning, smarter decision-making, and more reliable performance on challenging, multi-stage tasks. Even with half as many annotated examples, agents trained with QLASS maintain strong results, showing how efficiently it works with limited feedback. We also found that stepwise guidance reduces wasted steps and improves overall task success rates. In short, QLASS gives language-powered systems a coach that highlights good moves, warns of dead ends, and keeps them on track, making AI assistants more effective, adaptable, and consistent.
Application-Driven Machine Learning: This submission is on Application-Driven Machine Learning.
Primary Area: Deep Learning->Large Language Models
Keywords: Agent, process reward model
Submission Number: 7542