ARGO: Asynchronous Rollout with Human Guidance for Research Agent Optimization

20 Sept 2025 (modified: 04 Jan 2026) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: code agents, research agent, SFT, human-agent interaction
TL;DR: An agent trajectory rollout strategy for tackling genuinely hard, time-consuming tasks
Abstract: Large Language Model (LLM) agents have recently shown strong potential in domains such as automated coding, deep research, and graphical user interface manipulation. However, training them to succeed on **long-horizon, domain-specialized** tasks remains challenging. Current approaches either rely on dense human annotations through behavior cloning, which is prohibitively expensive for tasks that take days or months, or on outcome-driven sampling, which often collapses due to the rarity of valid positive trajectories on such tasks. We introduce ARGO, a sampling framework that integrates **asynchronous human guidance with action-level data filtering**. Instead of requiring annotators to shadow every step, ARGO allows them to intervene only when the agent drifts from a promising trajectory, for example by providing prior knowledge or strategic advice. This lightweight, high-level oversight produces valuable trajectories at lower cost. ARGO then applies supervision control to filter out sub-optimal actions, stabilizing optimization and preventing error propagation. Together, these components enable reliable and effective data collection in long-horizon environments. To demonstrate the effectiveness of ARGO, we evaluate it on InnovatorBench. Our experiments show that when ARGO is used to train the GLM-4.5 model on InnovatorBench, it achieves more than a 50\% improvement over the untrained baseline and a 28\% improvement over a variant trained without human interaction. These results highlight the critical role of human-in-the-loop sampling and the robustness of ARGO's design in handling long-horizon, domain-specialized tasks.
Primary Area: generative models
Submission Number: 23053