Keywords: web agent, search agent, reinforcement learning
TL;DR: We propose the ReSum paradigm to overcome context limitations in web agents, along with a tailored ReSum-GRPO algorithm for paradigm adaptation.
Abstract: Large Language Model (LLM)-based web agents demonstrate strong performance on knowledge-intensive tasks but are hindered by context window limitations in paradigms like ReAct. Complex queries involving multiple entities, intertwined relationships, and high uncertainty demand extensive search cycles that rapidly exhaust context budgets before reaching solutions. To overcome this challenge, we introduce ReSum, a novel paradigm that enables indefinite exploration through periodic context summarization. ReSum converts growing interaction histories into compact reasoning states, maintaining awareness of prior discoveries while bypassing context constraints. For paradigm adaptation, we propose ReSum-GRPO, integrating GRPO with segmented trajectory training and advantage broadcasting to familiarize agents with summary-conditioned reasoning. Extensive experiments on web agents across three benchmarks demonstrate that ReSum delivers an average absolute improvement of 4.5% over ReAct, with further gains of 8.2% following ReSum-GRPO training. Notably, with only 1K training samples, the ReSum-GRPO-trained 30B model achieves 33.3% Pass@1 on BrowseComp-zh and 18.3% on BrowseComp-en, showing competitive performance with leading open-source web agents.
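The ReSum paradigm described above can be sketched as a simple control loop: run ReAct-style tool-use steps, and whenever the interaction history nears the context budget, compress it into a compact reasoning state and continue conditioned on that summary. The sketch below is illustrative only; `agent_step`, `summarize`, and the word-count token estimate are hypothetical stand-ins, not the authors' implementation.

```python
def token_count(messages):
    """Crude token estimate: whitespace-split word count (stand-in for a real tokenizer)."""
    return sum(len(m.split()) for m in messages)

def summarize(messages):
    """Stand-in summarizer: in ReSum this would be an LLM call that distills
    the history into key evidence and remaining open questions."""
    return "SUMMARY(" + str(len(messages)) + " turns compressed)"

def resum_loop(query, agent_step, budget=32, max_steps=10):
    """Run tool-use steps; periodically replace the growing history with a summary.

    agent_step(history) -> (observation, done) is a hypothetical callable
    wrapping one reason/act/observe cycle of the web agent.
    """
    history = [query]
    for _ in range(max_steps):
        obs, done = agent_step(history)
        history.append(obs)
        if done:
            return history
        if token_count(history) > budget:
            # Compress: keep the query plus a compact reasoning state,
            # so exploration can continue past the context limit.
            history = [query, summarize(history)]
    return history
```

The key design point this illustrates is that summarization is periodic and in-place: the agent never sees the full raw history again, only the query and the latest compact state, which is what ReSum-GRPO trains the policy to reason from.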
Primary Area: foundation or frontier models, including LLMs
Submission Number: 8387