WEBSERV: A Browser-Server Environment for Efficient Training of Reinforcement Learning-based Web Agents at Scale

Published: 06 Oct 2025, Last Modified: 04 Nov 2025
Venue: MTI-LLM @ NeurIPS 2025 Poster
License: CC BY-ND 4.0
Keywords: Web Environment, LLM Agent, web agent, multiturn reinforcement learning
Abstract: Training and evaluation of Reinforcement Learning (RL) web agents have gained increasing attention, yet a scalable and efficient environment that couples realistic, robust browser-side interaction with controllable server-side state is still missing. Existing environments tend to have one or more of the following issues: they overwhelm policy models with excessive and noisy context; they execute actions non-deterministically, without waiting for the UI or network to stabilize; or they cannot effectively scale isolated client–server containers for parallel RL rollouts. We propose WebServ, an environment that combines a compact, site-agnostic browser sandbox that balances context and action complexity with a scalable RL backend that efficiently launches and resets web servers to support high-throughput training and evaluation. We evaluate WebServ on the shopping, CMS, and GitLab tasks in WebArena, achieving state-of-the-art single-prompt success rates while reducing launch latency by ~5× and storage requirements by ~240× at a comparable memory footprint, which enables 200+ concurrent containers on a single host.
Submission Number: 183
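The abstract's point about acting before the UI or network has settled can be illustrated with a generic quiescence wait: after each action, poll a cheap fingerprint of page state and only proceed once it has stopped changing for a short window. The sketch below is not WebServ's implementation. `wait_until_stable`, `snapshot`, `quiet_period`, `timeout`, and `poll_interval` are hypothetical names, and it assumes the caller can read some fingerprint such as (in-flight request count, DOM mutation counter). The clock and sleep functions are injectable so the logic can be tested without real waiting.

```python
import time
from typing import Callable, Hashable


def wait_until_stable(
    snapshot: Callable[[], Hashable],
    quiet_period: float = 0.5,
    timeout: float = 10.0,
    poll_interval: float = 0.05,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Hypothetical settle-wait: return True once snapshot() has stayed
    unchanged for quiet_period seconds, or False if timeout elapses first."""
    start = clock()
    last = snapshot()
    last_change = start
    while True:
        now = clock()
        # Stable long enough: the page is assumed to have settled.
        if now - last_change >= quiet_period:
            return True
        # Give up so a stuck page cannot stall an RL rollout forever.
        if now - start >= timeout:
            return False
        sleep(poll_interval)
        current = snapshot()
        if current != last:
            # Any change (new request, DOM mutation) restarts the quiet window.
            last = current
            last_change = clock()
```

Returning a boolean instead of raising lets a rollout loop decide for itself whether an unsettled page should end the episode or simply be observed as-is.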