WEBSERV: A Browser-Server Environment for Efficient Training of Reinforcement Learning-based Web Agents at Scale

Yuxuan Lu; Jing Huang; Hui Liu; Jiri Gesi; Yan Han; Shihan Fu; Tianqi Zheng; Dakuo Wang

Published: 06 Oct 2025, Last Modified: 04 Nov 2025

Venue: MTI-LLM @ NeurIPS 2025 (Poster)

License: CC BY-ND 4.0

Keywords: Web Environment, LLM Agent, web agent, multiturn reinforcement learning

Abstract: Training and evaluation of Reinforcement Learning (RL) web agents have gained increasing attention, yet an efficient environment that couples realistic, robust browser-side interaction with controllable server-side state at scale is still missing. Existing environments tend to have one or more of the following issues: they overwhelm policy models with excessive and noisy context; they perform actions non-deterministically without waiting for the UI or network to stabilize; or they cannot scale isolated client–server containers effectively for parallel RL rollouts. We propose WebServ, an environment that includes a compact, site-agnostic browser sandbox that balances context and action complexity, alongside a scalable RL backend that efficiently launches and resets web servers to support high-throughput training and evaluation. We evaluate WebServ on the shopping, CMS and GitLab tasks in WebArena, achieving state-of-the-art single-prompt success rates while reducing launch latency by ~5× and storage requirements by ~240×, all with a comparable memory footprint, enabling 200+ concurrent containers on a single host.

Submission Number: 183
