Keywords: Deep Learning, Agents
Abstract: The predominant approach for training web navigation agents gathers human demonstrations for a set of popular websites and hand-written tasks, but it is becoming clear that human data is an inefficient resource. We develop a pipeline to facilitate large-scale training for agents without laborious human annotation. In the first stage, an LLM generates tasks for 150k diverse websites. In the next stage, LLM agents complete the tasks and produce trajectories. In the final stage, an LLM reviews the trajectories and judges their success. Language models are competitive with human annotators, detecting and filtering out harmful content with 97\% accuracy, generating tasks with an 89\% feasibility rate, and judging trajectory success with 82.6\% accuracy. Scaling the pipeline, agents based on \textit{Llama 3.1 70B} solve 16.7\% of tasks across the 150k sites. Training on data generated by our pipeline is competitive with training on human demonstrations. In data-limited experiments derived from Mind2Web and WebLINX, we improve \textit{Step Accuracy} by +89.5\% and +94.5\%, respectively, for agents trained on mixtures of human data and data from our pipeline. Our code is available at: \href{https://data-for-agents.github.io}{data-for-agents.github.io}.
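The abstract outlines a three-stage pipeline (task proposal, agent rollouts, LLM judging of trajectories). Below is a minimal, hypothetical sketch of that loop; the helpers `propose_tasks`, `run_agent`, and `judge_trajectory`, the generic `llm(prompt)` call, and the `agent.rollout` interface are illustrative assumptions and not the authors' implementation.

```python
# Hypothetical sketch of the three-stage data pipeline described in the abstract.
# `llm(prompt)` stands in for any chat-completion call; all prompts are illustrative.

def propose_tasks(site_url: str, llm) -> list[str]:
    """Stage 1: an LLM proposes candidate tasks for a website."""
    response = llm(f"Propose realistic user tasks for the website {site_url}, one per line.")
    return [line.strip() for line in response.splitlines() if line.strip()]

def run_agent(site_url: str, task: str, agent) -> list[dict]:
    """Stage 2: an LLM agent attempts the task, returning (observation, action) steps."""
    return agent.rollout(site_url, task)

def judge_trajectory(task: str, trajectory: list[dict], llm) -> bool:
    """Stage 3: an LLM judge decides whether the trajectory completed the task."""
    actions = "\n".join(step["action"] for step in trajectory)
    verdict = llm(
        f"Task: {task}\nActions taken:\n{actions}\n"
        "Did the agent complete the task? Answer yes or no."
    )
    return verdict.strip().lower().startswith("yes")

def build_dataset(sites: list[str], llm, agent) -> list[dict]:
    """Keep only judged-successful trajectories as training data."""
    dataset = []
    for site in sites:
        for task in propose_tasks(site, llm):
            trajectory = run_agent(site, task, agent)
            if judge_trajectory(task, trajectory, llm):
                dataset.append({"site": site, "task": task, "trajectory": trajectory})
    return dataset
```

The filtered dataset of judged-successful trajectories would then be mixed with (or substituted for) human demonstrations when fine-tuning the agent, as in the Mind2Web and WebLINX experiments mentioned above.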
Submission Number: 78