WorkflowAgent: Towards Specialized Web Agents Using Production-Scale Workflow Data

ACL ARR 2024 December Submission333 Authors

13 Dec 2024 (modified: 05 Feb 2025)ACL ARR 2024 December SubmissionEveryoneRevisionsBibTeXCC BY 4.0
Abstract: LLM agents are advancing in handling web-based tasks. However, most LLM web agents rely on prompting general-purpose, proprietary models like GPT-4, which are not specifically trained to process web languages (e.g., HTML) or perform long-horizon planning. We explore an alternative paradigm of developing specialized web agents, namely supervised fine-tuning of open-source LLMs using production-scale workflow data. This strategy not only reduces serving costs but also substantially improves the empirical results—our agent achieves state-of-the-art action generation performance on the Mind2Web benchmark and improves the task success rate by 7.3% over existing prompting-based agents on WebArena. We further perform detailed ablation studies on various design choices and provide insights into LLM selection, training recipes, context window optimization, and the effect of dataset sizes.
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: LLM web agent, web navigation, fine-tuning
Contribution Types: NLP engineering experiment, Publicly available software and/or pre-trained models
Languages Studied: English
Submission Number: 333
Loading

OpenReview is a long-term project to advance science through improved peer review with legal nonprofit status. We gratefully acknowledge the support of the OpenReview Sponsors. © 2025 OpenReview