Abstract: LLM agents are advancing in handling web-based tasks. However, most LLM web agents rely on prompting general-purpose, proprietary models like GPT-4, which are not specifically trained to process web languages (e.g., HTML) or perform long-horizon planning. We explore an alternative paradigm of developing specialized web agents, namely supervised fine-tuning of open-source LLMs using production-scale workflow data. This strategy not only reduces serving costs but also substantially improves the empirical results—our agent achieves state-of-the-art action generation performance on the Mind2Web benchmark and improves the task success rate by 7.3% over existing prompting-based agents on WebArena. We further perform detailed ablation studies on various design choices and provide insights into LLM selection, training recipes, context window optimization, and the effect of dataset sizes.
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: LLM web agent, web navigation, fine-tuning
Contribution Types: NLP engineering experiment, Publicly available software and/or pre-trained models
Languages Studied: English
Submission Number: 333