Abstract: LLM agents are advancing in handling web-based tasks. However, most LLM web agents rely on prompting general-purpose, proprietary models like GPT-4, which are not specifically trained to process web languages (e.g., HTML) or to perform long-horizon planning. We explore an alternative approach that fine-tunes open-source LLMs on production-scale workflow data collected from over 250 domains, totaling 6 billion tokens. This approach yields substantial gains over prompting-based agents on existing benchmarks: our agent achieves state-of-the-art action generation performance on the Mind2Web benchmark and improves the task success rate by 7.3% over existing prompting-based agents on WebArena. We perform detailed ablation studies on various fine-tuning design choices and provide valuable insights into LLM selection, training recipes, context window optimization, and the effect of dataset size.
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Aleksandra_Faust1
Submission Number: 4306