Keywords: Web Workflow Automation, LLM Agents, Orchestration, Prompt Adaptation, Lookahead Agent, Observation Space Reduction
TL;DR: A framework for web workflow automation comprising an orchestrator agent that dynamically invokes diverse LLM agents to deliberate on the suitability of actions.
Abstract: Automating tasks over the web using LLM-based agents has seen emerging need and interest. Executing a web task based on the intent expressed by a user requires carrying out a sequence of steps, which presents several challenges owing to the complex nature of web workflows and the variation across web interfaces. Several past works that have proposed agentic frameworks for web workflow execution either employ a fixed, static call sequence when invoking LLM agents or stack calls to code-based functions at runtime. Further, limited attention has been given to designing adaptable LLM-based web agents with dynamically tunable prompts. To this end, we propose AutoWeave, an agentic framework comprising a suite of LLM-based agents that look ahead to anticipate the future possibilities arising from an action and simulate the suitability of actions at each step of workflow execution. The deliberation between the agents is facilitated by an orchestrator LLM agent, which dynamically invokes the next appropriate agent based on the interaction between the agents and the workflow executed so far. In addition, the orchestrator agent refines the prompt for each agent based on the task context before calling it during deliberation. We establish the efficacy of AutoWeave on a variety of benchmarks comprising (1) real-world websites, such as WebVoyager, and (2) simulated web environments, such as WebArena, with relative gains of 10% and 22%, respectively, over the best baselines. We show that AutoWeave consistently improves the performance of LLM-based web agents across multiple model families, such as Llama-3 and Qwen-2.5. Further, we conduct extensive ablations to verify the effectiveness of each agent in AutoWeave and the importance of the orchestrator for dynamic invocation of agents and prompt adaptation.
Primary Area: foundation or frontier models, including LLMs
Submission Number: 10637