TL;DR: We propose Agent Workflow Memory, which enables digital agents to induce and reuse increasingly complex workflows on the fly.
Abstract: Despite the potential of language model-based agents to solve real-world tasks such as web navigation, current methods still struggle with long-horizon tasks with complex action trajectories. In contrast, humans can flexibly solve complex tasks by learning reusable task workflows from past experiences and using them to guide future actions. To build agents that can similarly benefit from this process, we introduce Agent Workflow Memory (AWM), a method for inducing commonly reused routines, i.e., workflows, and selectively providing workflows to the agent to guide subsequent generations. AWM flexibly applies to both offline and online scenarios, where agents induce workflows from training examples beforehand or from test queries on the fly. We experiment on two major web navigation benchmarks, Mind2Web and WebArena, that collectively cover 1000+ tasks from 200+ domains across travel, shopping, and social media, among others. AWM substantially improves the baseline results by 24.6% and 51.1% relative success rate on Mind2Web and WebArena, respectively, while reducing the number of steps taken to solve WebArena tasks successfully. Furthermore, online AWM robustly generalizes in cross-task, cross-website, and cross-domain evaluations, surpassing baselines by 8.9 to 14.0 absolute points as train-test task distribution gaps widen.
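To make the online setting in the abstract concrete, here is a minimal sketch of an induce-and-reuse loop. It is an illustration under stated assumptions, not the authors' implementation: the names `Workflow`, `WorkflowMemory`, `run_online`, and the `solve`, `judge`, and `induce` callables are all hypothetical, and the choice to induce workflows only from trajectories judged successful is an assumption that the abstract does not state.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Workflow:
    """A reusable routine: a name plus an ordered list of abstract steps."""
    name: str
    steps: list[str]


@dataclass
class WorkflowMemory:
    """Hypothetical store of induced workflows, rendered as text for the agent."""
    workflows: list[Workflow] = field(default_factory=list)

    def add(self, workflow: Workflow) -> None:
        # Keep one copy per workflow name so the memory does not fill with duplicates.
        if all(w.name != workflow.name for w in self.workflows):
            self.workflows.append(workflow)

    def as_prompt(self) -> str:
        # One line per workflow, e.g. "name: step1 -> step2".
        return "\n".join(f"{w.name}: " + " -> ".join(w.steps) for w in self.workflows)


def run_online(
    queries: list[str],
    solve: Callable[[str, str], list[str]],
    judge: Callable[[str, list[str]], bool],
    induce: Callable[[str, list[str]], list[Workflow]],
    memory: WorkflowMemory,
) -> list[bool]:
    """Process test queries in order, growing the workflow memory on the fly."""
    outcomes = []
    for query in queries:
        # The agent sees the workflows induced so far as extra guidance.
        trajectory = solve(query, memory.as_prompt())
        success = judge(query, trajectory)
        if success:
            # Assumption: only trajectories judged successful feed induction.
            for workflow in induce(query, trajectory):
                memory.add(workflow)
        outcomes.append(success)
    return outcomes


# Toy stand-ins for the agent, the evaluator, and the induction step.
def toy_solve(query: str, hints: str) -> list[str]:
    return ["open site", f"search {query}", "click first result"]


def toy_judge(query: str, trajectory: list[str]) -> bool:
    return True


def toy_induce(query: str, trajectory: list[str]) -> list[Workflow]:
    return [Workflow("search_and_open", ["open site", "search {item}", "click first result"])]


memory = WorkflowMemory()
outcomes = run_online(["shoes", "flights"], toy_solve, toy_judge, toy_induce, memory)
```

In an offline variant, the same `induce` step would run over training examples before any test query is processed, and the memory would stay fixed at test time.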
Lay Summary: Language models are increasingly used as digital assistants to help users complete tasks on the web — like booking flights or shopping online. However, these AI agents often struggle when tasks are long or complex. Humans, by contrast, tend to learn from experience: once we figure out how to do something, we remember the steps and reuse them when a similar situation comes up.
Our research introduces a new technique called Agent Workflow Memory (AWM) that helps AI agents do the same. AWM allows agents to learn useful "task recipes" — or workflows — from previous examples and reuse them when solving new problems.
We tested AWM on two large collections of real-world web tasks covering over 1,000 examples from domains such as shopping and social media. AWM significantly improved success rates and made agents more efficient. Even when the agent encountered new tasks or websites it hadn't seen before, AWM helped it generalize and perform better than existing methods.
Link To Code: https://github.com/zorazrw/agent-workflow-memory
Primary Area: Applications
Keywords: agent, memory, web navigation
Submission Number: 4702