Keywords: LLM, Agent, Web Agent, Task Decomposition
TL;DR: WebDART tackles complex web tasks by decomposing them into navigation, information extraction, and execution subtasks with dynamic re-planning, cutting navigation steps and lifting success rates by up to 13.7 points on WebChoreArena.
Abstract: Large language model (LLM) agents are becoming competent at straightforward web tasks, such as opening an item page or submitting a form, but still struggle with objectives that require long-horizon navigation, large-scale information extraction, and reasoning under constraints. We present WebDART, a general framework that enables a single LLM to handle such complex chores. WebDART (i) dynamically decomposes each objective into three focused subtasks—navigation, information extraction, and execution—so the model concentrates on one skill at a time, and (ii) continuously re-plans the decomposition as new webpages are revealed, taking advantage of newly discovered filters or shortcuts and avoiding redundant exploration. Evaluated on WebChoreArena, WebDART lifts end-to-end success rates by up to 13.7 percentage points over previous state-of-the-art agents, while matching their performance on the easier WebArena suite and completing tasks with up to 14.7 fewer navigation steps. Code will be publicly available.
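The abstract's two ideas, a fixed three-way decomposition and continuous re-planning as new pages are revealed, can be sketched as a simple control loop. This is an illustrative sketch only, not the paper's actual implementation; all names (`Subtask`, `decompose`, `replan`, `run`) are hypothetical, and the "shortcut" trigger is a stand-in for whatever signal the real agent uses.

```python
from dataclasses import dataclass

@dataclass
class Subtask:
    kind: str   # "navigation" | "extraction" | "execution"
    goal: str
    done: bool = False

def decompose(objective: str) -> list[Subtask]:
    # The abstract's (i): a fixed split into three focused subtasks.
    return [
        Subtask("navigation", f"reach pages relevant to: {objective}"),
        Subtask("extraction", f"collect information needed for: {objective}"),
        Subtask("execution",  f"apply constraints and act on: {objective}"),
    ]

def replan(plan: list[Subtask], observation: str) -> list[Subtask]:
    # The abstract's (ii): if a newly revealed page exposes a filter or
    # shortcut, drop pending navigation work the shortcut makes redundant.
    # (Keyword match is a placeholder for a real shortcut detector.)
    if "filter" in observation:
        return [s for s in plan if s.done or s.kind != "navigation"]
    return plan

def run(objective: str, observations: list[str]) -> list[str]:
    plan = decompose(objective)
    trace = []
    for obs in observations:
        plan = replan(plan, obs)
        pending = [s for s in plan if not s.done]
        if not pending:
            break
        current = pending[0]   # concentrate on one skill at a time
        trace.append(current.kind)
        current.done = True    # stand-in for actually performing the subtask
    return trace

if __name__ == "__main__":
    # Without a shortcut, all three subtasks run in order.
    print(run("find the cheapest laptop", ["page1", "page2", "page3"]))
    # A page exposing a filter lets the agent skip remaining navigation.
    print(run("find the cheapest laptop", ["page with filter", "page2"]))
```

The point of the sketch is the loop shape: decomposition happens once up front, but the plan is re-examined on every new observation, which is how a discovered filter can shorten the navigation phase mid-task.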
Supplementary Material: zip
Primary Area: foundation or frontier models, including LLMs
Submission Number: 22407