Web Agents Are Still Greedy: Progress-Aware Action Generation and Selection via Meta-Plan

Joonhyun Jeong; Gilhyun Nam; Yeonsung Jung; Eunho Yang

18 Sept 2025 (modified: 14 Nov 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0

Keywords: Large Language Models, Multimodal Large Language Models, Web Agent

TL;DR: We introduce a new action generation and selection strategy for web agents, effectively resolving common failure modes of state-of-the-art web agent frameworks.

Abstract: Despite recent advances in the reasoning and planning capabilities of large language models that enable automated web agent tasks, state-of-the-art web agent frameworks still exhibit high task failure rates and lag behind human performance, which hinders their deployment and generalization across diverse website environments. In this paper, we identify key limitations in these frameworks and attribute their failures to greedy reasoning that lacks an understanding of task progress: which key steps have already been completed and which remain. Because of this lack of progress awareness, existing web agent frameworks often fall into suboptimal behaviors, such as skipping essential steps and producing incoherent or oscillatory trajectories, which prevent task completion. To address these limitations, we propose MAPLE, a simple yet effective add-on method with MetA-PLan guided action generation and sElection. MAPLE equips existing web agent frameworks to reason over an explicit meta-plan, a set of high-level sequential guidelines for solving the task, so that they can keep track of current progress and consistently adhere to those guidelines. Experiments across diverse website benchmarks demonstrate that MAPLE substantially outperforms previous state-of-the-art web agent frameworks by addressing their common failures and suboptimal behaviors caused by the lack of progress awareness.
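To make the idea of progress-aware action selection more concrete, here is a minimal sketch under stated assumptions. It is not MAPLE's implementation, which the abstract does not specify. The names `MetaPlan`, `next_step`, `mark_done`, `overlap_score`, and `select_action` are hypothetical, and the word-overlap score is a toy stand-in for whatever model-based judgment a real agent would use. The sketch only shows the general pattern: keep an ordered list of high-level steps, track which have been completed, and prefer candidate actions that serve the next pending step rather than choosing greedily without reference to progress.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class MetaPlan:
    """Hypothetical meta-plan: ordered high-level steps plus a progress pointer."""
    steps: list[str]
    completed: int = 0  # number of leading steps already marked done

    def next_step(self) -> str | None:
        # First step not yet completed, or None once the plan is finished.
        if self.completed < len(self.steps):
            return self.steps[self.completed]
        return None

    def mark_done(self) -> None:
        # Advance progress by one step without running past the end of the plan.
        self.completed = min(self.completed + 1, len(self.steps))


def overlap_score(action: str, step: str) -> int:
    # Toy alignment score (an assumption): number of shared lowercase words.
    return len(set(action.lower().split()) & set(step.lower().split()))


def select_action(candidates: list[str], plan: MetaPlan) -> str | None:
    """Pick the candidate best aligned with the next pending meta-plan step."""
    step = plan.next_step()
    if step is None or not candidates:
        return None
    # max() keeps the first candidate among ties, so earlier candidates win.
    return max(candidates, key=lambda action: overlap_score(action, step))


if __name__ == "__main__":
    plan = MetaPlan(["search for the product", "open the product page", "add the product to cart"])
    candidates = ["click add to cart", "type query in search box", "scroll down"]
    print(select_action(candidates, plan))  # aligned with the first step
    plan.mark_done()
    plan.mark_done()
    print(select_action(candidates, plan))  # aligned with the third step
```

Keeping progress as an explicit counter over an ordered plan is what lets the selector tell "add to cart" apart as premature early in the episode and correct later, which a purely greedy scorer that ignores completed steps cannot do.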

Primary Area: applications to computer vision, audio, language, and other modalities

Submission Number: 11010

Loading