Mobile-Agent-E: Self-Evolving Mobile Assistant for Complex Tasks

Published: 28 Sept 2025, Last Modified: 11 Oct 2025
Venue: SEA @ NeurIPS 2025 (Oral)
License: CC BY 4.0
Keywords: Mobile Agent, GUI Agent, Large Multimodal Model
TL;DR: Introducing Mobile-Agent-E, a novel hierarchical agentic framework capable of self-evolution, which sets a new SOTA on complex real-world mobile tasks.
Abstract: Recent advancements in large multimodal model (LMM)-based mobile agents have demonstrated promising capabilities for acting in mobile environments. However, current approaches face significant limitations: (1) they fall short in addressing real-world human needs, which involve complex, open-ended, and reasoning-intensive tasks; and (2) they lack mechanisms to learn and improve from prior experience. To address these challenges, we introduce Mobile-Agent-E, a hierarchical agentic framework capable of self-evolution through past experience. Mobile-Agent-E adopts a multi-level communication protocol for reasoning, perception, and error recovery, explicitly separating high-level planning from low-level action decisions. It also introduces a novel self-evolution module that maintains a persistent long-term memory comprising Tips and Shortcuts, enabling continual refinement of task performance and efficiency. To bridge the gap in existing benchmarks for complex, open-ended tasks, we further present a new benchmark, Mobile-Eval-E, alongside a new evaluation metric, the Satisfaction Score. Empirical results show that Mobile-Agent-E achieves a 22% absolute improvement over previous state-of-the-art approaches across three LMM backbones. We also provide a comprehensive analysis of the impact of the self-evolution mechanism and outline promising directions for future work.
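To make the self-evolution idea concrete, below is a minimal, illustrative sketch (not the authors' implementation) of a persistent long-term memory holding Tips and Shortcuts that is updated after each task; all names here (`LongTermMemory`, `Shortcut`, `evolve`, the example actions) are hypothetical.

```python
# Illustrative sketch only, assuming a memory of general "Tips" (natural-language
# lessons) and "Shortcuts" (reusable sequences of low-level actions), as described
# in the abstract. Names and structure are assumptions, not the paper's API.
from dataclasses import dataclass, field


@dataclass
class Shortcut:
    name: str
    atomic_actions: list[str]   # low-level actions composed into one reusable routine
    precondition: str           # when the shortcut is safe to apply


@dataclass
class LongTermMemory:
    tips: list[str] = field(default_factory=list)
    shortcuts: dict[str, Shortcut] = field(default_factory=dict)

    def evolve(self, new_tips: list[str], new_shortcuts: list[Shortcut]) -> None:
        """Persist lessons from a finished task so later episodes start with refined knowledge."""
        self.tips.extend(t for t in new_tips if t not in self.tips)
        for sc in new_shortcuts:
            self.shortcuts[sc.name] = sc


# Hypothetical usage: after each episode, a reflection step would call evolve(...)
memory = LongTermMemory()
memory.evolve(
    new_tips=["Close pop-up dialogs before tapping the search bar"],
    new_shortcuts=[Shortcut(
        name="open_and_search",
        atomic_actions=["open_app", "tap_search", "type_query", "press_enter"],
        precondition="App home screen is visible",
    )],
)
```

In this reading, Tips refine future high-level planning while Shortcuts let the low-level action decider replace repeated atomic steps with a single routine, which is one plausible way the reported gains in both performance and efficiency could arise.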
Archival Option: The authors of this submission want it to appear in the archival proceedings.
Submission Number: 19