Keywords: GUI Navigation, Agent, Memory
TL;DR: We propose MAGNET, a memory-driven framework that adapts mobile app agents to UI and workflow changes for robust long-term reliability.
Abstract: Mobile app agents powered by large foundation models represent a transformative approach to human-computer interaction, enabling autonomous task execution within dynamic mobile applications. However, the volatile nature of mobile ecosystems, characterized by frequent application updates, poses challenges to agent reliability and long-term viability. We identify two critical problems: UI element identification failure, which occurs when an update changes an interface visually or structurally, and task logic drift, which occurs when an update alters the underlying workflow. To address these challenges, we propose MAGNET, a Memory-driven Adaptive aGENT framework equipped with a novel dual-level memory consisting of stationary memory and procedural memory. The stationary memory maintains rich multimodal representations of UI elements, enabling robust action grounding despite interface modifications, while the procedural memory captures and adapts structured task workflows to handle logical changes in operations. This framework effectively bridges the update gap that has limited the practical deployment of mobile agents.
Comprehensive experiments demonstrate that MAGNET achieves robust generalization across various in-domain scenarios and strong adaptability to novel task domains.
Primary Area: applications to robotics, autonomy, planning
Submission Number: 3541