Abstract: Mobile agents are essential for automating tasks in complex and dynamic mobile environments. As foundation models evolve, they offer increasingly powerful capabilities for understanding and generating natural language, enabling real-time adaptation and the processing of multimodal data. This survey provides a comprehensive review of mobile agent technologies, with a focus on recent advancements in foundation models. Our analysis begins by examining representative works in mobile benchmarks and interactive environments to identify current research focuses and their limitations. We then introduce the core components of mobile agents and categorize recent advancements into two main approaches: prompt-based methods, which use large language models (LLMs) for instruction-based task execution, and training-based methods, which fine-tune multimodal models for mobile-specific applications. By discussing key challenges and outlining future research directions, this survey offers insights for advancing mobile agent technologies.
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: Mobile Agents, Multimodal, Survey
Contribution Types: Surveys
Languages Studied: English
Submission Number: 1023