Abstract: GUI automation faces critical challenges in dynamic environments. MLLMs suffer from two key issues: misinterpreting UI components and outdated knowledge. Traditional fine-tuning methods are costly for app-specific knowledge updates. We propose GUI-explorer, a training-free GUI agent that incorporates two fundamental mechanisms: $\textbf{(1) Autonomous Exploration of Function-aware Trajectory}$. To comprehensively cover all application functionalities, we design a $\textbf{Function-aware Task Goal Generator}$ that automatically constructs exploration goals by analyzing GUI structural information (e.g., screenshots and activity hierarchies), enabling systematic exploration that collects diverse trajectories. $\textbf{(2) Unsupervised Mining of Transition-aware Knowledge}$. To establish precise screen-operation logic, we develop a $\textbf{Transition-aware Knowledge Extractor}$ that derives effective screen-operation logic through unsupervised analysis of the state transitions in structured interaction triples (observation, action, outcome), eliminating the need for human involvement in knowledge extraction. With a task success rate of 53.7\% on SPA-Bench and 47.4\% on AndroidWorld, GUI-explorer achieves significant improvements over SOTA agents while requiring no parameter updates for new apps. All data and code will be publicly released on GitHub after acceptance.
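The transition-aware mining described above can be illustrated with a minimal sketch: group (observation, action, outcome) triples collected during exploration and keep only actions whose outcome is consistent across trajectories as screen-operation knowledge. All function and state names below are hypothetical illustrations, not the paper's actual API.

```python
from collections import defaultdict

def mine_transition_knowledge(triples):
    """Hypothetical sketch of unsupervised transition mining:
    group (observation, action, outcome) triples and keep only
    actions whose outcome is consistent across trajectories."""
    outcomes = defaultdict(set)
    for observation, action, outcome in triples:
        outcomes[(observation, action)].add(outcome)
    # An action with a single observed outcome is treated as a
    # reliable screen-operation rule; ambiguous ones are discarded.
    return {
        key: next(iter(results))
        for key, results in outcomes.items()
        if len(results) == 1
    }

# Toy trajectories (invented states/actions for illustration):
triples = [
    ("home_screen", "tap settings_icon", "settings_screen"),
    ("home_screen", "tap settings_icon", "settings_screen"),
    ("home_screen", "tap search_bar", "search_screen"),
    ("home_screen", "swipe left", "page_2"),
    ("home_screen", "swipe left", "error"),  # inconsistent -> dropped
]
knowledge = mine_transition_knowledge(triples)
print(knowledge)
```

Because the filter needs no labels or human review, this kind of rule extraction runs fully unsupervised over the collected trajectories.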
Paper Type: Long
Research Area: Multimodality and Language Grounding to Vision, Robotics and Beyond
Research Area Keywords: GUI Agent, GUI Automation
Contribution Types: Approaches to low-resource settings, Publicly available software and/or pre-trained models
Languages Studied: English
Submission Number: 3724