Abstract: Large language model (LLM)-based agents have been widely applied to mobile task automation. However, while LLMs are proficient at general task-execution practices, they often struggle to execute tasks correctly on specific applications due to a lack of application-specific knowledge, leading to confusion and errors. Although existing methods use exploration-memory mechanisms to mitigate this issue, excessive exploration on user devices is unacceptable, and these mechanisms still struggle to handle tasks effectively. In this work, we propose UICompass, a method that assists agents in completing mobile tasks using a User Interface Manual. Specifically, UICompass first automatically extracts the User Interface Manual from the application's source code; the manual describes the application's interface and interaction logic.
During execution, UICompass analyzes the User Interface Manual to generate a simulated path for the given task and adaptively adjusts the execution path based on the actual application state. Experiments show that UICompass achieves state-of-the-art performance on the DroidTask dataset, improving the success rate by 14.48% while also shortening execution paths.
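To make the extract-plan-adapt loop described in the abstract concrete, here is a minimal sketch of such a pipeline. All names (`UIManual`, `extract_ui_manual`, `plan_path`, the `device` interface) are hypothetical illustrations, not the authors' actual API; this is a sketch of the general idea, not the paper's implementation.

```python
from dataclasses import dataclass, field

@dataclass
class UIManual:
    """App-specific knowledge mined offline from source code:
    screens, widgets, and the transitions between them."""
    screens: dict = field(default_factory=dict)      # screen -> widget descriptions
    transitions: dict = field(default_factory=dict)  # (screen, action) -> next screen

def extract_ui_manual(source_dir: str) -> UIManual:
    """Offline step: parse layouts and event handlers under `source_dir`
    into a manual (stub; the paper derives this from source code)."""
    return UIManual()

def plan_path(manual: UIManual, task: str) -> list[str]:
    """Planning step: consult the manual (e.g., via an LLM prompt) to
    propose an action sequence for `task` before touching the device."""
    return ["open_settings", "tap_account", "toggle_sync"]  # illustrative only

def execute(task: str, manual: UIManual, device) -> bool:
    """Execution loop with adaptive adjustment: if the live UI state
    diverges from the planned path, replan from the current state."""
    path = plan_path(manual, task)
    while path:
        step = path.pop(0)
        state = device.observe()
        if not device.can_apply(step, state):
            # Live state diverged from the simulated path; replan
            # from here instead of failing outright.
            path = plan_path(manual, f"{task}, resuming from {state}")
            continue
        device.apply(step)
    return device.task_done(task)
```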
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: Human-Computer Interaction
Contribution Types: Publicly available software and/or pre-trained models
Languages Studied: English
Submission Number: 6435