Abstract: Large language model (LLM)-based agents have been widely applied to mobile task automation. However, while LLMs are proficient at general task-execution practices, they often struggle to execute tasks correctly on specific applications due to a lack of application-specific knowledge, leading to confusion and errors. Although existing methods use exploration-memory mechanisms to mitigate this issue, excessive exploration on user devices is unacceptable, and these mechanisms still struggle to handle tasks effectively. In this work, we propose UICompass, a method that assists agents in completing mobile tasks using a User Interface Manual. Specifically, UICompass first automatically extracts the User Interface Manual from the application's source code; the manual describes the application's interface and interaction logic.
During execution, UICompass analyzes the User Interface Manual to generate a simulated path for the given task and adaptively adjusts the execution path based on the actual application state. Experiments show that UICompass achieves state-of-the-art performance on the DroidTask dataset, improving the success rate by 14.48% while also shortening execution paths.
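To make the extract-plan-adapt loop described in the abstract concrete, here is a minimal sketch of such a pipeline. All names (`UIManual`, `extract_ui_manual`, `plan_path`, the `device` interface) are hypothetical illustrations, not the authors' actual API; this is a sketch of the general idea, not the paper's implementation.

```python
from dataclasses import dataclass, field

@dataclass
class UIManual:
    """App-specific knowledge mined offline from source code:
    screens, widgets, and the transitions between them."""
    screens: dict = field(default_factory=dict)      # screen -> widget descriptions
    transitions: dict = field(default_factory=dict)  # (screen, action) -> next screen

def extract_ui_manual(source_dir: str) -> UIManual:
    """Offline step: parse layouts and event handlers under `source_dir`
    into a manual (stub; the paper derives this from source code)."""
    return UIManual()

def plan_path(manual: UIManual, task: str) -> list[str]:
    """Planning step: consult the manual (e.g., via an LLM prompt) to
    propose an action sequence for `task` before touching the device."""
    return ["open_settings", "tap_account", "toggle_sync"]  # illustrative only

def execute(task: str, manual: UIManual, device) -> bool:
    """Execution loop with adaptive adjustment: if the live UI state
    diverges from the planned path, replan from the current state."""
    path = plan_path(manual, task)
    while path:
        step = path.pop(0)
        state = device.observe()
        if not device.can_apply(step, state):
            # Live state diverged from the simulated path; replan
            # from here instead of failing outright.
            path = plan_path(manual, f"{task}, resuming from {state}")
            continue
        device.apply(step)
    return device.task_done(task)
```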
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: Human-Computer Interaction
Contribution Types: Publicly available software and/or pre-trained models
Languages Studied: English
Submission Number: 6435