Abstract: Mobile task automation is an emerging technology that leverages AI to execute routine tasks automatically in response to user commands on mobile platforms such as Android, thus enhancing efficiency and productivity.
While large language models (LLMs) excel at general mobile tasks through training on massive datasets, they struggle with app-specific workflows.
To solve this problem, we design UI Map, a structured representation of a target app's UI information.
We further propose UICompass, a UI Map-guided, LLM-based approach to automating mobile tasks.
Specifically, UICompass first leverages static analysis and LLMs to automatically build the UI Map from either the app's source code or its bytecode (\emph{i.e.,} APK packages).
During task execution, UICompass mines task-relevant information from the UI Map and feeds it into the LLM, generates a planned path, and adaptively adjusts the path based on the actual app state and action history.
Experimental results demonstrate that UICompass achieves a 15.87\% higher task execution success rate than SOTA approaches.
Even when only the APK is available, UICompass maintains superior performance, demonstrating its applicability to closed-source apps.
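The plan-then-adjust loop described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the format of UI Map or how UICompass plans, so everything below is an assumption made for illustration: the UI Map is modeled as a hypothetical screen-transition graph, the screen and action names are invented, and a plain breadth-first search stands in for the LLM-based planner.

```python
from collections import deque

# Hypothetical UI Map (an assumption, not the paper's format):
# each screen maps to a list of (action, next_screen) transitions.
UI_MAP = {
    "Home": [("tap 'Settings'", "Settings"), ("tap 'Notes'", "NoteList")],
    "Settings": [("tap 'Theme'", "ThemePicker")],
    "NoteList": [("tap 'New note'", "NoteEditor")],
    "ThemePicker": [],
    "NoteEditor": [],
}


def plan_path(ui_map, start, goal):
    """Return the shortest list of (screen, action) steps from start to goal.

    Breadth-first search is used here as a stand-in for LLM planning.
    Returns an empty list if start == goal and None if goal is unreachable.
    """
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        screen, steps = queue.popleft()
        if screen == goal:
            return steps
        for action, next_screen in ui_map.get(screen, []):
            if next_screen not in visited:
                visited.add(next_screen)
                queue.append((next_screen, steps + [(screen, action)]))
    return None


def replan(ui_map, actual_screen, goal):
    """Adjust the plan when the observed screen differs from the expected one
    by planning again from the screen the app is actually showing."""
    return plan_path(ui_map, actual_screen, goal)


if __name__ == "__main__":
    print(plan_path(UI_MAP, "Home", "ThemePicker"))
    # The app unexpectedly shows the Settings screen: plan again from there.
    print(replan(UI_MAP, "Settings", "ThemePicker"))
```

In this sketch, adjusting the path simply means planning again from the observed screen; an approach like the one described would also take the action history and LLM reasoning into account, which the sketch omits.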
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: Task Automation, Large Language Models, App Analysis
Contribution Types: Publicly available software and/or pre-trained models
Languages Studied: English
Submission Number: 6549