Abstract: With the emergence and continued growth of large language models (LLMs), artificial intelligence (AI) agents have advanced rapidly. Most mobile AI agents merely imitate human operations, executing actions through the user interface (UI) designed for humans. This restricted input impairs the efficiency and accuracy of mobile tasks. We propose an unexplored approach: learning from the source code. Source code plainly encodes the interaction logic of mobile applications, so it can be used to enhance a mobile agent's UI understanding, improve action execution accuracy, and reduce the average number of steps needed to complete a task. We implement an agent prototype and conduct a preliminary evaluation on 5 open-source applications and 22 tasks, in which the average number of task completion steps is reduced by 54%.