Dynamic Planning for LLM-based Graphical User Interface Automation

ACL ARR 2024 April Submission 303 Authors

15 Apr 2024 (modified: 20 May 2024) · ACL ARR 2024 April Submission · CC BY 4.0
Abstract:

The advent of large language models (LLMs) has spurred considerable interest in autonomous LLM-based agents, for example, for smartphone graphical user interface (GUI) automation. Upon receiving a task goal, such an agent typically simulates human operation actions on the GUI environment until the task is completed. However, advanced LLM-based agents often take actions directly without attending to environment feedback or execution history. This lack of historical context makes the agent act statically during task execution, which may further hinder its improvement. To address this issue, we propose Dynamic Planning of Thoughts (D-PoT), a novel approach for LLM-based agents that dynamically adjusts the plan based on the execution history. Experimental results on the AITW benchmark dataset reveal that the proposed D-PoT significantly surpasses the strong GPT-4V baseline by +11.81% accuracy (34.66% $\rightarrow$ 46.47%). Furthermore, quantitative analysis highlights the efficacy of dynamic planning in adapting to unseen tasks.
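The core idea of the abstract, replanning at every step conditioned on execution history rather than committing to a fixed plan, can be illustrated with a minimal agent-loop sketch. This is not the authors' implementation; the `planner` and `env` interfaces below are illustrative assumptions.

```python
# Hypothetical sketch of a dynamic-planning agent loop (not the paper's code).
# `planner(goal, history)` stands in for an LLM call that returns the
# remaining plan steps; `env(action)` stands in for the GUI environment.

def dynamic_planning_agent(goal, env, planner, max_steps=10):
    """Replan at every step using the accumulated execution history."""
    history = []  # (action, feedback) pairs observed so far
    for _ in range(max_steps):
        # The plan is regenerated each step, conditioned on history,
        # instead of being fixed once at the start (static planning).
        plan = planner(goal, history)
        if not plan:
            break  # planner signals the goal is reached
        action = plan[0]         # execute only the next planned step
        feedback = env(action)   # environment (GUI) feedback
        history.append((action, feedback))
    return history
```

The key contrast with a static agent is that `planner` is re-invoked after every action, so screen feedback from earlier steps can redirect later ones.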

Paper Type: Long
Research Area: Dialogue and Interactive Systems
Research Area Keywords: embodied agents, applications
Contribution Types: NLP engineering experiment
Languages Studied: English, HTML, Python
Section 2 Permission To Publish Peer Reviewers Content Agreement: Authors grant permission for ACL to publish peer reviewers' content
Submission Number: 303