Visible to everyone since 16 Apr 2024. License: CC BY 4.0
The advent of large language models (LLMs) has spurred considerable interest in developing autonomous LLM-based agents, for example, agents that operate smartphone graphical user interfaces (GUIs). Given a task goal, such an agent typically imitates human operations on the GUI environment until the task is completed. However, advanced LLM-based agents take actions directly, without attending to environment feedback or execution history. Without access to the previous steps, the agent plans statically during task execution, which may limit its performance. To address this issue, we propose Dynamic Planning of Thoughts (D-PoT), a novel approach for LLM-based agents. Specifically, D-PoT dynamically adjusts the plan based on the execution history. Experimental results on the AITW benchmark dataset show that D-PoT significantly outperforms the strong GPT-4V baseline, improving accuracy by +11.81% (34.66% $\rightarrow$ 46.47%). Furthermore, a quantitative analysis highlights the effectiveness of dynamic planning in adapting to unseen tasks.
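The abstract describes dynamic planning only at a high level. The following is a hypothetical sketch of the general idea of re-planning after every step from the goal, the execution history, and the latest screen, as opposed to executing a plan fixed at the start. It is not the authors' D-PoT implementation: every name here (`run_dynamic_planning`, `toy_planner`, `make_toy_env`, the Wi-Fi task) is an illustrative assumption. In a real agent the planner would presumably be an LLM call and the environment a device or emulator.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Step:
    action: str        # action the agent executed
    screen_after: str  # environment feedback observed after the action


@dataclass
class AgentState:
    goal: str
    plan: List[str]                               # remaining planned actions
    history: List[Step] = field(default_factory=list)


# Hypothetical interfaces (assumptions, not from the paper):
# planner(goal, history, current_screen) -> remaining plan
Planner = Callable[[str, List[Step], str], List[str]]
# env(action) -> description of the new screen
Environment = Callable[[str], str]


def run_dynamic_planning(goal: str, initial_screen: str, planner: Planner,
                         env: Environment, max_steps: int = 10) -> AgentState:
    """Execute one action at a time, re-planning after each observation."""
    state = AgentState(goal=goal, plan=planner(goal, [], initial_screen))
    while state.plan and len(state.history) < max_steps:
        action = state.plan[0]                 # take the next planned action
        screen = env(action)                   # observe environment feedback
        state.history.append(Step(action, screen))
        # Re-plan from the goal, the full execution history, and the new
        # screen instead of following the initial plan statically.
        state.plan = planner(goal, state.history, screen)
    return state


# Toy stand-ins for an LLM planner and a phone GUI (hypothetical).
SCRIPT = ["open settings", "tap wi-fi", "toggle wi-fi on"]


def toy_planner(goal: str, history: List[Step], screen: str) -> List[str]:
    # Count scripted actions already completed according to the history.
    done = [s.action for s in history if s.action in SCRIPT]
    remaining = SCRIPT[len(done):]
    # React to feedback: an unexpected popup must be dismissed first.
    if "popup" in screen:
        return ["dismiss popup"] + remaining
    return remaining


def make_toy_env() -> Environment:
    screens = {
        "open settings": "settings screen with popup",
        "dismiss popup": "settings screen",
        "tap wi-fi": "wi-fi screen",
        "toggle wi-fi on": "wi-fi enabled",
    }
    return lambda action: screens.get(action, "unknown screen")


if __name__ == "__main__":
    final = run_dynamic_planning("enable wi-fi", "home screen",
                                 toy_planner, make_toy_env())
    for step in final.history:
        print(f"{step.action!r:20} -> {step.screen_after}")
```

A plan computed once from the home screen would contain only the three scripted actions and never issue `dismiss popup`. Re-planning after each observation lets the toy agent insert that step when the unexpected dialog appears, which is the kind of adaptation the abstract attributes to dynamic planning.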