Dynamic Planning for Graphical User Interface Automation with LLM Agents

Anonymous

16 Feb 2024 · ACL ARR 2024 February Blind Submission · Readers: Everyone
Abstract: The advent of large language models (LLMs) has spurred considerable interest in advancing autonomous agents, empowering them to tackle real-world tasks by perceiving distinct environments, formulating plans, and executing actions. An intriguing application of these agents is within smartphone graphical user interfaces (GUIs). Upon receiving a task goal, the agent generates step-by-step plans and engages in iterative interactions until the task is complete. However, how to generate effective plans that guide action prediction remains an open challenge. Current studies often confine themselves to static plans or lack specific plans entirely. Because the environment evolves after each action is executed, plans must be adapted dynamically based on environmental feedback and action history. To address this challenge, we propose DP-Agent, a novel approach designed to cultivate dynamic planning in agents. DP-Agent dynamically adjusts its plan based on feedback from the environment and the interaction history. Experimental results reveal that DP-Agent exhibits superior performance, surpassing the widely adopted GPT-4V baseline by +8.81% (35.58% $\rightarrow$ 44.39%) on the AITW benchmark dataset. Our analysis highlights the efficacy of dynamic planning not only in enhancing action prediction accuracy but also in adapting to previously unfamiliar tasks.
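The observe/re-plan/act loop the abstract describes can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: `plan_step`, `FakeEnv`, and the hard-coded toy policy are all assumptions standing in for the LLM planner and the real smartphone GUI environment.

```python
def plan_step(goal, observation, history):
    """Illustrative planner: choose the next action toward the goal.

    A real DP-style agent would prompt an LLM with the goal, the latest
    observation, and the action history, then parse the revised plan.
    Here a tiny hand-written policy stands in for that call.
    """
    if goal in observation:            # goal element visible on screen
        return "tap:" + goal
    if "press_home" not in history:    # recover to a known state first
        return "press_home"
    return "scroll_down"               # otherwise keep exploring


class FakeEnv:
    """Toy GUI environment: the target appears after two scrolls from home."""

    def __init__(self, target):
        self.target = target
        self.scrolls = 0
        self.at_home = False

    def observe(self):
        if self.at_home and self.scrolls >= 2:
            return f"screen: {self.target} button"
        return "screen: unrelated content"

    def step(self, action):
        if action == "press_home":
            self.at_home = True
        elif action == "scroll_down":
            self.scrolls += 1
        return action.startswith("tap:")  # True means the task is done


def run_agent(goal, env, max_steps=10):
    """Dynamic-planning loop: re-plan at every step from fresh feedback."""
    history = []
    for _ in range(max_steps):
        obs = env.observe()                     # environmental feedback
        action = plan_step(goal, obs, history)  # plan is revised each step
        history.append(action)
        if env.step(action):
            return history                      # task completed
    return history
```

The key contrast with a static planner is that `plan_step` is re-invoked after every action with the new observation and the accumulated history, so the plan can change course when the environment does not match expectations.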
Paper Type: long
Research Area: NLP Applications
Contribution Types: NLP engineering experiment
Languages Studied: English, Python, HTML
