Keywords: Agent; Prompt
Abstract: Many existing operating system (OS) agents predict the next action from the current state, which amounts to a predefined task-execution pipeline. Although these methods perform promisingly, their reliance on state-cognition modules such as detectors or recognizers can reduce execution efficiency, particularly in long-horizon tasks with intricate action trajectories.
Recognizing the remarkable accuracy of large language models (LLMs) in processing short instructions, this paper proposes the \textbf{ActionFiller} framework.
Its goal is to compose easily executable short tasks into longer, cohesive tasks through fill-in-the-blank prompts, thereby reducing redundant operations and improving efficiency.
ActionFiller employs two types of action-oriented fill-in-the-blank prompts: one for subtasks and one for specific actions. To generate subtask prompts, we introduce a Foresight Optimization Agent (FOA), which constructs an initial prompt by referencing past short tasks and then fills the unreferenced parts with detailed prompts generated by a planning agent, thereby retaining valuable past experience.
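A minimal sketch of the two-stage subtask prompt construction described above, assuming a memory that maps past short-task descriptions to stored prompts and a simple token-overlap (Jaccard) retrieval rule. The names `build_initial_prompt`, `fill_blanks`, `BLANK`, the 0.5 similarity threshold, and the planner callable are all hypothetical and are not taken from the ActionFiller paper.

```python
from typing import Callable, Dict, List

BLANK = "<BLANK>"  # hypothetical marker for a slot with no matching past task


def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity (a stand-in retrieval score)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def build_initial_prompt(subtasks: List[str], memory: Dict[str, str],
                         threshold: float = 0.5) -> List[str]:
    """Reference a stored prompt for each subtask that resembles a past
    short task; otherwise leave a blank for the planner."""
    slots = []
    for sub in subtasks:
        best = max(memory, key=lambda past: jaccard(sub, past), default=None)
        if best is not None and jaccard(sub, best) >= threshold:
            slots.append(memory[best])  # reuse past experience verbatim
        else:
            slots.append(BLANK)  # unreferenced part, filled later
    return slots


def fill_blanks(slots: List[str], subtasks: List[str],
                planner: Callable[[str], str]) -> List[str]:
    """Fill only the blank slots with prompts produced by a planning agent."""
    return [planner(sub) if slot == BLANK else slot
            for slot, sub in zip(slots, subtasks)]


memory = {
    "open notepad": "Open Notepad from the Start menu.",
    "save file as report.txt": "Press Ctrl+S and enter report.txt.",
}
subtasks = ["open notepad", "type hello world", "save file as report.txt"]
slots = build_initial_prompt(subtasks, memory)
prompt = fill_blanks(slots, subtasks, planner=lambda s: f"[planned] {s}")
print(prompt)
# ['Open Notepad from the Start menu.', '[planned] type hello world', 'Press Ctrl+S and enter report.txt.']
```

The lambda stands in for the planning agent; in a real system it would be an LLM call, and the retrieval rule would likely be semantic rather than lexical.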
Next, an Action Template Agent (ATA) generates action prompts for each subtask. This process yields three distinct types of action prompts: 1) executable action sequences, 2) non-executable action sequences with prompt parameters, and 3) pure text descriptions.
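One way to make the three ATA output types concrete is a small tagged representation. This sketch assumes that action sequences are lists of strings, that unresolved prompt parameters are written as `{name}`, and that a plain string is a text description; the action syntax (`click`, `type`, `press`), `PromptType`, and `classify` are illustrative and not the paper's actual format.

```python
import re
from enum import Enum
from typing import List, Union

# Hypothetical syntax: an unresolved prompt parameter is written as {name}.
PARAM = re.compile(r"\{(\w+)\}")

# An action prompt is either a list of action strings or a free-text description.
ActionPrompt = Union[List[str], str]


class PromptType(Enum):
    EXECUTABLE = 1     # 1) action sequence that can run as-is
    PARAMETERIZED = 2  # 2) action sequence containing prompt parameters
    TEXT = 3           # 3) pure text description


def classify(prompt: ActionPrompt) -> PromptType:
    """Sort an ATA output into one of the three action-prompt types."""
    if isinstance(prompt, str):
        return PromptType.TEXT
    if any(PARAM.search(action) for action in prompt):
        return PromptType.PARAMETERIZED
    return PromptType.EXECUTABLE


print(classify(["click(start_menu)", "type('notepad')", "press(enter)"]))  # PromptType.EXECUTABLE
print(classify(["click({save_button})", "type('report.txt')"]))          # PromptType.PARAMETERIZED
print(classify("Save the document to the Documents folder."))            # PromptType.TEXT
```

Under this representation, only the second and third types need further processing before execution.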
To execute the action prompts effectively, we propose the CohesiveFlow method, which refines the second and third types of prompts using the environment state obtained through state cognition. Inspired by masked language modeling, the CohesiveFlow agent combines the current environment state with previously executed action sequences to update the parameters and text descriptions, ensuring that the resulting actions are both feasible and effective.
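The CohesiveFlow step can be pictured as filling masked slots, in the spirit of masked language modeling. The sketch below is illustrative only: it assumes parameters are written as `{name}`, that the perceived environment state is a dictionary from parameter names to UI element identifiers, and that an LLM is available as a plain callable for type-3 text descriptions. `fill_parameters`, `refine_text`, and all data values are hypothetical, and the dictionary lookup stands in for whatever model-based inference the paper actually uses.

```python
import re
from typing import Callable, Dict, List

PARAM = re.compile(r"\{(\w+)\}")  # hypothetical {name} mask syntax


def fill_parameters(actions: List[str], state: Dict[str, str],
                    bindings: Dict[str, str]) -> List[str]:
    """Replace each masked parameter, preferring the current environment
    state and falling back to values bound by previously executed actions."""
    def resolve(match: re.Match) -> str:
        name = match.group(1)
        if name in state:
            return state[name]
        if name in bindings:
            return bindings[name]
        raise KeyError(f"unresolved parameter: {name}")

    return [PARAM.sub(resolve, action) for action in actions]


def refine_text(description: str, state: Dict[str, str], history: List[str],
                llm: Callable[[str], List[str]]) -> List[str]:
    """Turn a text description into actions by conditioning a model on the
    current state and the executed history (llm is a hypothetical callable)."""
    context = (f"Visible elements: {', '.join(sorted(state))}\n"
               f"Executed so far: {'; '.join(history)}\n"
               f"Instruction: {description}")
    return llm(context)


state = {"save_button": "elem_17"}      # e.g. detector output for the current screen
bindings = {"filename": "report.txt"}   # value fixed by an earlier step
print(fill_parameters(["click({save_button})", "type({filename})"], state, bindings))
# ['click(elem_17)', 'type(report.txt)']
```

Checking the live state first reflects the idea that the current screen is the most reliable source for element identifiers, while the execution history supplies values that no longer appear on screen.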
To validate the efficacy of our approach on long-horizon instructions, we introduce a new benchmark, \textbf{EnduroSeq}, and conduct experiments using the WinBench short-instruction dataset. The results show that ActionFiller significantly improves task completion rates and execution efficiency, offering a new approach to applying intelligent agents in complex environments.
Primary Area: applications to robotics, autonomy, planning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 5744