Self-Guided Plan Extraction for Instruction-Following Tasks with Goal-Conditioned Reinforcement Learning
Keywords: Instruction Following; Reinforcement Learning; Multimodal RL
TL;DR: A self-improving framework couples language-model plan generation with reinforcement learning feedback to achieve robust, generalizable instruction following without predefined subtasks.
Abstract: We introduce a self-improving framework for instruction-following tasks. Unlike prior methods that rely on predefined subtasks, our approach enables a language model to generate and refine high-level plans through a self-learning mechanism, reducing the need for manual dataset annotation. The method iteratively co-trains two components: an RL agent learns to follow the generated plans, while the language model revises those plans based on feedback and preference signals from the agent. This creates a feedback loop in which the planner and the agent improve jointly. We validate the framework in stochastic environments with rich dynamics. Results show that our agents adhere to instructions more strictly than baseline methods, while also generalizing strongly to previously unseen instructions.
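A minimal sketch of the iterative co-training loop described in the abstract, assuming illustrative `PlannerLM`, `RLAgent`, and `ToyEnv` interfaces that are not part of the submission:

```python
import random

class PlannerLM:
    """Stand-in for a language model that proposes high-level plans."""
    def __init__(self, candidate_steps):
        self.candidate_steps = candidate_steps
        self.preferences = {}  # instruction -> plan the agent executed well

    def generate_plan(self, instruction):
        # Reuse a plan the agent previously followed reliably, else sample one.
        if instruction in self.preferences:
            return self.preferences[instruction]
        return random.sample(self.candidate_steps, k=2)

    def refine(self, instruction, plan, success_rate):
        # Simple preference signal: keep plans the agent can execute reliably.
        if success_rate > 0.5:
            self.preferences[instruction] = plan

class RLAgent:
    """Stand-in for an RL agent trained to follow a given plan."""
    def train_on(self, env, plan):
        # A real agent would run policy updates here; this stub just reports
        # how reliably the plan's subgoals were completed during training.
        return env.rollout_success(plan)

class ToyEnv:
    """Stochastic stand-in for an environment with rich dynamics."""
    def rollout_success(self, plan):
        return random.random()

def co_train(planner, agent, env, instructions, iterations=10):
    """Alternate plan generation, agent training, and plan refinement."""
    for _ in range(iterations):
        for instruction in instructions:
            plan = planner.generate_plan(instruction)    # LM proposes a plan
            success = agent.train_on(env, plan)          # agent trains to follow it
            planner.refine(instruction, plan, success)   # LM adapts via feedback
    return planner, agent

planner, agent = co_train(
    PlannerLM(["goto_key", "open_door", "pick_up", "drop"]),
    RLAgent(), ToyEnv(), ["open the locked door"], iterations=5)
print(planner.preferences)
```

In this toy version the preference mechanism is a simple success-rate threshold; the point is only the alternation itself, in which each component's updates condition the other's next round of training.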
Primary Area: applications to robotics, autonomy, planning
Submission Number: 24913