Self-Guided Plan Extraction for Instruction-Following Tasks with Goal-Conditional Reinforcement Learning

ICLR 2026 Conference Submission 24913 Authors

20 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Instruction Following; Reinforcement Learning; Multimodal RL
TL;DR: A self-improving framework couples language-model plan generation with reinforcement learning feedback to achieve robust, generalizable instruction following without predefined subtasks.
Abstract: We introduce a self-guided framework for instruction-following tasks. Unlike prior methods that rely on predefined subtasks, our approach enables a language model to generate and refine high-level plans through a self-learning mechanism, reducing the need for manual dataset annotation. The method is an iterative co-training procedure: an RL agent is trained to follow the generated plans, while the language model adapts and refines these plans based on RL feedback and preference signals. This creates a feedback loop in which the agent and the planner improve jointly. We validate the framework in environments with rich dynamics and stochasticity. Results show that our agents adhere to instructions more strictly than baseline methods, while also demonstrating strong generalization to previously unseen instructions.
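
To make the co-training loop in the abstract concrete, below is a minimal sketch of one possible realization. All names (Planner, GoalConditionedAgent, co_train, update_from_preferences) are hypothetical placeholders, not the authors' actual API, and the preference-based planner update is one plausible reading of "adapts and modifies these plans based on RL feedback and preferences."

```python
import random

class Planner:
    """LLM-based planner: maps an instruction to high-level plans (hypothetical stub)."""
    def generate_plans(self, instruction, n=2):
        # In practice this would prompt a language model; stubbed with random plans here.
        return [[f"step-{i}" for i in range(random.randint(2, 4))] for _ in range(n)]

    def update_from_preferences(self, preference_pairs):
        # Preference-based fine-tuning on (instruction, better_plan, worse_plan) triples;
        # a no-op in this sketch.
        pass

class GoalConditionedAgent:
    """RL agent trained to follow a plan's subgoals (hypothetical stub)."""
    def train_on_plan(self, env, instruction, plan):
        # Run goal-conditioned RL on the plan; return episodic return as the feedback signal.
        return random.random()

def co_train(env, planner, agent, instructions, n_iters=10):
    """One possible joint planner/agent improvement loop, per the abstract."""
    for _ in range(n_iters):
        prefs = []
        for instr in instructions:
            # 1) Planner proposes candidate plans for the instruction.
            plan_a, plan_b = planner.generate_plans(instr, n=2)
            # 2) Agent trains to follow each plan; returns serve as RL feedback.
            ret_a = agent.train_on_plan(env, instr, plan_a)
            ret_b = agent.train_on_plan(env, instr, plan_b)
            # 3) The higher-return plan is treated as preferred.
            better, worse = (plan_a, plan_b) if ret_a >= ret_b else (plan_b, plan_a)
            prefs.append((instr, better, worse))
        # 4) Planner adapts its plan generation from the RL-derived preferences.
        planner.update_from_preferences(prefs)
```

In this reading, the agent's returns supply the preference signal that closes the loop, so neither predefined subtasks nor annotated plan datasets are required.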
Primary Area: applications to robotics, autonomy, planning
Submission Number: 24913