Skill Reinforcement Learning and Planning for Open-World Long-Horizon Tasks

TMLR Paper2284 Authors

23 Feb 2024 (modified: 25 Apr 2024), Decision pending for TMLR
Abstract: Developing autonomous agents with multi-task capabilities in open-world environments has been a longstanding goal of AI research. Without human demonstrations, learning to accomplish long-horizon tasks in a large open-world environment with reinforcement learning (RL) is extremely inefficient. To tackle this challenge, we convert the multi-task learning problem into learning basic skills and planning over the skills. We employ RL with intrinsic rewards, enabling the agent to acquire a set of basic skills. These skills can be reused and chained together to solve diverse long-horizon tasks. Given the challenge of exploring large open-world environments using RL, we propose a novel Finding-skill that aims at finding target items of subsequent skills and providing effective state initialization for these skills. In skill planning, we utilize the prior knowledge in Large Language Models (LLMs) to construct a skill graph that depicts the relationships between skills. When solving a task, at each stage, the agent searches for a path on the skill graph and executes the first skill. In the popular open-world game Minecraft, our method accomplishes 40 diverse tasks, where many tasks require sequentially executing more than 10 skills. Our method outperforms baselines by a large margin and is the most sample-efficient demonstration-free RL method to solve Minecraft Tech Tree tasks.
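The planning loop described in the abstract (search for a path on the skill graph, execute only the first skill, then search again) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the graph `SKILLS` is hand-written rather than built by an LLM, the item quantities are simplified and do not match real Minecraft recipes, each skill yields exactly one unit, the Finding-skill is left out, and `execute` is a hypothetical stand-in for running a learned RL skill policy.

```python
from collections import Counter

# Hypothetical skill graph: item -> skill that yields it and the items it consumes.
# Quantities are illustrative only, not actual game recipes.
SKILLS = {
    "log": {"skill": "harvest_log", "needs": {}},
    "planks": {"skill": "craft_planks", "needs": {"log": 1}},
    "stick": {"skill": "craft_stick", "needs": {"planks": 1}},
    "wooden_pickaxe": {
        "skill": "craft_wooden_pickaxe",
        "needs": {"planks": 2, "stick": 1},
    },
}
ITEM_OF = {spec["skill"]: item for item, spec in SKILLS.items()}


def plan(target, inventory):
    """Depth-first search returning an ordered skill list that obtains `target`."""
    have = Counter(inventory)  # simulated copy; the real inventory is untouched
    steps = []

    def obtain(item):
        if have[item] > 0:  # reuse a unit that is already in the inventory
            have[item] -= 1
            return
        for dep, count in SKILLS[item]["needs"].items():
            for _ in range(count):
                obtain(dep)
        steps.append(SKILLS[item]["skill"])

    obtain(target)
    return steps


def run_task(target, inventory, execute, max_stages=100):
    """Replan at every stage and execute only the first skill of the plan."""
    inv = Counter(inventory)
    trace = []
    for _ in range(max_stages):
        if inv[target] > 0:
            return True, trace
        skill = plan(target, inv)[0]
        trace.append(skill)
        if execute(skill):  # a failed skill leaves the inventory unchanged
            item = ITEM_OF[skill]
            for dep, count in SKILLS[item]["needs"].items():
                inv[dep] -= count
            inv[item] += 1
    return inv[target] > 0, trace


if __name__ == "__main__":
    ok, trace = run_task("wooden_pickaxe", {}, execute=lambda skill: True)
    print(ok, trace)
```

Because the plan is recomputed from the current inventory at every stage, a failed skill execution is simply retried on the next stage, with no separate recovery logic.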
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Yonatan_Bisk1
Submission Number: 2284