Multi-Stage Manipulation with Demonstration-Augmented Reward, Policy, and World Model Learning

Published: 01 May 2025, Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY 4.0
TL;DR: We propose a framework for multi-stage manipulation tasks with sparse rewards and visual inputs; our framework combines learned dense rewards, model-based RL, a bi-phasic training scheme, and a small number of demonstrations.
Abstract: Long-horizon tasks in robotic manipulation present significant challenges in reinforcement learning (RL) due to the difficulty of designing dense reward functions and effectively exploring the expansive state-action space. However, despite a lack of dense rewards, these tasks often have a multi-stage structure, which can be leveraged to decompose the overall objective into manageable sub-goals. In this work, we propose DEMO³, a framework that exploits this structure for efficient learning from visual inputs. Specifically, our approach incorporates multi-stage dense reward learning, a bi-phasic training scheme, and world model learning into a carefully designed demonstration-augmented RL framework that strongly mitigates the challenge of exploration in long-horizon tasks. Our evaluations demonstrate that our method improves data-efficiency by an average of 40% and by 70% on particularly difficult tasks compared to state-of-the-art approaches. We validate this across 16 sparse-reward tasks spanning four domains, including challenging humanoid visual control tasks using as few as five demonstrations.
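To make the abstract's central idea concrete, the sketch below illustrates one way a multi-stage dense reward could be structured: the sparse task is split into ordered stages, and a learned model provides within-stage progress on top of the discrete stage index. This is a minimal, hypothetical PyTorch sketch, not the paper's actual implementation (see the linked repository for that); the name StageRewardModel, the architecture, and the stage-plus-progress reward shape are all illustrative assumptions.

```python
import torch
import torch.nn as nn

class StageRewardModel(nn.Module):
    """Hypothetical per-stage dense reward model (illustrative only).

    A shared encoder feeds one progress head per stage; the dense reward
    combines the number of completed stages with the learned within-stage
    progress, normalized to [0, 1].
    """

    def __init__(self, obs_dim: int, num_stages: int, hidden: int = 256):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # One scalar progress estimate in [0, 1] per stage.
        self.heads = nn.ModuleList(
            nn.Sequential(nn.Linear(hidden, 1), nn.Sigmoid())
            for _ in range(num_stages)
        )
        self.num_stages = num_stages

    def forward(self, obs: torch.Tensor, stage: torch.Tensor) -> torch.Tensor:
        """obs: (B, obs_dim) features; stage: (B,) long tensor of current stage.

        Returns a dense reward in [0, 1]: completed stages plus the learned
        progress within the current stage, divided by the total stage count.
        """
        z = self.encoder(obs)
        # Per-stage progress estimates, shape (B, num_stages).
        progress = torch.stack([h(z).squeeze(-1) for h in self.heads], dim=-1)
        # Select the progress head for each sample's current stage.
        within = progress.gather(-1, stage.unsqueeze(-1)).squeeze(-1)
        return (stage.float() + within) / self.num_stages
```

Under this shaping, completing a stage always dominates any within-stage progress, so the dense signal preserves the ordering induced by the original sparse stage rewards while giving the agent gradient-like feedback between stage boundaries.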
Lay Summary: Training robots to complete long, complex tasks, like picking up an object and placing it in a specific spot, is incredibly difficult. One major reason is that they usually receive very little feedback while learning, making it hard for them to know if they’re making progress. Our work introduces DEMO³, a learning method that teaches robots more effectively by giving them just a few example demonstrations and then helping them break big tasks into smaller steps. Each step provides feedback, which makes learning faster and more reliable. DEMO³ uses this step-based feedback to train the robot’s behavior, its understanding of the world, and its ability to evaluate progress, all at once. We tested our approach on 16 robot tasks across different environments. DEMO³ consistently outperformed other methods, especially in the most challenging tasks. It also worked well even with as few as five example demonstrations. This approach could make it easier and faster to train robots for real-world jobs, from household assistance to industrial automation.
Link To Code: https://github.com/adrialopezescoriza/demo3
Primary Area: Reinforcement Learning->Deep RL
Keywords: Reinforcement Learning, Learning from Demonstrations, Robotics, Manipulation
Submission Number: 1910