Efficient Fine-Tuning of Behavior Cloned Policies with Reinforcement Learning from Limited Demonstrations

Published: 10 Oct 2024 (last modified: 28 Oct 2024) · FITML 2024 Poster · CC BY 4.0
Keywords: Behavior Cloning, Reinforcement Learning, Sparse Rewards
Abstract: Behavior cloning (BC) is a supervised learning technique in which an agent mimics expert behavior based on demonstration data. While BC is widely applied in robotics due to its simplicity, it is constrained by challenges such as dataset bias, limited adaptability, and an inability to outperform the expert. In this study, we introduce an efficient fine-tuning approach that combines BC with reinforcement learning (RL), using a small set of demonstrations to mitigate these limitations. Our approach refines the pre-trained BC policy by incorporating a world model to generate synthetic rollouts for planning and policy optimization, and by balancing data sampling between expert-provided demonstrations and agent-driven online interactions. We perform experiments in environments with high-dimensional image observations and employ sparse reward signals in place of human-engineered dense reward functions. The experimental results demonstrate that our method significantly improves sample efficiency, enabling the successful learning of complex robotic manipulation tasks within a restricted budget of 100K environment steps.
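The balanced sampling between demonstrations and online interactions described in the abstract can be sketched as a mixed-batch sampler over two replay buffers. This is a minimal illustration, not the paper's implementation; the `demo_ratio` parameter and buffer representation are assumptions, since the abstract states only that sampling is balanced between the two sources.

```python
import random

def sample_mixed_batch(demo_buffer, online_buffer, batch_size, demo_ratio=0.5):
    """Draw a training batch mixing expert demonstrations with online rollouts.

    demo_ratio is a hypothetical knob: the fraction of the batch drawn from
    the demonstration buffer; the remainder comes from online interactions.
    """
    n_demo = int(batch_size * demo_ratio)
    n_online = batch_size - n_demo
    # Sample with replacement so small demo buffers can still fill their share.
    batch = random.choices(demo_buffer, k=n_demo)
    batch += random.choices(online_buffer, k=n_online)
    random.shuffle(batch)  # avoid source-ordering bias within the batch
    return batch
```

In practice, a fixed ratio like this keeps scarce expert data in every gradient update while the growing online buffer supplies fresh, on-policy experience.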
Submission Number: 44