Turning the Dial: Bridging Behavior Cloning and Reinforcement Learning via Timestep Modulation

Published: 08 May 2026, Last Modified: 08 May 2026
Venue: ICRA 2026 Workshop RL4IL (Oral)
License: CC BY 4.0
Keywords: imitation learning, reinforcement learning, pre-training for post-training
Abstract: Fine-tuning pre-trained robot policies with reinforcement learning (RL) is promising, but standard behavior cloning (BC) produces narrow, overconfident action distributions that generalize poorly and limit downstream RL improvement. We present a unified framework for bridging BC pre-training and RL fine-tuning. Our pre-training method, Context-Smoothed Pre-training (CSP), injects forward-diffusion noise into policy inputs, enabling a continuous spectrum between precise conditional imitation and broader action coverage. For efficient RL fine-tuning, we introduce Timestep-Modulated Reinforcement Learning (TMRL), which lets the agent dynamically adjust conditioning strength via diffusion timestep modulation to control exploration. Across diverse settings, CSP integrates seamlessly with arbitrary policy inputs, from states to 3D point clouds, and with image-input vision-language-action policies. TMRL with CSP significantly improves RL sample efficiency over prior approaches. Notably, TMRL enables successful real-world fine-tuning on manipulation tasks in under one hour.
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 29