IN-RIL: Interleaved Reinforcement and Imitation Learning for Policy Fine-Tuning

ICLR 2026 Conference Submission 16947 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Reinforcement Learning, Imitation Learning, Robot Learning
TL;DR: We introduce IN-RIL, INterleaved Reinforcement learning and Imitation Learning, for policy fine-tuning, which benefits from both the stability of IL and the expert-guided exploratory strength of RL.
Abstract: Imitation learning (IL) and reinforcement learning (RL) offer complementary strengths for robot learning, yet each has severe limitations when used in isolation. Recent studies have proposed hybrid approaches that integrate IL with RL, but these still face major challenges such as over-regularization and poor sample efficiency. Thus motivated, we develop IN-RIL, \textbf{IN}terleaved \textbf{R}einforcement learning and \textbf{I}mitation \textbf{L}earning, for policy fine-tuning, which periodically injects IL updates after multiple RL updates. In essence, IN-RIL leverages `alternating optimization' to exploit the strengths of both IL and RL without overly constraining policy learning, and hence can benefit from both the stability of IL and the expert-guided exploration of RL. Since IL and RL involve different optimization objectives, we devise gradient separation mechanisms to prevent their interference. Furthermore, our rigorous analysis sheds light on how interleaving IL with RL stabilizes learning and improves iteration efficiency. We conduct extensive experiments on Robomimic, FurnitureBench, and Gym, and demonstrate that IN-RIL, as a general plug-in compatible with various state-of-the-art RL algorithms, can improve RL sample efficiency and mitigate performance collapse.
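
The following is a minimal sketch, not the authors' implementation, of the interleaving schedule the abstract describes: perform several RL updates, then periodically inject one IL update, keeping the two objectives' gradients separate by giving each its own backward/step. The policy architecture, losses, batch helpers, and the constant K_RL are all hypothetical placeholders.

```python
# Hypothetical sketch of interleaved RL/IL fine-tuning (assumed details, not IN-RIL's actual code).
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(17, 64), nn.ReLU(), nn.Linear(64, 6))
optimizer = torch.optim.Adam(policy.parameters(), lr=3e-4)

K_RL = 10  # assumed: number of RL updates between consecutive IL updates

def rl_surrogate_loss(policy, batch):
    # Placeholder for any policy-gradient-style RL objective (e.g., a PPO surrogate).
    states, actions, advantages = batch
    pred = policy(states)
    return (((pred - actions) ** 2).sum(-1) * advantages).mean()

def il_loss(policy, expert_batch):
    # Behavior cloning: match expert actions on expert states.
    states, actions = expert_batch
    return ((policy(states) - actions) ** 2).mean()

def dummy_rl_batch():
    return torch.randn(32, 17), torch.randn(32, 6), torch.randn(32)

def dummy_expert_batch():
    return torch.randn(32, 17), torch.randn(32, 6)

for step in range(1000):
    # Gradient separation (in spirit): each objective gets its own
    # zero_grad / backward / step, so RL and IL gradients are never
    # mixed within a single update.
    optimizer.zero_grad()
    if step % (K_RL + 1) == K_RL:
        loss = il_loss(policy, dummy_expert_batch())      # periodic IL update
    else:
        loss = rl_surrogate_loss(policy, dummy_rl_batch())  # regular RL update
    loss.backward()
    optimizer.step()
```

In this sketch the alternation is a fixed schedule (one IL update every K_RL RL updates); the paper's actual scheduling and gradient-separation mechanisms may differ.
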
Supplementary Material: zip
Primary Area: applications to robotics, autonomy, planning
Submission Number: 16947