Conflict-Averse IL-RL: Resolving Gradient Conflicts for Stable Imitation-to-Reinforcement Learning Transfer
Abstract: Reinforcement Learning (RL) and Imitation Learning (IL) offer complementary capabilities: RL can learn high-performing policies but is data-intensive, whereas IL enables rapid learning from demonstrations but is limited by the demonstrator's quality. Combining them offers the potential for improved sample efficiency in learning high-performing policies, yet naïve integrations often suffer from two fundamental issues: (1) negative transfer, where optimizing the IL loss hinders effective RL fine-tuning, and (2) gradient conflict, where differences in the scale or direction of IL and RL gradients lead to unstable updates.
We introduce Conflict-Averse IL-RL (CAIR), a general framework that addresses both challenges by combining two key components: (1) Loss Manipulation: an adaptive annealing mechanism that uses a convex combination of the IL and RL losses. The mechanism dynamically increases the weight of the RL loss when its gradient aligns with the IL gradient and decreases it otherwise, mitigating instabilities during the transition from IL to RL. (2) Gradient Manipulation: to further reduce conflict, we incorporate CAGrad to compute a joint gradient that balances the IL and RL objectives while avoiding detrimental interference.
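The loss-manipulation component can be illustrated with a minimal sketch. The update rule, step size, and clipping range below are illustrative assumptions, not CAIR's exact schedule; the sketch only shows the stated idea of raising the RL weight when the IL and RL gradients align (positive cosine similarity) and lowering it otherwise, with the losses combined convexly.

```python
import numpy as np

def update_rl_weight(alpha, g_il, g_rl, step=0.05):
    """Hypothetical annealing rule: increase the RL-loss weight when the
    RL gradient aligns with the IL gradient, decrease it otherwise.
    `step` is an assumed annealing increment."""
    cos = np.dot(g_il, g_rl) / (
        np.linalg.norm(g_il) * np.linalg.norm(g_rl) + 1e-8
    )
    alpha = alpha + step if cos > 0 else alpha - step
    # Keep the weight a valid convex-combination coefficient.
    return float(np.clip(alpha, 0.0, 1.0))

def combined_loss(alpha, loss_il, loss_rl):
    """Convex combination of the IL and RL losses."""
    return (1.0 - alpha) * loss_il + alpha * loss_rl
```

With aligned gradients the RL weight grows over training, so the objective smoothly shifts from imitation toward reinforcement; with conflicting gradients the weight shrinks, which is the annealing behavior the abstract describes.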
Under standard trust-region assumptions, CAIR guarantees monotonic improvement in the expected return when the loss weights are annealed monotonically. Our empirical study evaluates CAIR on four sparse-reward MuJoCo domains, where pure RL algorithms typically struggle. Compared against relevant hybrid RL baselines, CAIR improves sample efficiency in three of the four domains and asymptotic performance in two, while performing comparably on the remainder. These trends hold across multiple combinations of IL (BC, DAgger) and RL (DDPG, SAC, PPO) methods, demonstrating the framework's robustness.
Submission Type: Long submission (more than 12 pages of main content)
Assigned Action Editor: ~Zhongwen_Xu1
Submission Number: 6891