Generalize and Guide: Decomposing Rewards for Few-Shot Inverse Reinforcement Learning

17 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: Inverse Reinforcement Learning, Deep Reinforcement Learning, Few-shot Learning
TL;DR: We tackle few-shot IRL in tasks with wide variations by utilizing multi-task demonstrations: an agent must learn a new task from limited demonstrations by leveraging data from other related tasks.
Abstract: Inverse reinforcement learning (IRL) provides a powerful framework for learning from demonstrations. However, many realistic tasks include natural variations (e.g., a cleaning robot operating in houses with different furniture configurations), making it impractical to provide enough demonstrations to fully specify the task in every scenario. We tackle the problem of few-shot IRL with multi-task demonstrations, where an agent must learn a new task from limited demonstrations by leveraging data from other related tasks. Unlike prior methods that rely on expensive meta-training or are restricted to offline imitation, our approach learns a reward function that can be directly optimized through online interaction. We introduce Multi-task discriminator Proximity-guided IRL (MPIRL), a novel method that learns a generalizable and informative reward function for effective few-shot IRL. Our key insight is to decompose the reward into two components: (1) a multi-task discriminator that recognizes and rewards expert behavior across task variations, and (2) a dense, proximity-to-expert reward that guides the agent in non-expert states. This composite reward structure enables effective policy optimization even when demonstration data is limited. We demonstrate the effectiveness of our method on multiple challenging navigation and manipulation tasks.
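The abstract does not give the functional form of either reward component. The following is a minimal Python sketch of how such a two-part reward could be combined, assuming (1) a GAIL/AIRL-style discriminator reward log D - log(1 - D), where D is the discriminator's estimated probability that a transition is expert-like for the current task variation, and (2) a proximity term that decays exponentially with Euclidean distance to the nearest expert state. The additive combination, the weight `lam`, the `scale` parameter, and all function names are hypothetical and are not taken from MPIRL itself.

```python
import math
from typing import Sequence

State = Sequence[float]


def discriminator_reward(d_prob: float, eps: float = 1e-8) -> float:
    """Assumed GAIL/AIRL-style reward log D - log(1 - D).

    d_prob is a (hypothetical) multi-task discriminator output in (0, 1):
    the estimated probability that the transition is expert behavior for
    the current task variation. Clipping keeps the logs finite.
    """
    d = min(max(d_prob, eps), 1.0 - eps)
    return math.log(d) - math.log(1.0 - d)


def proximity_reward(state: State, expert_states: Sequence[State],
                     scale: float = 1.0) -> float:
    """Assumed dense proximity reward in (0, 1].

    Equals 1 at an expert state and decays as exp(-dist / scale) with the
    Euclidean distance to the nearest expert state, so it still provides
    a learning signal in states the demonstrations never visited.
    """
    nearest = min(math.dist(state, e) for e in expert_states)
    return math.exp(-nearest / scale)


def composite_reward(d_prob: float, state: State,
                     expert_states: Sequence[State],
                     lam: float = 0.5, scale: float = 1.0) -> float:
    """Hypothetical additive combination of the two components."""
    return (discriminator_reward(d_prob)
            + lam * proximity_reward(state, expert_states, scale))


if __name__ == "__main__":
    expert = [(0.0, 0.0), (1.0, 0.0)]
    # With an uninformative discriminator (D = 0.5), only the proximity
    # term differs between a state on the demo and one far from it.
    print(composite_reward(0.5, (1.0, 0.0), expert))  # on an expert state
    print(composite_reward(0.5, (4.0, 0.0), expert))  # 3 units away
```

The usage at the bottom illustrates the intuition stated in the abstract: when the discriminator gives no useful signal (D = 0.5 yields a discriminator reward of 0), the dense proximity term still ranks states by how close they are to the demonstrations, which is what lets the agent make progress from non-expert states.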
Primary Area: reinforcement learning
Submission Number: 9955