RILe: Reinforced Imitation Learning

24 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: reinforcement learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Reinforcement Learning, Imitation Learning, Deep Reinforcement Learning
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: We introduce RILe: Reinforced Imitation Learning, a scalable imitation learning framework that combines the strengths of imitation learning and inverse reinforcement learning.
Abstract: Learning to imitate behaviors from a limited set of expert trajectories is a promising way to acquire a policy. In imitation learning (IL), a policy is trained directly on expert data in an efficient way, but this requires vast amounts of data. Inverse reinforcement learning (IRL), on the other hand, deduces a reward function from expert data and then learns a policy with reinforcement learning using this reward function. Although this mitigates the data requirement of imitation learning, IRL approaches suffer from efficiency issues because the reward function and the policy are learned sequentially. In this paper, we combine the strengths of imitation learning and inverse reinforcement learning and introduce RILe: Reinforced Imitation Learning. Our novel dual-agent framework enables joint training of a teacher agent and a student agent. The teacher agent learns the reward function from expert data: it observes the student agent's behavior and provides it with a reward signal. At the same time, the student agent learns a policy from the reward signals given by the teacher. Training the student and the teacher jointly in a single learning process offers scalability and efficiency, while learning the reward function alleviates the data sensitivity of pure imitation learning. Experimental comparisons against imitation learning baselines on reinforcement learning benchmarks highlight the superior performance of RILe, particularly when the number of expert trajectories is limited.
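
Illustrative sketch: to make the dual-agent setup described in the abstract concrete, below is a minimal, self-contained Python sketch of the joint training loop, written as one plausible reading of the abstract only. It assumes a discriminator-style teacher that scores (state, action) pairs against expert pairs and a linear softmax student policy; the class names (StudentAgent, TeacherAgent), the toy expert data, and the random state sampling are illustrative assumptions, not the paper's actual architecture, agents, or benchmarks.

import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, ACTION_DIM = 4, 2

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

class StudentAgent:
    # Linear softmax policy updated with a crude REINFORCE-style step.
    def __init__(self):
        self.W = np.zeros((ACTION_DIM, STATE_DIM))
    def act(self, s):
        return rng.choice(ACTION_DIM, p=softmax(self.W @ s))
    def update(self, s, a, reward, lr=1e-2):
        p = softmax(self.W @ s)
        grad = -np.outer(p, s)      # d/dW log pi(a|s) = (onehot(a) - p) s^T
        grad[a] += s
        self.W += lr * reward * grad

class TeacherAgent:
    # Scores (state, action) pairs; trained to separate expert pairs from
    # student pairs, and its score is used as the student's reward signal.
    def __init__(self):
        self.w = np.zeros(STATE_DIM + ACTION_DIM)
    def _features(self, s, a):
        return np.concatenate([s, np.eye(ACTION_DIM)[a]])
    def reward(self, s, a):
        return 1.0 / (1.0 + np.exp(-self.w @ self._features(s, a)))   # in (0, 1)
    def update(self, expert_pair, student_pair, lr=1e-2):
        for (s, a), label in ((expert_pair, 1.0), (student_pair, 0.0)):
            x = self._features(s, a)
            pred = 1.0 / (1.0 + np.exp(-self.w @ x))
            self.w += lr * (label - pred) * x                          # logistic-regression step

# Toy stand-in for a limited set of expert trajectories: random (state, action) pairs.
expert_data = [(rng.normal(size=STATE_DIM), int(rng.integers(ACTION_DIM))) for _ in range(32)]

student, teacher = StudentAgent(), TeacherAgent()
for step in range(1000):
    s = rng.normal(size=STATE_DIM)    # stand-in for an environment state
    a = student.act(s)                # student acts
    r = teacher.reward(s, a)          # teacher observes the student and emits a reward
    student.update(s, a, r)           # student improves its policy from that reward
    teacher.update(expert_data[step % len(expert_data)], (s, a))  # teacher refines its reward from expert data

The structural point of the sketch is the single loop in which the teacher's reward model and the student's policy are updated together, in contrast to sequential IRL, where the reward function is learned first and the policy afterwards.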
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
Supplementary Material: zip
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 9178