Keywords: Offline Imitation Learning, Preference-based Reinforcement Learning, Large Language Model Alignment, Data Efficiency
TL;DR: POIL is a novel offline imitation learning method inspired by preference optimization in language models. By eliminating adversarial training and reference models, it delivers superior or competitive performance against SOTA methods on MuJoCo tasks.
Abstract: Imitation learning (IL) enables agents to learn policies by mimicking expert demonstrations.
Online IL methods require environment interaction, which can be costly, risky, or impractical; offline IL instead allows agents to learn solely from expert datasets, without any interaction with the environment.
In this paper, we propose Preference Optimization for Imitation Learning (POIL), a novel approach inspired by preference optimization techniques in large language model alignment.
POIL eliminates the need for adversarial training and reference models by directly comparing the agent's actions to expert actions using a preference-based loss function (a minimal sketch of such a loss follows the abstract).
We evaluate POIL on MuJoCo control tasks under two challenging settings: learning from a single expert demonstration and training with different dataset sizes (100\%, 10\%, 5\%, and 2\%) from the D4RL benchmark.
Our experiments show that POIL consistently delivers superior or competitive performance against prior state-of-the-art methods, including Behavioral Cloning (BC), IQ-Learn, DMIL, and O-DICE, especially in data-scarce scenarios such as using a single expert trajectory or as little as 2\% of the full expert dataset.
These results demonstrate that POIL enhances data efficiency and stability in offline imitation learning, making it a promising solution for applications where environment interaction is infeasible and expert data is limited.
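As a rough illustration of the idea described above, the sketch below shows one way a reference-free, preference-based imitation loss could look, assuming a Gaussian policy over continuous actions and a DPO/SimPO-style log-sigmoid objective. The function name `preference_il_loss`, the policy interface, and the temperature `beta` are illustrative assumptions, not the paper's released implementation.

```python
# A minimal sketch (not the authors' code) of a reference-free,
# preference-based imitation loss, assuming a Gaussian policy
# over continuous actions (as in MuJoCo control tasks).
import torch
import torch.nn.functional as F
from torch.distributions import Normal


def preference_il_loss(policy, states, expert_actions, beta=1.0):
    """Prefer expert actions over the policy's own sampled actions.

    Assumes `policy(states)` returns the (mean, std) of a Gaussian
    over actions; this interface is hypothetical.
    """
    mean, std = policy(states)
    dist = Normal(mean, std)

    # "Rejected" actions: the agent's current behavior,
    # sampled without gradient flow.
    agent_actions = dist.sample()

    # Per-state log-likelihoods, summed over action dimensions.
    logp_expert = dist.log_prob(expert_actions).sum(-1)
    logp_agent = dist.log_prob(agent_actions).sum(-1)

    # Reference-free preference objective: push the expert's
    # log-probability above the agent's, scaled by beta. No reference
    # policy and no adversarially trained discriminator is needed.
    return -F.logsigmoid(beta * (logp_expert - logp_agent)).mean()
```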
Supplementary Material: zip
Primary Area: reinforcement learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 6549