Keywords: Imitation Learning, Sliced Wasserstein, Optimal Transport
Abstract: Imitation learning methods make it possible to train reinforcement learning policies by minimizing a divergence measure between the state occupancies of the expert agent and the novice policy. Alternatively, a true metric on the space of probability measures can be used by invoking the optimal transport formalism. In this work, we propose a novel imitation learning method based on the generalized form of the sliced Wasserstein distance, which offers computational and sample-complexity benefits over existing imitation learning approaches. We derive a per-state reward function based on the approximate differential of the $\mathcal{SW}_2$ distance, which allows standard forward RL methods to be used for policy optimization. We demonstrate that the proposed method achieves state-of-the-art performance compared to established imitation learning frameworks on a number of benchmark tasks from the MuJoCo robotic locomotion suite.
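For context, the standard sliced Wasserstein-2 distance averages one-dimensional $\mathcal{W}_2$ distances between the projections of two measures over random directions on the unit sphere, and in 1D the optimal coupling between equal-sized empirical samples is given by sorting. The sketch below is a minimal NumPy Monte Carlo estimator of the squared $\mathcal{SW}_2$ distance between two sets of state samples; it is not the authors' implementation, and the function name, the equal-sample-count assumption, and the number of projections are illustrative choices.

```python
import numpy as np

def sliced_w2_squared(x, y, n_projections=100, rng=None):
    """Monte Carlo estimate of the squared sliced Wasserstein-2
    distance between two empirical distributions x, y of shape
    (n, d). Assumes equal sample counts, so the 1D optimal
    coupling is obtained by sorting the projected samples."""
    rng = np.random.default_rng(rng)
    d = x.shape[1]
    # Sample projection directions uniformly on the unit sphere S^{d-1}.
    theta = rng.normal(size=(n_projections, d))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)
    # Project both sample sets onto each direction: shape (n, n_projections).
    x_proj = x @ theta.T
    y_proj = y @ theta.T
    # In 1D, W2 pairs the i-th smallest of x with the i-th smallest of y.
    x_sorted = np.sort(x_proj, axis=0)
    y_sorted = np.sort(y_proj, axis=0)
    # Average squared transport cost over samples and projections.
    return np.mean((x_sorted - y_sorted) ** 2)
```

A per-state reward signal of the kind the abstract describes could then, in principle, be read off from how much each novice state contributes to this estimate, though the paper's specific construction via the approximate differential of $\mathcal{SW}_2$ is not reproduced here.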
Primary Area: reinforcement learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 8238