Universal Value Density Estimation for Imitation Learning and Goal-Conditioned Reinforcement Learning
Keywords: Imitation Learning, Reinforcement Learning, Universal Value Functions
Abstract: This work considers two distinct settings: imitation learning and goal-conditioned reinforcement learning. In both, effective solutions require the agent to reliably reach a specified state (a goal) or a set of states (a demonstration). Drawing a connection between probabilistic long-term dynamics and the desired value function, this work introduces an approach that leverages recent advances in density estimation to learn to reach a given state effectively. We develop a unified view of the two settings and show that the approach applies to both. In goal-conditioned reinforcement learning, we show that it circumvents the problem of sparse rewards while addressing hindsight bias in stochastic domains. In imitation learning, we show that it can learn from extremely small amounts of expert data and achieves state-of-the-art results on a common benchmark.
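To make the core idea concrete, below is a minimal sketch of how a learned density over future states can stand in for a goal-conditioned value. It assumes a simple conditional Gaussian model p(g | s, a) trained on future states sampled with geometrically distributed horizons; the architecture, dimensions, and helper names are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn as nn

class ConditionalDensityQ(nn.Module):
    """Toy conditional Gaussian density model p(g | s, a).

    Its log-density at a goal g serves as a stand-in for the goal-conditioned
    value Q(s, a, g). The Gaussian family and layer sizes are assumptions for
    illustration only.
    """

    def __init__(self, state_dim, action_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * state_dim),  # mean and log-std of the future state
        )

    def log_density(self, state, action, goal):
        mean, log_std = self.net(torch.cat([state, action], dim=-1)).chunk(2, dim=-1)
        dist = torch.distributions.Normal(mean, log_std.exp())
        return dist.log_prob(goal).sum(dim=-1)  # log p(goal | state, action)


def sample_future_state(states, actions, gamma=0.99):
    """Sample (s_t, a_t, s_{t+k}) with k ~ Geometric(1 - gamma), so the density
    model targets the discounted future-state distribution (hypothetical helper)."""
    T = states.shape[0]
    t = torch.randint(0, T - 1, (1,)).item()
    k = int(torch.distributions.Geometric(1 - gamma).sample().item()) + 1
    return states[t], actions[t], states[min(t + k, T - 1)]


# Training step sketch: maximize log-density of sampled future states, then
# read off Q(s, a, g) = log p(g | s, a) when learning a goal-reaching policy.
state_dim, action_dim = 4, 2
model = ConditionalDensityQ(state_dim, action_dim)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

states, actions = torch.randn(50, state_dim), torch.randn(50, action_dim)
s, a, future = sample_future_state(states, actions)
loss = -model.log_density(s.unsqueeze(0), a.unsqueeze(0), future.unsqueeze(0)).mean()
opt.zero_grad(); loss.backward(); opt.step()
```

In the imitation-learning setting described in the abstract, the same kind of density model would be queried at expert states rather than at commanded goals; this sketch only illustrates the goal-conditioned case.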
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
One-sentence Summary: We use density estimation to learn a UVFA efficiently, and apply it to goal-conditioned RL as well as to matching an expert's state-action distribution in imitation learning.
Community Implementations: [1 code implementation](https://www.catalyzex.com/paper/arxiv:2002.06473/code)
Reviewed Version (pdf): https://openreview.net/references/pdf?id=mfoNM2IYlw