Universal Value Density Estimation for Imitation Learning and Goal-Conditioned Reinforcement Learning

28 Sept 2020 (modified: 22 Oct 2023), ICLR 2021 Conference Blind Submission
Keywords: Imitation Learning, Reinforcement Learning, Universal Value Functions
Abstract: This work considers two distinct settings: imitation learning and goal-conditioned reinforcement learning. In both, effective solutions require the agent to reliably reach a specified state (a goal) or a set of states (a demonstration). Drawing a connection between probabilistic long-term dynamics and the desired value function, this work introduces an approach that leverages recent advances in density estimation to learn to reach a given state effectively. We develop a unified view of the two settings and show that the approach can be applied to both. In goal-conditioned reinforcement learning, we show that it circumvents the problem of sparse rewards while addressing hindsight bias in stochastic domains. In imitation learning, we show that the approach can learn from extremely small amounts of expert data and achieves state-of-the-art results on a common benchmark.
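One common form of the connection the abstract alludes to (notation ours, sketched from the abstract rather than quoted from the paper): with the normalized goal-indicator reward $r_t = (1-\gamma)\,\delta_g(s_t)$, the universal action-value equals the discounted density of future states evaluated at the goal,

```latex
Q^{\pi}(s, a, g) \;=\; (1-\gamma) \sum_{t=1}^{\infty} \gamma^{\,t-1}\, p^{\pi}\!\left(s_t = g \,\middle|\, s_0 = s,\ a_0 = a\right),
```

so fitting a conditional density model to future states estimates the value function directly, without bootstrapping a sparse reward.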
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
One-sentence Summary: We use density estimation to learn a UVFA efficiently, and apply this UVFA to GCRL as well as to matching an expert's state-action distribution in imitation learning.
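A minimal sketch of the density-as-value idea in PyTorch (hypothetical names throughout; this is not the authors' code, it substitutes a diagonal Gaussian where a more expressive density model would be used, and it omits the importance-weighted correction for hindsight bias mentioned in the abstract):

```python
# Minimal sketch (not the authors' code): learn Q(s, a, g) by fitting a
# conditional density to hindsight-sampled future states. All names are
# hypothetical; a diagonal Gaussian stands in for a more expressive model,
# and the hindsight-bias correction from the abstract is omitted.
import torch
import torch.nn as nn

GAMMA = 0.95  # discount factor

class ConditionalDensity(nn.Module):
    """q_theta(g | s, a): conditional diagonal Gaussian over goal space."""
    def __init__(self, state_dim, action_dim, goal_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * goal_dim),
        )

    def log_prob(self, state, action, goal):
        mean, log_std = self.net(torch.cat([state, action], -1)).chunk(2, -1)
        return torch.distributions.Normal(mean, log_std.exp()).log_prob(goal).sum(-1)

def hindsight_pairs(trajectory, gamma=GAMMA):
    """Pair each (s_t, a_t) with an achieved future state s_{t+k},
    k ~ 1 + Geometric(1 - gamma), so that maximum likelihood on these
    pairs fits the discounted future-state density."""
    for t, (state, action) in enumerate(trajectory[:-1]):
        k = 1 + int(torch.distributions.Geometric(1 - gamma).sample())
        future = trajectory[min(t + k, len(trajectory) - 1)][0]
        yield state, action, future

def train_step(model, optimizer, trajectory):
    states, actions, goals = map(torch.stack, zip(*hindsight_pairs(trajectory)))
    loss = -model.log_prob(states, actions, goals).mean()  # negative log-likelihood
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

def q_value(model, state, action, goal, gamma=GAMMA):
    # Value of reaching g from (s, a): the normalized discounted density at g.
    return (1 - gamma) * model.log_prob(state, action, goal).exp()
```

Because the goal offsets are geometrically distributed with parameter 1 - gamma, the maximum-likelihood fit targets the discounted future-state density, so (1 - gamma) * q(g | s, a) plays the role of Q(s, a, g) without any sparse-reward bootstrapping.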
Community Implementations: 1 code implementation (https://www.catalyzex.com/paper/arxiv:2002.06473/code)
Reviewed Version (pdf): https://openreview.net/references/pdf?id=mfoNM2IYlw