Rethinking ValueDice: Does It Really Improve Performance?

Anonymous

Published: 28 Mar 2022, Last Modified: 22 Oct 2023 · BT@ICLR2022 · Readers: Everyone
Keywords: reinforcement learning, imitation learning, ValueDice, adversarial learning
Abstract: Since the introduction of GAIL, adversarial imitation learning (AIL) methods have attracted considerable research interest. Among them, ValueDice has achieved notable improvements: it beats the classical Behavioral Cloning (BC) approach in the offline setting, and it requires fewer environment interactions than GAIL in the online setting. Do these improvements stem from more advanced algorithm designs? We answer this question with the following conclusions. First, we show that ValueDice can reduce to BC in the offline setting. Second, we verify that overfitting exists and that regularization matters: with weight decay, BC also nearly matches the expert performance, as ValueDice does. These first two claims explain the superior offline performance of ValueDice. Third, we establish that ValueDice does not work at all when the expert trajectory is subsampled; its reported success holds only when the expert trajectory is complete, in which case ValueDice is closely related to BC, which, as noted, performs well. Finally, we discuss the implications of our findings for imitation learning research beyond ValueDice.
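The regularization claim above can be illustrated with a minimal sketch (hypothetical, not the paper's code): behavioral cloning cast as regression from states to expert actions, trained by gradient descent with an L2 weight-decay term. The synthetic linear "expert" and all names here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "expert" demonstrations: a linear policy plus small noise.
states = rng.normal(size=(256, 4))
true_w = np.array([1.0, -2.0, 0.5, 0.0])
actions = states @ true_w + 0.01 * rng.normal(size=256)

def train_bc(states, actions, weight_decay, lr=0.1, steps=500):
    """Gradient descent on MSE(w) + weight_decay * ||w||^2 (BC with L2 decay)."""
    w = np.zeros(states.shape[1])
    n = len(states)
    for _ in range(steps):
        grad = 2 * states.T @ (states @ w - actions) / n + 2 * weight_decay * w
        w -= lr * grad
    return w

w_plain = train_bc(states, actions, weight_decay=0.0)
w_reg = train_bc(states, actions, weight_decay=1e-3)

# Weight decay shrinks the parameters; with enough data both variants
# still recover a policy close to the expert's.
assert np.linalg.norm(w_reg) <= np.linalg.norm(w_plain)
```

In the paper's setting the same idea applies to a neural BC policy (e.g. via an optimizer's weight-decay parameter); the point is only that the regularizer, not an adversarial objective, closes much of the offline gap.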
Submission Full: zip
Blogpost Url: yml
ICLR Paper: https://openreview.net/forum?id=Hyg-JC4FDr
Community Implementations: 2 code implementations (https://www.catalyzex.com/paper/arxiv:2202.02468/code)
