Abstract: Often the desired behaviour of an agent can be represented by means of a reward function in a specific state-action space. However, creating a reward function by hand that is effective with multiple goals is often extremely time-consuming. Techniques like Hindsight Experience Replay (HER) have demonstrated how an agent can learn policies that reach many goals without the need for a reward, while Generative Adversarial Imitation Learning (GAIL) is able to learn more quickly, but is limited by the capability of the demonstrator. goalGAIL combines the two in order to learn sample-efficiently while still allowing the agent to surpass the demonstrations.
Track: Replicability
NeurIPS Paper Id: https://openreview.net/forum?id=BJgjpSSeIS