Learning Self-Correctable Policies and Value Functions from Demonstrations with Negative Sampling

2020 (modified: 26 Mar 2022), ICLR 2020, Readers: Everyone
Abstract: We introduce a notion of conservatively-extrapolated value functions, which provably lead to policies that can self-correct to stay close to the demonstration states, and learn them with a novel negative sampling technique.