Keywords: behavior cloning, expert-driven learning
Abstract: We present a simple yet powerful data-augmentation technique that enables data-efficient learning from parametric experts. Whereas behavioral cloning refers to learning from samples of an expert, we focus here on what we call the policy cloning setting, which allows offline queries of an expert or expert policy. This setting arises naturally in a number of problems, especially as a component of other algorithms. Our augmented policy cloning (APC) approach achieves a very high level of data efficiency when transferring behavior from an expert to a student policy in high degrees-of-freedom (DoF) control problems. It combines conventional image-based data augmentation, which builds invariance to image perturbations, with an expert-aware offline data augmentation that induces appropriate feedback sensitivity in a region around expert trajectories. We show that our method increases the data efficiency of policy cloning, enabling transfer of complex high-DoF behaviors from just a few trajectories, and we also demonstrate its benefits within algorithms of which policy cloning is a constituent part.
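The expert-aware augmentation described in the abstract can be sketched as follows: perturb each state visited by the expert and re-query the expert at the perturbed state, so the student sees how the expert reacts to small deviations from its own trajectory. This is a minimal illustration, not the authors' implementation; the linear `expert_policy`, the Gaussian perturbation, and all parameter values (`noise_scale`, `n_virtual`) are assumptions chosen for clarity.

```python
import numpy as np

# Hypothetical expert: a simple linear feedback policy, standing in for
# any parametric expert that can be queried offline at arbitrary states.
def expert_policy(state, gain=-0.5):
    return gain * state

def augment_trajectory(states, expert, noise_scale=0.1, n_virtual=4, seed=0):
    """Expert-aware augmentation sketch: for each visited state, add
    perturbed (virtual) states and label them by querying the expert,
    yielding extra (state, action) pairs around the expert trajectory."""
    rng = np.random.default_rng(seed)
    aug_states, aug_actions = [], []
    for s in states:
        # Keep the original on-trajectory pair.
        aug_states.append(s)
        aug_actions.append(expert(s))
        for _ in range(n_virtual):
            # Virtual state near the trajectory, labeled by an offline
            # expert query rather than by replaying recorded data.
            s_virtual = s + noise_scale * rng.standard_normal(s.shape)
            aug_states.append(s_virtual)
            aug_actions.append(expert(s_virtual))
    return np.array(aug_states), np.array(aug_actions)

# A single short expert trajectory yields a much larger training set
# for behavioral cloning of a student policy.
trajectory = np.linspace(1.0, 0.0, 5).reshape(-1, 1)
X, y = augment_trajectory(trajectory, expert_policy)
```

Training the student then reduces to ordinary supervised regression of `y` on `X`; the virtual pairs are what induce the feedback sensitivity around the demonstrated states.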
Code Of Conduct: I certify that all co-authors of this work have read and commit to adhering to the NeurIPS Statement on Ethics, Fairness, Inclusivity, and Code of Conduct.
TL;DR: Data-augmentation method for expert-driven learning which produces new (virtual) states and actions and significantly improves data-efficiency.
Supplementary Material: pdf