Abstract: We consider Imitation Learning with dynamics variation between the expert demonstration (source domain) and the environment (target domain). Based on the popular framework of Adversarial Imitation Learning, we propose a novel algorithm – Dynamics Adapted Imitation Learning (DYNAIL), which incorporates the dynamics variation into the state-action occupancy measure matching as a regularization term. The dynamics variation is modeled by a pair of classifiers that distinguish between source dynamics and target dynamics. Theoretically, we provide an upper bound on the divergence between the learned policy and the expert demonstrations in the source domain. Our error bound depends only on the expectation, under the optimal policy in the target domain, of the discrepancy between the source and target dynamics. Our experimental evaluation shows that our method achieves superior results on high-dimensional continuous control tasks compared to existing imitation learning methods.
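One common way to turn a pair of domain classifiers into a dynamics-discrepancy term is via Bayes' rule on their logits, in the spirit of domain-classifier reward corrections. The sketch below is a minimal illustration under that assumption: the classifier interfaces `clf_sas` and `clf_sa` are hypothetical names, and the exact form of DYNAIL's regularizer is given in the paper, not here.

```python
import numpy as np

def dynamics_gap(clf_sas, clf_sa, s, a, s_next):
    """Estimate the transition log-density ratio log q_src(s'|s,a) - log p_tgt(s'|s,a).

    Hypothetical classifier interfaces (illustrative, not the paper's exact API):
      clf_sas(s, a, s') -> P(source | s, a, s')
      clf_sa(s, a)      -> P(source | s, a)
    """
    p_sas = clf_sas(s, a, s_next)  # P(source | s, a, s')
    p_sa = clf_sa(s, a)            # P(source | s, a)
    # By Bayes' rule, the transition log-ratio is the difference of classifier logits:
    #   log q_src(s'|s,a) - log p_tgt(s'|s,a) = logit(p_sas) - logit(p_sa)
    return (np.log(p_sas) - np.log(1.0 - p_sas)) - (np.log(p_sa) - np.log(1.0 - p_sa))
```

When the two classifiers agree (e.g. identical source and target dynamics, so both output 0.5 everywhere), the estimated gap is zero and the regularization term vanishes.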
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: The changes in the new manuscript are listed below:
1. Section 3.2: Add the explicit form of the discriminator to the paragraph 'Training the discriminator'.
2. Section 4: Add a more explicit statement of the limitations of Theorem 4.1 to Remark 4.1.
3. Section 5: Add a broken humanoid task to the experiments.
4. Section 5: Add GAIFO as a new baseline in all the experiments.
5. Section 5: Change the x-axis from 'iterations' to 'steps' in all figures showing return curves.
6. Appendix C.3: Add a detailed description of the broken humanoid task.
Code: https://github.com/Panda-Shawn/DYNAIL
Assigned Action Editor: ~Lihong_Li1
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Submission Number: 1021