DROID: Learning from Offline Heterogeneous Demonstrations via Reward-Policy Distillation

Sravan Jayanthi; Letian Chen; Nadya Balabanska; Van Duong; Erik Scarlatescu; Ezra Ameperosa; Zulfiqar Haider Zaidi; Daniel Martin; Taylor Keith Del Matto; Masahiro Ono; Matthew Gombolay

Published: 30 Aug 2023, Last Modified: 25 Oct 2023, CoRL 2023 Poster, Readers: Everyone

Keywords: Learning from Heterogeneous Demonstration, Network Distillation, Offline Imitation Learning

Abstract: Offline Learning from Demonstrations (OLfD) is valuable in domains where trial-and-error learning is infeasible or specifying a cost function is difficult, such as robotic surgery, autonomous driving, and path-finding for NASA's Mars rovers. However, two key problems remain challenging in OLfD: 1) heterogeneity: demonstration data can be generated with diverse preferences and strategies, and 2) generalizability: the learned policy and reward must perform well beyond a limited training regime, in unseen test settings. To overcome these challenges, we propose Dual Reward and policy Offline Inverse Distillation (DROID), where the key idea is to leverage diversity to improve generalization performance by decomposing common-task and individual-specific strategies and distilling knowledge in both the reward and policy spaces. We ground DROID in a novel and uniquely challenging path-planning problem for NASA's Mars Curiosity Rover. We also curate a novel dataset spanning 163 Sols (Martian days) and conduct a novel empirical investigation to characterize heterogeneity in the dataset. We find that DROID outperforms prior SOTA OLfD techniques, achieving a $26\%$ improvement in modeling expert behaviors and coming $92\%$ closer to the task objective of reaching the final destination. We also benchmark DROID on the OpenAI Gym Cartpole environment and find that DROID achieves significantly better performance ($55\%$) in modeling heterogeneous demonstrations.

Student First Author: yes

Supplementary Material: zip

Instructions: I have read the instructions for authors (https://corl2023.org/instructions-for-authors/)

Publication Agreement: pdf

Poster Spotlight Video: mp4
