Abstract: We present the Sample Efficient Social Navigation from Observation (SESNO) algorithm, which efficiently learns socially compliant navigation policies from observations of human trajectories. SESNO is an inverse reinforcement learning (IRL)-based algorithm that learns from observed human trajectories without knowledge of the underlying actions. We improve sample efficiency over previous IRL-based methods by introducing a shared experience replay buffer that allows past trajectory experiences to be reused when estimating both the policy and the reward. We evaluate SESNO on publicly available pedestrian motion datasets and compare its performance with related baseline methods from the literature. We show that SESNO outperforms existing baselines while dramatically reducing sample complexity, using as few as one hundredth of the samples those baselines require.
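The shared replay buffer idea can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify what the buffer stores or how the updates consume it. The sketch assumes state-only transitions `(s, s')` (no actions, matching the learning-from-observation setting), a fixed-capacity FIFO store, and uniform sampling. The class name `SharedReplayBuffer` and its methods are hypothetical.

```python
import random
from collections import deque
from typing import Deque, List, Tuple

State = Tuple[float, ...]
Transition = Tuple[State, State]  # (state, next_state); no action is stored


class SharedReplayBuffer:
    """Hypothetical fixed-capacity FIFO store of state-only transitions."""

    def __init__(self, capacity: int, seed: int = 0) -> None:
        # deque(maxlen=...) silently evicts the oldest item when full.
        self._data: Deque[Transition] = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def add_trajectory(self, states: List[State]) -> None:
        # Split a trajectory into consecutive (s, s') pairs.
        for s, s_next in zip(states, states[1:]):
            self._data.append((s, s_next))

    def sample(self, batch_size: int) -> List[Transition]:
        # Uniform sampling without replacement, capped at the buffer size.
        k = min(batch_size, len(self._data))
        return self._rng.sample(list(self._data), k)

    def __len__(self) -> int:
        return len(self._data)


buffer = SharedReplayBuffer(capacity=4)
# Toy 2-D positions (assumed state representation, for illustration only).
buffer.add_trajectory([(0.0, 0.0), (0.5, 0.1), (1.0, 0.2)])
buffer.add_trajectory([(1.0, 0.2), (1.5, 0.3), (2.0, 0.4), (2.5, 0.5)])

# Both estimators draw from the same store, so old experience is reused
# instead of being discarded after a single update.
reward_batch = buffer.sample(2)  # would feed a reward-estimation step
policy_batch = buffer.sample(2)  # would feed a policy-estimation step
```

The point of the design is that one store serves both the reward and the policy estimation steps, so each collected trajectory can contribute to many updates rather than one.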
External IDs:dblp:conf/iros/BaghiKHJD22