Keywords: Learning From Demonstration, Interactive Imitation Learning, World Model, Model-Based
TL;DR: Using human demonstrations to train world models rather than agent policies or reward functions yields more reliable policies that become performant earlier in training.
Abstract: Previous work in interactive reinforcement learning incorporates human behavior directly into agent-policy learning, but this requires estimating the distribution of human behavior over many samples to avoid bias. Our work shows that model-based systems can avoid this problem by using small amounts of human data to guide world-model learning rather than agent-policy learning. We show that this approach learns faster and produces useful policies more reliably than the prior state of the art. We evaluate our approach with expert human demonstrations in two environments: PinPad5, a fully observable environment that prioritizes task composition, and MemoryMaze, a partially observable environment that prioritizes exploration and memory. We show an order-of-magnitude improvement in learning speed and reliability with only nine minutes of expert human demonstration data.
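A minimal sketch of the core idea as described in the abstract, under assumed interfaces: demonstration data is mixed only into world-model training batches, while the policy is trained from imagined rollouts and never imitates the demonstrations directly. The class, method names (`update`, `imagine`), and `demo_ratio` parameter are hypothetical illustrations, not the authors' implementation.

```python
import random


class DemoGuidedWorldModelTrainer:
    """Sketch: human demos guide the world model, not the policy."""

    def __init__(self, world_model, policy, demo_buffer, replay_buffer,
                 demo_ratio=0.25):
        self.world_model = world_model      # learned dynamics/representation model (assumed interface)
        self.policy = policy                # actor trained in imagination (assumed interface)
        self.demo_buffer = demo_buffer      # small set of human demonstration trajectories
        self.replay_buffer = replay_buffer  # agent's own collected experience
        self.demo_ratio = demo_ratio        # hypothetical fraction of the world-model batch drawn from demos

    def world_model_batch(self, batch_size):
        # Mix human demonstrations into the world-model batch only.
        n_demo = int(batch_size * self.demo_ratio)
        batch = [random.choice(self.demo_buffer) for _ in range(n_demo)]
        batch += [random.choice(self.replay_buffer)
                  for _ in range(batch_size - n_demo)]
        return batch

    def train_step(self, batch_size=16):
        # The world model is updated on both demo and agent data ...
        self.world_model.update(self.world_model_batch(batch_size))
        # ... but the policy is updated only from rollouts imagined by the
        # world model, so demonstrations never shape it directly.
        imagined_rollouts = self.world_model.imagine(self.policy)
        self.policy.update(imagined_rollouts)
```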
Submission Number: 236