Abstract: We study reinforcement learning with access to state observations from a demonstrator in addition to a reward signal. In this setting, the demonstrator supplies only sequences of observations, and we leverage these samples to improve the learning efficiency of the agent. Our key insight is that in most environments expert policies visit only a tiny fraction of the available states. We develop a simple technique, e-stops, to exploit this phenomenon. Using e-stops significantly improves sample complexity by reducing the amount of exploration required, while retaining a performance bound that trades off the rate of convergence against a small asymptotic suboptimality gap. We analyze the regret behavior of e-stops and present empirical results demonstrating that our reset mechanism provides order-of-magnitude speedups over classic reinforcement learning methods.
CMT Num: 8723
Code Link: https://github.com/samuela/e-stops
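The linked repository contains the authors' implementation. As a rough illustration of the mechanism described in the abstract, the sketch below shows one way an e-stop could work: the episode is terminated early whenever the agent visits a state outside the support of the demonstrator's trajectories. This is a minimal sketch under assumptions of discrete, hashable observations and a simplified environment interface; the names (`collect_support`, `run_episode_with_estop`, the `env`/`policy` interface) are hypothetical and not taken from the repository or the paper.

```python
# Hypothetical sketch of the e-stop idea: end an episode early whenever the
# agent leaves the set of states visited by the demonstrator, so exploration
# stays near the demonstrator's state distribution.

def collect_support(demo_trajectories):
    """Build the support set from demonstrator observation sequences
    (assumes discrete, hashable observations)."""
    support = set()
    for trajectory in demo_trajectories:
        support.update(trajectory)
    return support


def run_episode_with_estop(env, policy, support, max_steps=1000):
    """Roll out `policy` in `env`, triggering an e-stop (early termination)
    as soon as the agent visits a state outside the demonstrator's support.
    Uses a simplified step signature: env.step(action) -> (obs, reward, done)."""
    obs = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(obs)
        obs, reward, done = env.step(action)
        total_reward += reward
        if obs not in support:
            # e-stop: stop the episode rather than spending samples
            # exploring far outside the demonstrator's visited states.
            break
        if done:
            break
    return total_reward
```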