Fight fire with fire: countering bad shortcuts in imitation learning with good shortcutsDownload PDF

Published: 28 Jan 2022, Last Modified: 13 Feb 2023ICLR 2022 SubmittedReaders: Everyone
Abstract: When operating in partially observed settings, it is important for a control policy to fuse information from a history of observations. However, a naive implementation of this approach has been observed repeatedly to fail for imitation-learned policies, often in surprising ways, and to the point of sometimes performing worse than when using instantaneous observations alone. We observe that behavioral cloning policies acting on single observations and observation histories each have their strengths and drawbacks, and combining them optimally could achieve the best of both worlds. Motivated by this, we propose a simple model combination approach inspired by human decision making: we first compute a coarse action based on the instantaneous observation, and then refine it into a final action using historical information. Our experiments show that this outperforms all baselines on CARLA autonomous driving from images and various MuJoCo continuous control tasks.
9 Replies

Loading