Teachable Reinforcement Learning via Advice Distillation

May 21, 2021 (edited Oct 26, 2021) · NeurIPS 2021 Poster
  • Keywords: Reinforcement Learning, Human in the Loop RL
  • TL;DR: Enabling agents to interpret human-in-the-loop advice to learn new tasks quickly
  • Abstract: Training automated agents to perform complex behaviors in interactive environments is challenging: reinforcement learning requires careful hand-engineering of reward functions, imitation learning requires specialized infrastructure and access to a human expert, and learning from intermediate forms of supervision (like binary preferences) is time-consuming and provides minimal information per human intervention. Can we overcome these challenges by building agents that learn from rich, interactive feedback? We propose a new supervision paradigm for interactive learning based on teachable decision-making systems, which learn from structured advice provided by an external teacher. We begin by introducing a class of human-in-the-loop decision-making problems in which different forms of human-provided advice signals are available to the agent to guide learning. We then describe a simple policy learning algorithm that first learns to interpret advice, then uses advice to learn target tasks in the absence of human supervision. In puzzle-solving, navigation, and locomotion domains, we show that agents that learn from advice can acquire new skills with significantly less human supervision than standard reinforcement or imitation learning systems.
  • Code Of Conduct: I certify that all co-authors of this work have read and commit to adhering to the NeurIPS Statement on Ethics, Fairness, Inclusivity, and Code of Conduct.
  • Code: https://github.com/AliengirlLiv/teachable
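The two-phase scheme described in the abstract (first learn to interpret advice, then learn to act without it) can be illustrated with a toy sketch. Everything here is a hypothetical simplification for illustration, not the paper's implementation: the chain environment, the `advice_policy` that simply follows advice, and a majority-vote table standing in for supervised distillation.

```python
from collections import Counter, defaultdict

# Toy 1D chain: states 0..N, goal at state N. Advice = optimal action (+1).
N = 5

def step(s, a):
    # Move along the chain, clipped to [0, N].
    return max(0, min(N, s + a))

def optimal_advice(s):
    return +1  # the teacher always points toward the goal

# Phase 1: an advice-conditioned policy. In the paper this phase involves
# learning to interpret advice; here it trivially executes the advice.
def advice_policy(s, advice):
    return advice

# Phase 2: distillation. Roll out the advice-conditioned policy, record
# (state, action) pairs, and fit an advice-free "student" policy by
# majority vote per state (a stand-in for supervised learning).
counts = defaultdict(Counter)
for _ in range(20):
    s = 0
    while s != N:
        a = advice_policy(s, optimal_advice(s))
        counts[s][a] += 1
        s = step(s, a)

student = {s: c.most_common(1)[0][0] for s, c in counts.items()}

# At test time the distilled student solves the task with no advice.
s, steps = 0, 0
while s != N and steps < 100:
    s = step(s, student[s])
    steps += 1
print(s == N, steps)  # True 5
```

The point of the sketch is the supervision structure: dense advice is only needed during phase 1, and the distilled student needs none at test time.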