TGRL: An Algorithm for Teacher Guided Reinforcement Learning

Idan Shenfeld; Zhang-Wei Hong; Aviv Tamar; Pulkit Agrawal

TGRL: An Algorithm for Teacher Guided Reinforcement Learning

Idan Shenfeld, Zhang-Wei Hong, Aviv Tamar, Pulkit Agrawal

Published: 24 Apr 2023, Last Modified: 06 Jul 2023ICML 2023 PosterEveryoneRevisions

Abstract: We consider solving sequential decision-making problems in the scenario where the agent has access to two supervision sources: $\textit{reward signal}$ and a $\textit{teacher}$ that can be queried to obtain a $\textit{good}$ action for any state encountered by the agent. Learning solely from rewards, or reinforcement learning, is data inefficient and may not learn high-reward policies in challenging scenarios involving sparse rewards or partial observability. On the other hand, learning from a teacher may sometimes be infeasible. For instance, the actions provided by a teacher with privileged information may be unlearnable by an agent with limited information (i.e., partial observability). In other scenarios, the teacher might be sub-optimal, and imitating their actions can limit the agent's performance. To overcome these challenges, prior work proposed to jointly optimize imitation and reinforcement learning objectives but relied on heuristics and problem-specific hyper-parameter tuning to balance the two objectives. We introduce Teacher Guided Reinforcement Learning (TGRL), a principled approach to dynamically balance following the teacher's guidance and leveraging RL. TGRL outperforms strong baselines across diverse domains without hyperparameter tuning.

Submission Number: 5504

Loading