TGRL: Teacher Guided Reinforcement Learning Algorithm for POMDPs

Published: 03 Mar 2023, Last Modified: 19 Apr 2023
RRL 2023 Spotlight
Readers: Everyone
Abstract: In many real-world problems, an agent must operate in an uncertain and partially observable environment. Because it sees only partial information, a policy trained directly on these restricted observations tends to perform poorly. In some scenarios, more information about the environment is available during training and can be used to find a superior policy. That policy cannot be deployed, however, because the privileged information is unavailable at deployment. The $\textit{teacher-student}$ paradigm overcomes this challenge by using supervised learning to train a deployable (or $\textit{student}$) policy, which operates from the restricted observation space, with the actions of the privileged (or $\textit{teacher}$) policy as targets. However, due to information asymmetry, it is not always feasible for the student to perfectly mimic the teacher. We provide a principled solution to this problem, wherein the student policy dynamically balances between following the teacher's guidance and utilizing reinforcement learning to solve the partially observed task directly. The proposed algorithm is evaluated on diverse domains and fares favorably against strong baselines.
Track: Technical Paper