- Keywords: variational Bayes, oracle guiding, reinforcement learning, decision making, probabilistic modeling, game, Mahjong
- Abstract: How to make intelligent decisions is a central problem in machine learning and cognitive science. Despite the recent successes of deep reinforcement learning (RL) in various decision-making problems, an important but under-explored question is how to leverage oracle observation (information that is invisible during online decision making but available during offline training) to facilitate learning. For example, human experts review the replay after a Poker game, where they can check the opponents' hands and thereby improve how well they estimate those hands from the information visible during play. In this work, we study such problems from the perspective of Bayesian theory and derive an objective for leveraging oracle observation in RL using variational methods. Our key contribution is a general learning framework for deep RL, referred to as variational latent oracle guiding (VLOG). VLOG offers robust and promising performance and can be combined with any value-based deep RL algorithm. We empirically demonstrate the effectiveness of VLOG in both online and offline RL domains, using decision-making tasks ranging from video games to the challenging tile-based game Mahjong. Furthermore, we release the Mahjong environment and the corresponding offline RL dataset as a benchmark to facilitate future research on oracle guiding.
- One-sentence Summary: We propose a variational Bayes framework that leverages oracle (hindsight) information available during training to improve deep reinforcement learning.
- Supplementary Material: zip