Learning Complementary Representations of the Past using Auxiliary Tasks in Partially Observable Reinforcement Learning
Abstract: Partially observable Markov decision processes (POMDPs) define discrete-time sequential control problems [3, 11, 20]. In partially observable reinforcement learning (RL), an agent has access to neither the system state nor a domain model, and must rely on the observable past (the history-state) for decision making [20]. History-states are intrinsically complex, and extracting suitable representations from them is challenging yet necessary for solving general POMDPs. We refer to this as the history representation learning problem.
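As a concrete illustration (a minimal sketch, using standard POMDP notation that the abstract itself does not introduce), the problem can be written as a tuple, with the history-state being the sequence of past observations and actions:

$$\mathcal{M} = (\mathcal{S}, \mathcal{A}, \mathcal{O}, T, O, R, \gamma), \qquad h_t = (o_0, a_0, o_1, a_1, \ldots, o_t),$$

so that a policy $\pi(a_t \mid h_t)$ must condition on the growing history $h_t$ rather than on the hidden state $s_t$; learning a compact representation of $h_t$ is the history representation learning problem referred to above.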