Improving Generalization with Approximate Factored Value Functions

Published: 25 Mar 2022, Last Modified: 05 May 2023, ICLR 2022 OSC Poster
Keywords: Generalization, Factorization, Factored Reward MDP, MDP
Abstract: Reinforcement learning in general unstructured MDPs is a challenging learning problem. However, certain kinds of MDP structure, such as factorization, are known to make the problem simpler. This insight is often of little use in complex tasks, because MDPs with high-dimensional state spaces rarely exhibit such structure, and even when they do, the structure itself is typically unknown. In this work, we turn this observation on its head: instead of developing algorithms for structured MDPs, we propose a representation learning algorithm that approximates an unstructured MDP with one that has factorized structure. We then use these factors as a more convenient state representation for downstream learning. The particular structure we leverage is reward factorization, which defines a more compact class of MDPs that admit factorized value functions. We show that our proposed approach, Approximately Factored Representations (AFaR), can be easily combined with existing RL algorithms, leading to faster training (better sample complexity) and robust zero-shot transfer (better generalization). We empirically verify both improvements on the Procgen benchmark and the MiniGrid environments. An interesting direction for future work would be to extend AFaR to learn factorized policies that act on the individual factors, which may bring benefits such as better exploration.
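The claim that reward factorization admits factorized value functions can be illustrated with a small numerical check. The sketch below is not the AFaR algorithm; it only demonstrates the underlying structural fact under assumptions chosen for illustration: a policy-free Markov reward process whose state splits into two factors that evolve independently, with an additive reward r(s) = r1(s1) + r2(s2). Under those assumptions the joint value is V(s) = V1(s1) + V2(s2). All names, transition matrices, and reward values here (`evaluate`, `joint_process`, `max_additivity_gap`, `P1`, `r1`, `P2`, `r2`) are hypothetical.

```python
from itertools import product
from typing import List, Tuple

GAMMA = 0.9  # hypothetical discount factor

Matrix = List[List[float]]


def evaluate(P: Matrix, r: List[float], gamma: float = GAMMA, iters: int = 2000) -> List[float]:
    """Iteratively solve V = r + gamma * P V for a Markov reward process (no actions)."""
    n = len(r)
    v = [0.0] * n
    for _ in range(iters):
        v = [r[s] + gamma * sum(P[s][t] * v[t] for t in range(n)) for s in range(n)]
    return v


def joint_process(P1: Matrix, r1: List[float], P2: Matrix, r2: List[float]
                  ) -> Tuple[List[Tuple[int, int]], Matrix, List[float]]:
    """Build the product process: independent factor dynamics, additive reward."""
    states = list(product(range(len(r1)), range(len(r2))))
    P = [[P1[a][c] * P2[b][d] for (c, d) in states] for (a, b) in states]
    r = [r1[a] + r2[b] for (a, b) in states]
    return states, P, r


def max_additivity_gap(P1: Matrix, r1: List[float], P2: Matrix, r2: List[float]) -> float:
    """Largest |V_joint(s1, s2) - (V1(s1) + V2(s2))| over all joint states."""
    states, P, r = joint_process(P1, r1, P2, r2)
    v_joint = evaluate(P, r)
    v1, v2 = evaluate(P1, r1), evaluate(P2, r2)
    return max(abs(v_joint[i] - (v1[a] + v2[b])) for i, (a, b) in enumerate(states))


# Hypothetical two-state and three-state factors (each row sums to 1).
P1 = [[0.7, 0.3], [0.4, 0.6]]
r1 = [1.0, 0.0]
P2 = [[0.5, 0.5, 0.0], [0.1, 0.8, 0.1], [0.0, 0.2, 0.8]]
r2 = [0.0, 0.5, 2.0]

if __name__ == "__main__":
    # The gap should be at floating-point noise level.
    print(max_additivity_gap(P1, r1, P2, r2))
```

The joint process here has 2 x 3 = 6 states, yet its value function is fully described by 2 + 3 per-factor values; this compactness is the kind of structure the abstract refers to. Coupling the factor dynamics, or adding a reward term that depends on both factors at once, would in general break the additive decomposition.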