Visualizing MuZero Models

Published: 22 Jul 2021, Last Modified: 22 Oct 2023
Venue: URL 2021 Poster
Readers: Everyone
Keywords: Model-based reinforcement learning, value equivalent models, representation learning, latent models
TL;DR: Investigating value-equivalent model representation learning in MuZero, complemented with unsupervised regularization.
Abstract: MuZero, a model-based reinforcement learning algorithm that uses a value equivalent dynamics model, achieved state-of-the-art performance in Chess, Shogi, and the game of Go. In contrast to standard forward dynamics models that predict a full next state, value equivalent models are trained to predict a future value, thereby emphasizing value-relevant information in the representations. While value equivalent models have shown strong empirical success, no prior work has visualized or investigated what types of representations these models actually learn. Therefore, in this paper we visualize the latent representation of MuZero agents. We find that action trajectories may diverge between observation embeddings and internal state transition dynamics, which could lead to instability during planning. Based on this insight, we propose two regularization techniques to stabilize MuZero's performance. Additionally, we provide an open-source implementation of MuZero along with an interactive visualizer of learned representations, which may aid further investigation of value equivalent algorithms.
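To make the two ideas in the abstract concrete, here is a minimal sketch of (a) value-equivalent training, where the unrolled latent model is supervised only by value and reward targets rather than by next-observation reconstruction, and (b) a latent-consistency regularizer that pulls the unrolled latent state toward the embedding of the actually observed next observation. This is a hypothetical PyTorch-style illustration under stated assumptions, not the paper's implementation: the module names (`Representation`, `Dynamics`, `Prediction`), the tensor shapes, the squared-error consistency term, and the `consistency_weight` coefficient are all assumptions, and MuZero's actual training additionally uses a policy head and categorical value/reward targets, which are omitted here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

LATENT, OBS, N_ACTIONS = 32, 8, 4  # illustrative sizes, not from the paper


class Representation(nn.Module):
    """h: observation -> latent state."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(OBS, 64), nn.ReLU(), nn.Linear(64, LATENT))

    def forward(self, o):
        return self.net(o)


class Dynamics(nn.Module):
    """g: (latent state, action) -> (next latent state, reward)."""
    def __init__(self):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(LATENT + N_ACTIONS, 64), nn.ReLU())
        self.state = nn.Linear(64, LATENT)
        self.reward = nn.Linear(64, 1)

    def forward(self, s, a):
        x = self.trunk(torch.cat([s, F.one_hot(a, N_ACTIONS).float()], dim=-1))
        return self.state(x), self.reward(x).squeeze(-1)


class Prediction(nn.Module):
    """f: latent state -> value (policy head omitted for brevity)."""
    def __init__(self):
        super().__init__()
        self.value = nn.Sequential(nn.Linear(LATENT, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, s):
        return self.value(s).squeeze(-1)


def muzero_style_loss(h, g, f, obs, actions, reward_targets, value_targets,
                      consistency_weight=1.0):
    """Unroll the learned model K steps from the first observation.

    obs: [B, K+1, OBS]; actions, reward_targets, value_targets: [B, K].
    The value-equivalence idea: only value/reward targets supervise the
    latent model; no next-observation reconstruction is required.
    """
    s = h(obs[:, 0])
    loss = torch.tensor(0.0)
    K = actions.shape[1]
    for k in range(K):
        s, r_pred = g(s, actions[:, k])
        loss = loss + F.mse_loss(r_pred, reward_targets[:, k])
        loss = loss + F.mse_loss(f(s), value_targets[:, k])
        # Assumed form of the regularizer: align the k-step unrolled latent
        # with the embedding of the real next observation. The target is
        # detached so the dynamics are pulled toward the encoder, not the
        # reverse, which addresses the embedding/dynamics divergence the
        # paper observes.
        target = h(obs[:, k + 1]).detach()
        loss = loss + consistency_weight * F.mse_loss(s, target)
    return loss


# Usage on random data, with a batch of 16 and an unroll of K = 5 steps:
h, g, f = Representation(), Dynamics(), Prediction()
obs = torch.randn(16, 6, OBS)
actions = torch.randint(0, N_ACTIONS, (16, 5))
loss = muzero_style_loss(h, g, f, obs, actions,
                         torch.randn(16, 5), torch.randn(16, 5))
loss.backward()
```

With `consistency_weight=0.0` this reduces to a plain value-equivalent unroll, where nothing forces the unrolled latent states to stay near the embeddings of observed states, which is exactly the failure mode the visualizations in the paper highlight.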
Community Implementations: [1 code implementation (CatalyzeX)](https://www.catalyzex.com/paper/arxiv:2102.12924/code)