Non-Markovian Predictive Coding For Planning In Latent Space

Tung Nguyen; Rui Shu; Tuan Pham; Hung Bui; Stefano Ermon

28 Sept 2020 (modified: 05 May 2023), ICLR 2021 Conference Blind Submission, Readers: Everyone

Keywords: representation learning, reinforcement learning, information theory

Abstract: High-dimensional observations are a major challenge in applying model-based reinforcement learning (MBRL) to real-world environments. To handle high-dimensional sensory inputs, existing MBRL approaches use representation learning to map observations into a lower-dimensional latent space that is more amenable to dynamics estimation and planning. The task-relevance and predictability of the learned representations play critical roles in the success of planning in latent space. In this work, we present Non-Markovian Predictive Coding (NMPC), an information-theoretic approach for planning from high-dimensional observations with two key properties: 1) it formulates a mutual information objective that prioritizes the encoding of task-relevant components of the environment; and 2) it employs a recurrent neural network capable of modeling non-Markovian latent dynamics. To demonstrate NMPC’s ability to prioritize task-relevant information, we evaluate our model on a challenging modification of standard DMControl tasks in which the DMControl background is replaced with natural videos that contain complex information irrelevant to the planning task. Our experiments show that NMPC outperforms existing methods in the complex-background setting while remaining competitive with current state-of-the-art MBRL models in the standard setting.
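
The abstract does not spell out the form of the mutual information objective. As a rough illustration only, the sketch below uses the InfoNCE contrastive bound, a common way to lower-bound mutual information; whether NMPC uses exactly this estimator is an assumption here, not something the abstract confirms. The names `info_nce` and `mi_lower_bound`, the dot-product scoring function, and the toy vectors are all hypothetical.

```python
import math

def info_nce(pred, target):
    """InfoNCE loss over N paired vectors (assumed setup, not the paper's code).

    pred[i] is a predicted latent and target[i] its matching encoding;
    the other targets in the batch serve as negatives.
    """
    n = len(pred)
    loss = 0.0
    for i in range(n):
        # Dot-product score between prediction i and every candidate target.
        scores = [sum(p * t for p, t in zip(pred[i], target[j])) for j in range(n)]
        # Numerically stable log-sum-exp over all candidates.
        m = max(scores)
        log_norm = m + math.log(sum(math.exp(s - m) for s in scores))
        # Cross-entropy of picking the matching target among the N candidates.
        loss += log_norm - scores[i]
    return loss / n

def mi_lower_bound(pred, target):
    """Mutual information lower bound implied by InfoNCE: log N minus the loss."""
    return math.log(len(pred)) - info_nce(pred, target)

# Toy batch: two well-separated latents.
latents = [[5.0, 0.0], [0.0, 5.0]]
aligned = info_nce(latents, latents)          # predictions match their targets
shuffled = info_nce(latents, latents[::-1])   # predictions paired with the wrong targets
```

In this toy batch, correctly paired predictions give a lower loss than mismatched ones, and the implied bound never exceeds log N, which is why contrastive objectives of this kind are usually trained with many negatives.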

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics

One-sentence Summary: We develop a contrastive learning alternative to Dreamer that works well in both the standard setting and a non-standard setting dominated by task-irrelevant information.

Reviewed Version (pdf): https://openreview.net/references/pdf?id=KBeCe8e9EW
