Model-Based Reinforcement Learning via Latent-Space CollocationDownload PDF

28 Sept 2020 (modified: 22 Oct 2023)ICLR 2021 Conference Blind SubmissionReaders: Everyone
Keywords: visual model-based reinforcement learning, visual planning, long-horizon planning, collocation
Abstract: The ability to construct and execute long-term plans enables intelligent agents to solve complex multi-step tasks and prevents myopic behavior only seeking the short-term reward. Recent work has achieved significant progress on building agents that can predict and plan from raw visual observations. However, existing visual planning methods still require a densely shaped reward that provides the algorithm with a short-term signal that is always easy to optimize. These algorithms fail when the shaped reward is not available as they use simplistic planning methods such as sampling-based random shooting and are unable to plan for a distant goal. Instead, to achieve long-horizon visual control, we propose to use collocation-based planning, a powerful optimal control technique that plans forward a sequence of states while constraining the transitions to be physical. We propose a planning algorithm that adapts collocation to visual planning by leveraging probabilistic latent variable models. A model-based reinforcement learning agent equipped with our planning algorithm significantly outperforms prior model-based agents on challenging visual control tasks with sparse rewards and long-term goals.
One-sentence Summary: We propose a visual model-based reinforcement agent that uses collocation in the latent space to plan and outperforms prior shooting-based planning methods.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Supplementary Material: zip
Community Implementations: [![CatalyzeX](/images/catalyzex_icon.svg) 2 code implementations](https://www.catalyzex.com/paper/arxiv:2106.13229/code)
Reviewed Version (pdf): https://openreview.net/references/pdf?id=wCtJtcLkgx
20 Replies

Loading