Stable Planning through Aligned Representations in Model-Based Reinforcement Learning

Published: 19 Sept 2025 (last modified: 27 Oct 2025) · NeurIPS 2025 Workshop EWM · CC BY 4.0
Keywords: Model-Based Reinforcement Learning, Planning, Reinforcement Learning, Alignment Model, Aligned Representation Learning
TL;DR: SPAR is a framework that trains a discrete world model and heuristic function once in a clean environment, then adapts efficiently to visual transformations by training only an alignment network, reducing re-training time by at least 95%.
Abstract: Integrating planning with reinforcement learning (RL) significantly improves problem-solving capabilities for sequential decision-making problems, particularly in sparse-reward, long-horizon tasks. Recently, it has been shown that discrete world models can be trained such that no model degradation occurs over thousands of time steps and states can be re-identified during planning. As a result, a heuristic function can be trained on data generated by the world model, and the learned world model and heuristic function can then be used with planning to solve problems. However, without re-training the world model and heuristic function, this approach fails on problems whose states undergo transformations to which the world model and heuristic function should be invariant (e.g., visual noise). In this work, we introduce Stable Planning through Aligned Representations (SPAR), an efficient framework that trains a discrete world model and heuristic function in a clean Markov decision process (MDP) and trains an alignment network to map transformed states to their discrete latent states in the clean MDP. When solving problems, we exploit the underlying discrete latent representation and round the output of the alignment network so that, ideally, it matches the clean latent state exactly. As a result, adapting to transformations only requires training the alignment network, while the world model and heuristic function remain fixed. We demonstrate SPAR's effectiveness on the Rubik's Cube domain and compare it with applying a similar approach to a world model with continuous latent representations. SPAR successfully solves over 89.39% of problems across 17 different visual transformations and real-world images. This adaptation process requires no additional world model or heuristic function re-training and reduces re-training time by at least 95%.
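
To make the alignment-and-rounding idea concrete, below is a minimal sketch (not the authors' released code) of an alignment network that maps a transformed observation to the clean world model's discrete latent code and rounds it at planning time while the world model and heuristic function stay frozen. The class names, network sizes, latent dimensionality, and the binary cross-entropy training loss are all assumptions for illustration.

```python
# Minimal sketch, assuming a binary discrete latent code of fixed size.
# All names (AlignmentNetwork, align_and_round, latent_dim) are illustrative,
# not the paper's actual API.
import torch
import torch.nn as nn


class AlignmentNetwork(nn.Module):
    """Maps a visually transformed observation to a (soft) binary latent code."""

    def __init__(self, obs_dim: int, latent_dim: int = 512):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(obs_dim, 1024),
            nn.ReLU(),
            nn.Linear(1024, latent_dim),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # Sigmoid keeps outputs in [0, 1] so they can be rounded to {0, 1}.
        return torch.sigmoid(self.encoder(obs))


def align_and_round(net: AlignmentNetwork, obs: torch.Tensor) -> torch.Tensor:
    """At planning time, round to the nearest discrete latent in the hope that
    it matches the clean MDP's latent state exactly; the frozen world model
    and heuristic function then operate on this code unchanged."""
    with torch.no_grad():
        return net(obs).round()


# Adapting to a new visual transformation would only update the alignment
# network; the world model and heuristic function remain fixed. One plausible
# training objective (an assumption, not taken from the paper) is binary
# cross-entropy against the clean latent code z_clean produced by the frozen
# world model's encoder on the untransformed observation:
#   loss = nn.functional.binary_cross_entropy(net(transformed_obs), z_clean)
```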
Submission Number: 54