Planning with Sequence Models through Iterative Energy Minimization

Hongyi Chen; Yilun Du; Yiye Chen; Joshua B. Tenenbaum; Patricio A. Vela

Published: 01 Feb 2023, Last Modified: 22 Jun 2025, ICLR 2023 poster, Readers: Everyone

Keywords: Reinforcement Learning, Planning, Language Model, Decision Transformer

TL;DR: Planning with Transformers through iterative energy minimization (MCMC sampling)

Abstract: Recent works have shown that language modeling can be effectively used to train reinforcement learning (RL) policies. However, applying existing language models to planning, in which we wish to obtain a trajectory of actions to reach some goal, is less straightforward. The typical autoregressive generation procedures of language models preclude sequential refinement of earlier steps, which limits the effectiveness of a predicted plan. In this paper, we suggest an approach towards integrating planning with language models based on the idea of iterative energy minimization, and illustrate how such a procedure leads to improved RL performance across different tasks. We train a masked language model to capture an implicit energy function over trajectories of actions, and formulate planning as finding a trajectory of actions with minimum energy. We illustrate how this procedure enables improved performance over recent approaches across BabyAI and Atari environments. We further demonstrate unique benefits of our iterative optimization procedure, including generalization to new tasks, adaptation to test-time constraints, and the ability to compose plans together. Project webpage: https://hychen-naza.github.io/projects/LEAP/index.html
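To make the idea of planning as iterative energy minimization concrete, the following is a minimal sketch and not the authors' implementation. It assumes a hypothetical hand-written `toy_energy` in place of the energy that the paper derives from a trained masked language model, a hypothetical three-action set, and a simple Gibbs-style update in which one plan position at a time is "masked" and resampled in proportion to `exp(-energy / temperature)`. All names and parameter values here are illustrative assumptions.

```python
import math
import random

ACTIONS = (-1, 0, 1)  # hypothetical toy actions: step left, stay, step right


def toy_energy(plan, goal=5):
    """Hypothetical stand-in for a learned trajectory energy (lower is better).

    Penalizes ending far from the goal position, plus a small cost per move.
    """
    distance = abs(sum(plan) - goal)
    effort = 0.1 * sum(1 for a in plan if a != 0)
    return distance + effort


def refine_plan(energy_fn, horizon=8, sweeps=50, temperature=0.5, seed=0):
    """Iteratively refine a whole plan by resampling one position at a time."""
    rng = random.Random(seed)
    plan = [rng.choice(ACTIONS) for _ in range(horizon)]  # random initial plan
    best_plan, best_energy = list(plan), energy_fn(plan)
    for _ in range(sweeps):
        for t in range(horizon):
            # "Mask" position t and score each candidate action in context.
            energies = [energy_fn(plan[:t] + [a] + plan[t + 1:]) for a in ACTIONS]
            # Boltzmann weights; subtracting the minimum keeps exp() stable.
            low = min(energies)
            weights = [math.exp(-(e - low) / temperature) for e in energies]
            plan[t] = rng.choices(ACTIONS, weights=weights, k=1)[0]
            current = energy_fn(plan)
            if current < best_energy:  # remember the lowest-energy plan seen
                best_plan, best_energy = list(plan), current
    return best_plan, best_energy


if __name__ == "__main__":
    plan, energy = refine_plan(toy_energy)
    print("plan:", plan, "energy:", energy)
```

Unlike left-to-right autoregressive decoding, every position, including early ones, can be revisited on each sweep, which is the property the abstract highlights. In this sketch, swapping `toy_energy` for a different scoring function changes the planning objective without changing the sampler.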

Please Choose The Closest Area That Your Submission Falls Into: Reinforcement Learning (eg, decision and control, planning, hierarchical RL, robotics)

Supplementary Material: zip

Community Implementations: [9 code implementations](https://www.catalyzex.com/paper/planning-with-sequence-models-through/code)
