MCPlanner: Multi-Scale Consistency Planning for Offline Reinforcement Learning

ICLR 2026 Conference Submission 12828 Authors

18 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Generative models, Long-horizon planning, Reinforcement learning
Abstract: Planning for long-horizon tasks is a significant challenge, often addressed with complex hierarchical methods that rely on multiple, independently trained models. Such hierarchical approaches can be brittle and suffer from coherence issues between levels. In this work, we introduce the Multi-Scale Consistency Planner ($\textbf{MCPlanner}$), a novel framework that leverages the unique properties of Generalized Consistency Trajectory Models (GCTMs) to create a fluid, unified planning hierarchy. Unlike prior generative models, which are limited to mappings from noise to data, GCTMs can learn a direct, fully traversable ODE path between arbitrary data distributions. This capability allows MCPlanner to unify high-level and low-level planning within a single model: instead of training separate high-level and low-level planners, MCPlanner trains a single GCTM on end-to-end expert trajectories. At inference time, a seamless hierarchy emerges: coarse, long-horizon plans are generated by querying the model at a sparse temporal resolution, while dense, fine-grained motions are synthesized by querying the same model on the continuous path between these coarse waypoints. Our approach obviates the need for discrete hierarchical structures, offering a more elegant, efficient, and controllable solution to long-horizon planning. Experiments on $35$ challenging tasks from the OGBench benchmark demonstrate that MCPlanner achieves state-of-the-art performance, consistently outperforming prior approaches.
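
As a rough illustration of the multi-scale querying idea described in the abstract, the following Python sketch assumes a trained GCTM is exposed as a callable `gctm(x0, x1, t)` that returns the state on the learned ODE path between `x0` and `x1` at interpolation time `t` in $[0, 1]$. All names here (`plan`, `gctm`, `n_waypoints`, `densify`) are hypothetical placeholders for exposition; this is a minimal sketch of the two-scale inference scheme, not the authors' implementation.

```python
import numpy as np

def plan(gctm, start, goal, n_waypoints=8, densify=16):
    """Hypothetical multi-scale planning loop: one model, two temporal scales.

    gctm(x0, x1, t) is assumed to return the state on the learned ODE path
    between x0 and x1 at interpolation time t in [0, 1].
    """
    # Coarse pass: query the model at a sparse temporal resolution to
    # obtain long-horizon waypoints between the start and goal states.
    ts = np.linspace(0.0, 1.0, n_waypoints)
    waypoints = [gctm(start, goal, t) for t in ts]

    # Fine pass: query the *same* model on the continuous path between each
    # pair of consecutive waypoints to synthesize dense, fine-grained motion.
    trajectory = []
    for a, b in zip(waypoints[:-1], waypoints[1:]):
        for t in np.linspace(0.0, 1.0, densify, endpoint=False):
            trajectory.append(gctm(a, b, t))
    trajectory.append(waypoints[-1])
    return trajectory

# Example with a trivial linear interpolant standing in for a trained GCTM:
dummy = lambda x0, x1, t: (1.0 - t) * x0 + t * x1
traj = plan(dummy, np.zeros(2), np.ones(2))
```

Note that both passes call the same model; only the endpoints and temporal resolution of the query change, which is what lets the hierarchy emerge at inference time without separate planners.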
Primary Area: reinforcement learning
Submission Number: 12828