Accelerating Diffusion Planners in Offline RL via Reward-Aware Consistency Trajectory Distillation

Published: 23 Sept 2025, Last Modified: 01 Dec 2025
Venue: ARLET
License: CC BY 4.0
Track: Research Track
Keywords: reinforcement learning, diffusion models, trajectory-based planning
Abstract: Although diffusion models have achieved strong results in decision-making tasks, their slow inference speed remains a key limitation. While consistency models offer a potential solution, existing applications to decision-making either struggle with suboptimal demonstrations under behavior cloning or rely on complex concurrent training of multiple networks under the actor-critic framework. In this work, we propose a novel approach to consistency distillation for offline reinforcement learning that directly incorporates reward optimization into the distillation process. Our method achieves single-step diffusion sampling while generating higher-reward action trajectories through decoupled training and noise-free reward guidance. Empirical evaluations on the Gym MuJoCo, FrankaKitchen, and long-horizon planning benchmarks demonstrate that our approach achieves a $9.7\%$ improvement over the previous state of the art while offering up to a $142\times$ inference-time speedup over diffusion counterparts.
Submission Number: 54
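The page does not include the paper's actual losses or training code, so the following is a minimal, hypothetical PyTorch sketch of what reward-aware consistency trajectory distillation could look like: a one-step student denoiser is distilled from a frozen teacher diffusion model, with high-return offline trajectories up-weighted in the distillation loss. Every name here (TrajectoryDenoiser, distillation_step), the VE-style noise perturbation, the Euler teacher step, and the softmax reward weighting are illustrative assumptions, not the authors' method.

```python
# Hypothetical sketch of reward-weighted consistency distillation (illustrative only).
import torch
import torch.nn as nn

class TrajectoryDenoiser(nn.Module):
    """Toy consistency model f(x_t, t) -> predicted clean trajectory x_0."""
    def __init__(self, traj_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(traj_dim + 1, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, traj_dim),
        )

    def forward(self, x_t, t):
        # Condition on the scalar noise level t by concatenation.
        return self.net(torch.cat([x_t, t], dim=-1))

def distillation_step(student, teacher, ema_student, trajs, rewards, opt,
                      sigma_max=80.0):
    """One reward-weighted consistency-distillation update (assumed formulation).

    trajs:   (B, D) flattened offline trajectories (clean data x_0)
    rewards: (B,)   normalized returns used to up-weight high-return samples
    """
    B, _ = trajs.shape
    # Sample two adjacent noise levels t > s along the diffusion trajectory.
    t = torch.rand(B, 1) * sigma_max
    s = t * torch.rand(B, 1)                       # 0 <= s < t
    x_t = trajs + t * torch.randn_like(trajs)      # VE-style perturbation (assumption)

    with torch.no_grad():
        # Frozen teacher takes one Euler step from t toward s (assumption),
        # then the EMA copy of the student maps that point back to x_0.
        d = (x_t - teacher(x_t, t)) / t.clamp(min=1e-3)
        x_s = x_t + (s - t) * d
        target = ema_student(x_s, s)

    pred = student(x_t, t)
    # Reward-aware weighting: higher-return trajectories contribute more,
    # steering the one-step student toward high-reward behavior (assumption).
    w = torch.softmax(rewards, dim=0).unsqueeze(-1) * B
    loss = (w * (pred - target) ** 2).mean()

    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Toy usage with random data; in practice the teacher is a pretrained diffusion
# planner and ema_student tracks the student's weights via an EMA update.
D = 32
student = TrajectoryDenoiser(D)
teacher = TrajectoryDenoiser(D)
ema_student = TrajectoryDenoiser(D)
opt = torch.optim.Adam(student.parameters(), lr=1e-4)
print(distillation_step(student, teacher, ema_student,
                        torch.randn(64, D), torch.randn(64), opt))
```

At inference, only `student(x_T, sigma_max)` is evaluated once per plan, which is the source of the single-step speedup described in the abstract; the reward weighting is one simple stand-in for the paper's reward-aware guidance, whose exact form is not specified on this page.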