Keywords: reinforcement learning, diffusion models, trajectory-based planning
TL;DR: A method that makes diffusion models dramatically faster in offline RL by distilling them into single-step consistency models that directly optimize for rewards, achieving both better performance and significant speedups.
Abstract: Although diffusion models have achieved strong results in decision-making tasks, their slow inference speed remains a key limitation. While the consistency model offers a potential solution, its application to decision-making often struggles with suboptimal demonstrations or relies on complex concurrent training of multiple networks. In this work, we propose a novel approach to consistency distillation for offline reinforcement learning that directly incorporates reward optimization into the distillation process. Our method enables single-step generation while achieving higher performance with simpler training. Empirical evaluations on the Gym MuJoCo benchmarks and long-horizon planning tasks demonstrate that our approach achieves a $6.8\%$ improvement over the previous state-of-the-art while offering up to a $142\times$ speedup in inference time over its diffusion counterparts.
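To make the core idea concrete, the sketch below shows one plausible way a reward term could be folded into a consistency-distillation objective. It is only an illustration under assumed interfaces, not the authors' implementation: the names `consistency_model`, `teacher_diffusion.step`, `critic`, and `reward_weight` are hypothetical placeholders.

```python
import torch
import torch.nn.functional as F

def reward_aware_distillation_loss(
    consistency_model,   # single-step student f_theta(x_t, t)  (assumed interface)
    teacher_diffusion,   # pretrained diffusion teacher with a .step() solver (assumed)
    critic,              # hypothetical learned reward/Q model on generated trajectories
    x0,                  # clean trajectories from the offline dataset
    sigmas,              # discretized noise levels, descending
    reward_weight=0.1,   # illustrative trade-off coefficient
):
    """A minimal sketch: standard consistency distillation plus a critic-based
    reward term, so the single-step student matches the teacher while being
    nudged toward high-reward trajectories."""
    # Sample a pair of adjacent noise levels and perturb the clean data.
    i = torch.randint(0, len(sigmas) - 1, (x0.shape[0],), device=x0.device)
    t, t_next = sigmas[i], sigmas[i + 1]
    noise = torch.randn_like(x0)
    x_t = x0 + t.view(-1, *([1] * (x0.dim() - 1))) * noise

    # Teacher performs one ODE/denoising step from t to t_next (assumed API).
    with torch.no_grad():
        x_t_next = teacher_diffusion.step(x_t, t, t_next)

    # Consistency target: student outputs at adjacent noise levels should agree
    # (an EMA copy of the student is typically used for the target in practice).
    pred = consistency_model(x_t, t)
    with torch.no_grad():
        target = consistency_model(x_t_next, t_next)
    distill_loss = F.mse_loss(pred, target)

    # Reward term: push the one-step generation toward high critic values.
    reward_loss = -critic(pred).mean()

    return distill_loss + reward_weight * reward_loss
```

At inference, the distilled student maps noise to a trajectory in a single forward pass, which is where the large speedup over iterative diffusion sampling comes from.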
Submission Number: 86