Practical Diffusion Planning via Temperature-Guided Reward Conditioning

20 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: offline reinforcement learning, diffusion planning, generative models
TL;DR: Temperature-Guided Diffusion Planning is a guidance approach for diffusion planning that eliminates the per-task hyperparameter tuning required by classifier-free guidance.
Abstract: Diffusion planners address sequential decision-making by framing plan generation as a generative modeling task over trajectories, mitigating the compounding errors and myopic predictions typical of autoregressive methods. They sample long-horizon, globally consistent plans in a single pass, enabling parallel refinement and robust handling of multimodal futures. Reward conditioning is typically achieved through classifier guidance or classifier-free guidance (CFG), with CFG favored for its performance and flexibility, but it requires extensive, task-specific hyperparameter tuning that limits scalability and generalization. Our analysis reveals that guidance performance hinges on careful adaptation to the data manifold and reward distribution, which explains CFG's hyperparameter fragility. In this work, we propose the temperature-guided diffusion planner (TGDP), which adapts CFG for reward conditioning by self-calibrating to these task-specific characteristics. TGDP leverages temperature-based sample reweighting during training and adaptive guidance scaling at inference, yielding robust high-reward plan generation without per-task hyperparameter optimization. Across standard reward-driven benchmarks, TGDP matches the performance of prior methods while maintaining a single set of default hyperparameters, establishing a practical, scalable, and generalizable approach to diffusion-based planning.
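The abstract describes temperature-based sample reweighting during training. A minimal sketch of how such reweighting might look is given below; the softmax-over-returns form, the function name, and all parameters are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

def temperature_weights(returns, tau=1.0):
    """Softmax weights over trajectory returns at temperature tau.

    Hypothetical sketch: higher-return trajectories receive larger
    sampling weights during training; tau controls how sharply the
    reweighting favors high rewards (tau -> inf recovers uniform).
    """
    z = (returns - returns.max()) / tau  # subtract max for numerical stability
    w = np.exp(z)
    return w / w.sum()

# Usage: draw a training minibatch biased toward high-return trajectories.
rng = np.random.default_rng(0)
returns = np.array([0.1, 0.5, 0.9, 0.3])
w = temperature_weights(returns, tau=0.5)
batch_idx = rng.choice(len(returns), size=2, replace=False, p=w)
```

A lower `tau` concentrates training on the highest-reward trajectories, which is one plausible way a planner could self-calibrate to a task's reward distribution.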
Primary Area: reinforcement learning
Submission Number: 24685