QUAD: Q-Gradient Uncertainty-Aware Guidance for Diffusion Policies in Offline Reinforcement Learning
Keywords: Offline RL, Diffusion Policy
TL;DR: We introduce QUAD, an uncertainty-aware Q-gradient guidance for diffusion-based offline RL. By down-weighting unreliable gradients, QUAD achieves state-of-the-art performance on D4RL benchmarks.
Abstract: Diffusion-based offline reinforcement learning (RL) leverages Q-gradients of noisy actions to guide the denoising process. Existing approaches fall into two categories: (i) backpropagating the Q-gradient of the final denoised action through all steps, or (ii) directly estimating the Q-gradient of noisy actions. The former suffers from exploding or vanishing gradients as the number of denoising steps increases, while the latter becomes inaccurate when noisy actions deviate substantially from the dataset. In this work, we focus on addressing the limitations of the second category. We introduce QUAD, an uncertainty-aware Q-gradient guidance method. QUAD employs a Q-ensemble to estimate the uncertainty of Q-gradients and uses this uncertainty to constrain unreliable guidance during denoising. By down-weighting unreliable gradients, QUAD reduces the risk of producing suboptimal actions. Experiments on the D4RL benchmark show that QUAD outperforms state-of-the-art methods across most tasks.
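The guidance mechanism described in the abstract can be sketched roughly as follows: a minimal PyTorch-style sketch, assuming a trained Q-ensemble and a trained diffusion denoiser. All names here (QEnsemble, uncertainty_weighted_q_gradient, guided_denoise, beta, guidance_scale, and the placeholder denoiser) are illustrative assumptions, not QUAD's actual implementation or hyperparameters.

```python
# Illustrative sketch only: uncertainty-weighted Q-gradient guidance for a
# diffusion policy's reverse (denoising) process.
import torch
import torch.nn as nn


class QEnsemble(nn.Module):
    """Ensemble of independent Q-networks over (state, noisy action) pairs."""

    def __init__(self, state_dim, action_dim, n_members=5, hidden=256):
        super().__init__()
        self.members = nn.ModuleList([
            nn.Sequential(
                nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, 1),
            )
            for _ in range(n_members)
        ])

    def forward(self, state, action):
        x = torch.cat([state, action], dim=-1)
        return torch.stack([m(x) for m in self.members], dim=0)  # (E, B, 1)


def uncertainty_weighted_q_gradient(q_ensemble, state, noisy_action, beta=1.0):
    """Mean Q-gradient w.r.t. the noisy action, shrunk toward zero where the
    ensemble members disagree (high gradient uncertainty)."""
    action = noisy_action.detach().requires_grad_(True)
    q_values = q_ensemble(state, action)  # (E, B, 1)
    grads = torch.stack([
        torch.autograd.grad(q.sum(), action, retain_graph=True)[0]
        for q in q_values
    ], dim=0)  # (E, B, A)
    grad_mean, grad_std = grads.mean(dim=0), grads.std(dim=0)
    # Weight in (0, 1]: decays as ensemble disagreement grows, so unreliable
    # guidance directions are down-weighted.
    weight = torch.exp(-beta * grad_std.mean(dim=-1, keepdim=True))
    return weight * grad_mean


def guided_denoise(denoiser, q_ensemble, state, action_dim,
                   n_steps=10, guidance_scale=0.1):
    """Reverse-diffusion sampling with uncertainty-weighted Q guidance.
    The noise schedule is omitted; `denoiser(state, action, t)` is assumed to
    return the next, less-noisy action."""
    action = torch.randn(state.shape[0], action_dim)
    for t in reversed(range(n_steps)):
        action = denoiser(state, action, t)
        grad = uncertainty_weighted_q_gradient(q_ensemble, state, action)
        action = action + guidance_scale * grad  # guided update toward higher Q
    return action.clamp(-1.0, 1.0)


if __name__ == "__main__":
    state_dim, action_dim, batch = 17, 6, 4
    q_ens = QEnsemble(state_dim, action_dim)
    dummy_denoiser = lambda s, a, t: 0.9 * a  # stand-in for a trained diffusion policy
    actions = guided_denoise(dummy_denoiser, q_ens,
                             torch.randn(batch, state_dim), action_dim)
    print(actions.shape)  # torch.Size([4, 6])
```

The key design choice the sketch illustrates is that guidance strength is modulated per sample by ensemble disagreement over the Q-gradient itself, rather than by a fixed global scale.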
Primary Area: reinforcement learning
Submission Number: 17540