Keywords: Offline RL, Diffusion model, Sparse reward, Long horizon
TL;DR: DIAR integrates diffusion models with offline RL via adaptive decision horizon adjustment, demonstrating improved long-term planning and competitive performance on D4RL benchmarks.
Abstract: We introduce DIAR (Diffusion-model-guided Implicit Q-learning with Adaptive Revaluation), a novel offline reinforcement learning approach that addresses the twin challenges of learning from fixed datasets and making effective long-horizon decisions. DIAR leverages diffusion models to learn state-action sequence distributions and combines them with value functions for balanced decision-making. The key innovation is the Adaptive Revaluation mechanism, which dynamically adjusts decision lengths by comparing current and future state values, enhancing long-term flexibility and trajectory selection. DIAR enables precise Q-function learning through diffusion-guided value functions and generates diverse latent trajectories for improved policy robustness. We evaluate DIAR on D4RL benchmarks, demonstrating competitive performance and consistent improvements over existing offline RL methods.
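The abstract describes Adaptive Revaluation only at a high level: decision lengths are adjusted by comparing current and future state values. The snippet below is a minimal illustrative sketch of that idea, not the authors' implementation; the names `value_fn`, `replan_fn`, and the exact comparison rule are assumptions for illustration.

```python
def adaptive_revaluation_step(state, plan, value_fn, replan_fn):
    """Sketch of an adaptive-revaluation check (hypothetical names):
    compare the value of the current state with the value of the state
    the current plan is heading toward, and regenerate the plan with a
    fresh (possibly shorter) horizon if the current state already looks
    at least as good as the plan's endpoint.
    """
    current_value = value_fn(state)     # estimated value V(s_t) of the current state
    planned_value = value_fn(plan[-1])  # estimated value of the plan's terminal state

    if current_value >= planned_value:
        # The remaining plan no longer promises improvement: replan from here.
        plan = replan_fn(state)
    return plan
```

In this reading, the value comparison acts as a trigger for replanning, which is one plausible way a decision horizon could be adjusted adaptively during execution.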
Submission Number: 6