Value-Guided Decision Transformer: A Unified Reinforcement Learning Framework for Online and Offline Settings
Keywords: Offline RL, Conditional Sequence Modeling, Decision Transformer
TL;DR: In this paper, we propose Value-Guided Decision Transformer (VDT), which employs progressively optimized value functions to guide the Decision Transformer (DT) in making optimal decisions.
Abstract: The Conditional Sequence Modeling (CSM) paradigm, benefiting from the transformer's powerful distribution modeling capabilities, has demonstrated considerable promise in Reinforcement Learning (RL) tasks. However, most prior work applies CSM to a single online or offline setting, and a unified architecture has rarely been explored. Additionally, existing methods primarily focus on deterministic trajectory modeling, overlooking the stochasticity of state transitions and the diversity of future trajectory distributions. Fortunately, value-based methods offer a viable complement to CSM, further bridging the gap between offline and online RL. In this paper, we propose Value-Guided Decision Transformer (VDT), which leverages value functions to perform advantage weighting and behavior regularization on the Decision Transformer (DT), guiding the policy toward optimal decisions during the offline training phase. In the online tuning phase, VDT further integrates value-based policy improvement with behavior cloning under the CSM architecture through limited interaction and data collection, achieving performance gains within a minimal number of timesteps. The predictive capability of value functions for future returns is also incorporated into the sampling process. Our method achieves competitive performance on various standard RL benchmarks, providing a feasible solution for developing CSM architectures in general scenarios. Code is available here.
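To illustrate the advantage-weighting and behavior-regularization idea described in the abstract, below is a minimal PyTorch sketch of an advantage-weighted behavior-cloning loss for a Decision Transformer. It assumes an exponential advantage weight (AWR/IQL-style) with a temperature beta and hypothetical critic outputs q_values and values; the paper's exact offline objective may differ.

```python
import torch

def advantage_weighted_dt_loss(pred_actions, actions, q_values, values,
                               beta=1.0, w_max=20.0):
    """Sketch of an advantage-weighted BC loss for a Decision Transformer.

    pred_actions: (B, T, act_dim) actions predicted by the DT
    actions:      (B, T, act_dim) dataset (behavior) actions
    q_values:     (B, T) learned Q(s_t, a_t) for the dataset actions
    values:       (B, T) learned V(s_t)
    """
    # Advantage of each dataset action under the learned value functions.
    advantage = q_values - values                                   # (B, T)
    # Exponential advantage weight, clipped for numerical stability.
    weight = torch.clamp(torch.exp(advantage / beta), max=w_max)    # (B, T)
    # Per-timestep behavior-cloning (regression) error of the DT policy.
    bc_error = ((pred_actions - actions) ** 2).mean(dim=-1)         # (B, T)
    # Weighting the BC term pushes the policy toward high-advantage
    # actions while keeping it regularized toward the behavior data.
    return (weight.detach() * bc_error).mean()
```

A full implementation would combine this weighted term with the value-function training losses and, per the abstract, use the value estimates again at sampling time to rank candidate actions; those components are omitted here.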
Primary Area: Reinforcement learning (e.g., decision and control, planning, hierarchical RL, robotics)
Submission Number: 20601