Rethinking Decision Transformer via Hierarchical Reinforcement Learning

23 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: reinforcement learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: offline reinforcement learning, decision transformer
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Abstract: Decision Transformer (DT) is an innovative algorithm leveraging recent advances in the Transformer architecture for sequential decision making. However, a notable limitation of DT is its reliance on \emph{recalling} trajectories from the dataset, without the capability to seamlessly stitch them together. In this work, we introduce a general sequence modeling framework for studying sequential decision making through the lens of \emph{Hierarchical Reinforcement Learning}. When making a decision, a \emph{high-level} policy first proposes an ideal \emph{prompt} for the current state, and a \emph{low-level} policy subsequently generates an action conditioned on the given prompt. We show how DT emerges as a special case with specific choices of high-level and low-level policies, and discuss why these choices might fail in practice. Inspired by these observations, we investigate how to jointly optimize the high-level and low-level policies to enable the stitching capability, which further leads to new algorithms for offline reinforcement learning. Finally, our empirical studies clearly demonstrate that the proposed algorithms significantly surpass DT on several control and navigation benchmarks. We hope that our contributions can inspire the integration of Transformer architectures within the field of RL.
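The hierarchical view described in the abstract can be made concrete with a minimal sketch. The class names `HighLevelPolicy` and `LowLevelPolicy`, and the use of a scalar return-to-go as the prompt, are illustrative assumptions rather than details taken from the paper; the sketch only shows the two-stage decision process and how DT's fixed target return fits in as a special case of the high-level policy.

```python
# Minimal sketch (not the authors' code) of the hierarchical decision process:
# a high-level policy proposes a prompt for the current state, and a low-level
# policy generates an action conditioned on that prompt.
import numpy as np


class HighLevelPolicy:
    """Proposes a prompt for the current state.

    In the Decision Transformer special case, the prompt is simply a fixed
    target return-to-go chosen at evaluation time, independent of the state.
    """

    def __init__(self, target_return: float):
        self.target_return = target_return

    def propose(self, state: np.ndarray) -> float:
        # DT-style choice: ignore the state and return the fixed target return.
        return self.target_return


class LowLevelPolicy:
    """Generates an action conditioned on the state and the prompt."""

    def __init__(self, action_dim: int):
        self.action_dim = action_dim

    def act(self, state: np.ndarray, prompt: float) -> np.ndarray:
        # Placeholder for a Transformer conditioned on (prompt, state);
        # here we just return a random action of the correct shape.
        rng = np.random.default_rng(0)
        return rng.standard_normal(self.action_dim)


def select_action(state: np.ndarray,
                  high: HighLevelPolicy,
                  low: LowLevelPolicy) -> np.ndarray:
    prompt = high.propose(state)   # high-level: choose an ideal prompt
    return low.act(state, prompt)  # low-level: act conditioned on the prompt


# Usage example with hypothetical dimensions and target return.
state = np.zeros(4)
action = select_action(state,
                       HighLevelPolicy(target_return=100.0),
                       LowLevelPolicy(action_dim=2))
```

Under this framing, the paper's proposal of jointly optimizing the two policies amounts to learning `HighLevelPolicy.propose` from data rather than fixing it to a hand-picked target return.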
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 7223