Waypoint Transformer: Reinforcement Learning via Supervised Learning with Intermediate Targets

Published: 21 Sept 2023, Last Modified: 02 Nov 2023, NeurIPS 2023 poster
Keywords: offline reinforcement learning, reinforcement learning via supervised learning, behavioral cloning
TL;DR: Using intermediate target goals and rewards as conditioning variables with only behavioral cloning objectives and minimal hyperparameter tuning, we can achieve state-of-the-art performance in offline reinforcement learning.
Abstract: Despite the recent advancements in offline reinforcement learning via supervised learning (RvS) and the success of the decision transformer (DT) architecture in various domains, DTs have fallen short in several challenging benchmarks. The root cause of this underperformance lies in their inability to seamlessly connect segments of suboptimal trajectories. To overcome this limitation, we present a novel approach to enhance RvS methods by integrating intermediate targets. We introduce the Waypoint Transformer (WT), an architecture that builds upon the DT framework and is conditioned on automatically generated waypoints. The results show a significant increase in the final return compared to existing RvS methods, with performance on par with or better than existing state-of-the-art temporal difference learning-based methods. Additionally, the performance and stability improvements are largest in the most challenging environments and data configurations, including AntMaze Large Play/Diverse and Kitchen Mixed/Partial.
Supplementary Material: zip
Submission Number: 5666
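
Illustrative sketch (not from the paper): the abstract describes conditioning a behavioral cloning policy on intermediate targets but does not say how the waypoints are generated. The Python below shows one simple way such conditioning data could be assembled, under the assumption that the target for each state is the state observed k steps later in the same trajectory. The names `relabel_waypoints`, `conditioned_input`, and the parameter `k` are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple

State = Tuple[float, ...]


def relabel_waypoints(states: Sequence[State], k: int) -> List[Tuple[State, State]]:
    """Pair each state with a hypothetical intermediate target k steps ahead.

    Assumption (not from the paper): the target is the state observed k steps
    later in the same trajectory, clamped to the final state near the end.
    """
    if k < 1:
        raise ValueError("k must be a positive number of steps")
    last = len(states) - 1
    return [(s, states[min(t + k, last)]) for t, s in enumerate(states)]


def conditioned_input(state: State, waypoint: State) -> Tuple[float, ...]:
    """Concatenate state and waypoint into a single conditioning vector."""
    return state + waypoint


# Toy 2-D trajectory; a policy trained by behavioral cloning would map each
# conditioned input to the action that was taken at that step.
trajectory = [(0.0, 0.0), (1.0, 0.0), (2.0, 1.0), (3.0, 1.0)]
pairs = relabel_waypoints(trajectory, k=2)
inputs = [conditioned_input(s, w) for s, w in pairs]
```

The intuition this is meant to convey is that a nearby target gives the policy a short-horizon objective, which may make it easier to connect segments of suboptimal trajectories than conditioning only on a distant final goal.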