V-Former: Offline RL with Temporally-Extended Actions

23 Sept 2023 (modified: 11 Feb 2024), submitted to ICLR 2024
Primary Area: reinforcement learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: reinforcement learning, robotics
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Abstract: In this paper, we propose an offline reinforcement learning (RL) method that learns to take temporally extended actions, can handle narrow data distributions such as those produced by mixtures of multi-task demonstrations, and can train on data with different control frequencies. This combination of properties makes our proposed method especially well-suited for robotic offline RL, where datasets might consist of (narrow) demonstration data mixed with (broader) suboptimal data, and control frequencies can present a particularly significant challenge. We derive our method starting from a continuous time formulation of RL, and show that offline RL with temporally extended “action chunks” can be performed efficiently by extending the implicit Q-learning (IQL) approach, in combination with expressive Transformer-based policies for representing temporally extended open-loop action sequences. Our experiments show that our method both improves over prior approaches on simulated robotic demonstration data and outperforms prior works that aim to learn from data at multiple frequencies.
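To make the recipe described in the abstract concrete, the sketch below writes out IQL-style losses over k-step "action chunks" in PyTorch: the critic scores an entire chunk, the value function is fit by expectile regression, and the policy is extracted by advantage-weighted regression onto the dataset chunks. This is an illustrative sketch under stated assumptions, not the authors' implementation; the network architectures (plain MLPs standing in for the Transformer policy), the dimensions, and the hyperparameters (CHUNK_LEN, TAU, BETA, GAMMA) are all assumed for the example.

```python
# Illustrative sketch (not the paper's released code) of IQL losses extended
# to temporally extended action chunks. All shapes and hyperparameters below
# are assumptions made for the example.
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM, CHUNK_LEN = 17, 6, 4   # assumed dimensions
HIDDEN = 256
TAU = 0.7      # expectile for the value loss (standard IQL hyperparameter)
BETA = 3.0     # inverse temperature for advantage-weighted policy extraction
GAMMA = 0.99


def mlp(in_dim, out_dim):
    return nn.Sequential(nn.Linear(in_dim, HIDDEN), nn.ReLU(),
                         nn.Linear(HIDDEN, HIDDEN), nn.ReLU(),
                         nn.Linear(HIDDEN, out_dim))


# Q scores the state together with a flattened k-step action chunk;
# V depends on the state alone; the policy outputs the whole chunk at once
# (a stand-in for the Transformer sequence model described in the abstract).
q_net = mlp(STATE_DIM + ACTION_DIM * CHUNK_LEN, 1)
v_net = mlp(STATE_DIM, 1)
policy = mlp(STATE_DIM, ACTION_DIM * CHUNK_LEN)


def expectile_loss(diff, tau=TAU):
    # Asymmetric squared loss used by IQL: positive errors are over-weighted.
    weight = torch.abs(tau - (diff < 0).float())
    return (weight * diff ** 2).mean()


def iql_chunk_losses(s, a_chunk, reward_k, s_next_k, done_k):
    """Losses for a batch of k-step transitions.

    s:        (B, STATE_DIM)              state at the start of the chunk
    a_chunk:  (B, CHUNK_LEN * ACTION_DIM) executed action chunk, flattened
    reward_k: (B, 1)                      discounted reward summed over the chunk
    s_next_k: (B, STATE_DIM)              state reached after the chunk
    done_k:   (B, 1)                      1 if the episode ended within the chunk
    """
    q = q_net(torch.cat([s, a_chunk], dim=-1))
    v = v_net(s)

    # Value loss: expectile regression of V(s) toward Q(s, a_chunk).
    value_loss = expectile_loss(q.detach() - v)

    # Q loss: k-step TD target bootstrapped through V at the post-chunk state.
    target = reward_k + (1.0 - done_k) * (GAMMA ** CHUNK_LEN) * v_net(s_next_k).detach()
    q_loss = ((q - target) ** 2).mean()

    # Policy loss: advantage-weighted regression onto the dataset chunk.
    adv = (q - v).detach()
    weights = torch.clamp(torch.exp(BETA * adv), max=100.0)
    policy_loss = (weights * ((policy(s) - a_chunk) ** 2).sum(-1, keepdim=True)).mean()

    return value_loss, q_loss, policy_loss
```

Note the design choice this sketch assumes: the critic evaluates a whole open-loop chunk rather than a single primitive action, so bootstrapping happens only every k steps, which is one plausible way to realize the "temporally extended action" idea the abstract describes.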
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 6695