Keywords: Sequential RL, S4, Decision Transformers
TL;DR: Replacing transformers with state-space (S4) layers for sequence-based RL modeling, with an extension to on-policy training.
Abstract: Recently, sequence learning methods have been applied to the problem of off-policy
Reinforcement Learning, including the seminal work on Decision Transformers,
which employs transformers for this task. Since transformers are parameter-heavy,
cannot benefit from history longer than a fixed window size, and are not computed
using recurrence, we set out to investigate the suitability of the S4 family of
models, which are based on state-space layers and have been shown to outperform
transformers, especially in modeling long-range dependencies. In this work, we
present two main algorithms: (i) an off-policy training procedure that works with
trajectories while still maintaining the training efficiency of the S4 model, and (ii) an
on-policy training procedure that is trained in a recurrent manner, benefits from
long-range dependencies, and is based on a novel stable actor-critic mechanism.
Our results indicate that our method outperforms multiple variants of Decision
Transformers, as well as the other baseline methods, on most tasks, while reducing
latency, the number of parameters, and training time by several orders of magnitude,
making our approach more suitable for real-world RL.
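To make the contrast with a fixed-window transformer concrete, the sketch below illustrates the kind of linear state-space recurrence that S4-style layers build on: a hidden state is updated at every timestep, so the layer can in principle carry information over arbitrarily long histories at constant per-step cost. This is an illustrative single-channel sketch only; the matrices `A`, `B`, `C` and the function names are placeholders, not the parameterization or training procedure proposed in the paper.

```python
import numpy as np

# Minimal (illustrative) discrete state-space recurrence:
#   x_k = A @ x_{k-1} + B * u_k
#   y_k = C @ x_k
# A, B, C are arbitrary placeholders here; S4 uses a specific structured
# parameterization and also admits an efficient convolutional training mode.

def ssm_step(A, B, C, x_prev, u_k):
    """One recurrent step: update the hidden state and emit an output."""
    x_k = A @ x_prev + B * u_k       # state update (constant cost per step)
    y_k = C @ x_k                    # readout
    return x_k, y_k

def ssm_rollout(A, B, C, inputs):
    """Process an input sequence step by step, as in recurrent (on-policy) inference."""
    x = np.zeros(A.shape[0])
    outputs = []
    for u_k in inputs:
        x, y = ssm_step(A, B, C, x, u_k)
        outputs.append(y)
    return np.array(outputs)

if __name__ == "__main__":
    state_dim = 4
    A = np.eye(state_dim) * 0.9      # toy stable dynamics
    B = np.ones(state_dim) * 0.1
    C = np.ones(state_dim) / state_dim
    print(ssm_rollout(A, B, C, inputs=np.sin(np.linspace(0, 3, 20))))
```

For off-policy training on stored trajectories, the same linear recurrence can be unrolled and computed as a convolution over the whole sequence, which is the efficiency property of S4 that the abstract alludes to; at deployment time the recurrent form above gives constant-time, constant-memory steps.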
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Supplementary Material: zip
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Reinforcement Learning (eg, decision and control, planning, hierarchical RL, robotics)