Keywords: offline safe reinforcement learning, self-alignment, prompt, Lyapunov stability
TL;DR: We propose a Lyapunov-conditioned self-alignment method for offline transformer-based RL.
Abstract: Deploying an offline reinforcement learning (RL) agent in a downstream task is challenging because the agent faces unpredictable transitions caused by the distribution shift between the offline RL dataset and the real environment. To address this distribution shift, prior works that aim to learn a well-performing and safer agent have applied conservative or safe RL methods in the offline setting. However, these methods require retraining from scratch or fine-tuning to satisfy the desired criteria for performance and safety. In this work, we propose a Lyapunov-conditioned self-alignment method for a transformer-based world model that requires no retraining and instead performs test-time adaptation toward the desired criteria. We show that a transformer-based world model can be described as model-based hierarchical RL, which allows us to combine hierarchical RL with our in-context learning for self-alignment in transformers. The proposed self-alignment framework makes the agent safer by self-instructing with the Lyapunov condition. In experiments, we demonstrate that our self-alignment algorithm outperforms safe RL methods on continuous control and safe RL benchmark environments in terms of return, cost, and failure rate.
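For readers unfamiliar with Lyapunov-based safety criteria, the following is a minimal sketch (not the authors' implementation) of how a Lyapunov decrease condition could gate candidate actions against a one-step world-model prediction at test time. The names `world_model`, `V`, and the decay rate `alpha` are hypothetical placeholders; the paper conditions the transformer via self-instruction rather than the explicit rejection loop shown here.

```python
# Sketch: using the Lyapunov decrease condition V(s') - V(s) <= -alpha * V(s)
# to filter candidate actions with a learned one-step world model.
import numpy as np

def lyapunov_decrease(V, s, s_next, alpha=0.1):
    """Check the standard Lyapunov decrease condition."""
    return V(s_next) - V(s) <= -alpha * V(s)

def select_safe_action(world_model, V, s, candidate_actions, alpha=0.1):
    """Return the first candidate whose predicted next state satisfies the
    decrease condition; otherwise fall back to the candidate that minimizes
    the predicted Lyapunov value V(s')."""
    best_a, best_v = None, np.inf
    for a in candidate_actions:
        s_next = world_model(s, a)  # one-step prediction of the next state
        if lyapunov_decrease(V, s, s_next, alpha):
            return a
        v_next = V(s_next)
        if v_next < best_v:
            best_a, best_v = a, v_next
    return best_a

# Toy usage: a quadratic Lyapunov function and linear dynamics stand in for
# the learned safety critic and the transformer world model.
if __name__ == "__main__":
    V = lambda s: float(s @ s)                    # V(s) = ||s||^2
    world_model = lambda s, a: 0.9 * s + 0.1 * a  # toy stable dynamics
    s = np.array([1.0, -0.5])
    actions = [np.array([0.0, 0.0]), np.array([-1.0, 0.5])]
    print(select_safe_action(world_model, V, s, actions))
```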
Supplementary Material: zip
Primary Area: reinforcement learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 9163