Keywords: offline safe reinforcement learning, self-alignment, prompt, Lyapunov stability
TL;DR: We propose a Lyapunov-conditioned self-alignment method for offline transformer-based RL.
Abstract: Deploying an offline reinforcement learning (RL) agent in a downstream task is challenging because the agent faces unpredictable transitions caused by the distribution shift between the offline RL dataset and the real environment. To address this distribution shift, prior works that aim to learn a well-performing and safer agent have applied conservative or safe RL methods in the offline setting. However, these methods require retraining from scratch or fine-tuning to satisfy the desired criteria for performance and safety. In this work, we propose a Lyapunov-conditioned self-alignment method for a transformer-based world model that requires no retraining and instead performs test-time adaptation toward the desired criteria. We show that a transformer-based world model can be described as model-based hierarchical RL, which allows us to combine hierarchical RL with our in-context learning for self-alignment in transformers. The proposed self-alignment framework makes the agent safer by self-instructing with the Lyapunov condition. In experiments, we demonstrate that our self-alignment algorithm outperforms safe RL methods on continuous control and safe RL benchmark environments in terms of return, cost, and failure rate.
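For readers unfamiliar with Lyapunov-based safety criteria, the following is a minimal sketch (not the authors' implementation) of how a Lyapunov decrease condition could gate candidate actions against a one-step world-model prediction at test time. The names `world_model`, `V`, and the decay rate `alpha` are hypothetical placeholders; the paper conditions the transformer via self-instruction rather than the explicit rejection loop shown here.

```python
# Sketch: using the Lyapunov decrease condition V(s') - V(s) <= -alpha * V(s)
# to filter candidate actions with a learned one-step world model.
import numpy as np

def lyapunov_decrease(V, s, s_next, alpha=0.1):
    """Check the standard Lyapunov decrease condition."""
    return V(s_next) - V(s) <= -alpha * V(s)

def select_safe_action(world_model, V, s, candidate_actions, alpha=0.1):
    """Return the first candidate whose predicted next state satisfies the
    decrease condition; otherwise fall back to the candidate that minimizes
    the predicted Lyapunov value V(s')."""
    best_a, best_v = None, np.inf
    for a in candidate_actions:
        s_next = world_model(s, a)  # one-step prediction of the next state
        if lyapunov_decrease(V, s, s_next, alpha):
            return a
        v_next = V(s_next)
        if v_next < best_v:
            best_a, best_v = a, v_next
    return best_a

# Toy usage: a quadratic Lyapunov function and linear dynamics stand in for
# the learned safety critic and the transformer world model.
if __name__ == "__main__":
    V = lambda s: float(s @ s)                    # V(s) = ||s||^2
    world_model = lambda s, a: 0.9 * s + 0.1 * a  # toy stable dynamics
    s = np.array([1.0, -0.5])
    actions = [np.array([0.0, 0.0]), np.array([-1.0, 0.5])]
    print(select_safe_action(world_model, V, s, actions))
```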
Supplementary Material: zip
Primary Area: reinforcement learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 9163