Keywords: offline reinforcement learning, sample efficiency
Abstract: Offline reinforcement learning (RL) has achieved notable progress in recent years. It enables learning an optimized policy from fixed offline datasets and is therefore particularly suitable for decision-making tasks that lack reliable simulators or restrict environment interaction. However, existing offline RL methods typically require a large amount of training data to achieve reasonable performance, and they offer limited generalizability in out-of-distribution (OOD) regions due to conservative, data-related regularizations. This seriously hinders the applicability of offline RL to many real-world problems, where the available data are often limited.
In this study, we introduce a highly sample-efficient offline RL algorithm that learns an optimized policy by enabling state-stitching in a compact latent space regulated by a fundamental symmetry of dynamical systems. Specifically, we introduce a time-reversal symmetry (T-symmetry) enforced inverse dynamics model (TS-IDM) to derive well-regulated latent state representations that greatly ease OOD generalization. Within the learned latent space, we learn a guide-policy that outputs the reward-maximizing latent next state, bypassing the conservative action-level behavior constraints used in typical offline RL algorithms. The final optimized action can then be extracted by using the guide-policy's output as the goal state in the learned TS-IDM.
We call our method Offline RL via T-symmetry Enforced Latent State-Stitching (TELS).
Our approach achieves remarkable sample efficiency and OOD generalizability, significantly outperforming existing offline RL methods on a wide range of challenging small-sample tasks, even when using as little as 1% of the original data in D4RL tasks.
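To make the described pipeline concrete, below is a minimal sketch of the inference path only (not the authors' implementation): a PyTorch-style TS-IDM encoder maps the state into the latent space, a guide-policy proposes the reward-maximizing latent next state, and the TS-IDM's inverse dynamics head extracts the action. All module names, network sizes, and dimensions are hypothetical assumptions for illustration, and the T-symmetry training losses are omitted.

# Minimal sketch of the TELS inference pipeline (assumed structure, not the paper's code).
import torch
import torch.nn as nn

STATE_DIM, LATENT_DIM, ACTION_DIM = 17, 16, 6  # assumed sizes for illustration

class TSIDM(nn.Module):
    """T-symmetry enforced inverse dynamics model (simplified sketch)."""
    def __init__(self):
        super().__init__()
        # Encoder: maps raw states to the regulated latent space.
        self.encoder = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(),
                                     nn.Linear(64, LATENT_DIM))
        # Inverse dynamics head: predicts the action from (z_t, z_goal).
        self.inverse = nn.Sequential(nn.Linear(2 * LATENT_DIM, 64), nn.ReLU(),
                                     nn.Linear(64, ACTION_DIM))

    def encode(self, state):
        return self.encoder(state)

    def infer_action(self, z, z_goal):
        return self.inverse(torch.cat([z, z_goal], dim=-1))

class GuidePolicy(nn.Module):
    """Outputs the reward-maximizing latent next state (sketch)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(LATENT_DIM, 64), nn.ReLU(),
                                 nn.Linear(64, LATENT_DIM))

    def forward(self, z):
        return self.net(z)

# Inference: encode the state, let the guide-policy pick a latent goal,
# then extract the action with the TS-IDM's inverse dynamics head.
ts_idm, guide = TSIDM(), GuidePolicy()
state = torch.randn(1, STATE_DIM)
z = ts_idm.encode(state)
z_goal = guide(z)                      # latent next state proposed by the guide-policy
action = ts_idm.infer_action(z, z_goal)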
Primary Area: reinforcement learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 10618