Pushing the Limit of Sample-Efficient Offline Reinforcement Learning

Published: 06 Mar 2025 · Last Modified: 15 Apr 2025 · ICLR 2025 Workshop World Models · License: CC BY 4.0
Keywords: sample efficiency, representation learning, fundamental symmetry for dynamics modeling
TL;DR: We propose a highly sample-efficient offline RL algorithm with strong out-of-distribution (OOD) generalization, significantly outperforming existing offline RL methods on a wide range of challenging small-sample tasks.
Abstract: Offline reinforcement learning (RL) has made significant progress in recent years. However, most existing offline RL methods require large amounts of training data to achieve reasonable performance, and they offer limited generalizability in out-of-distribution (OOD) regions due to conservative data-related regularizations. This severely limits the usability of offline RL in many real-world applications, where the available data are often scarce. In this study, we introduce a highly sample-efficient offline RL algorithm that enables state-stitching in a compact latent space regulated by the fundamental time-reversal symmetry (T-symmetry) of dynamical systems. Specifically, we introduce a T-symmetry enforced inverse dynamics model (TS-IDM) to derive well-regulated latent state representations that greatly facilitate OOD generalization. A guide-policy can then be learned entirely in the latent space to output the next state that maximizes the reward, bypassing the conservative action-level behavior constraints adopted in most offline RL methods. Finally, the optimized action can be extracted by using the guide-policy's output as the goal state in the learned TS-IDM. We call our method Offline RL via **T**-symmetry **E**nforced **L**atent **S**tate-Stitching (**TELS**). Our approach achieves remarkable sample efficiency and OOD generalization, significantly outperforming existing offline RL methods on a wide range of challenging small-sample tasks, even when using as few as 1% of the data samples in the D4RL datasets.
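The abstract only gives the high-level recipe, so the sketch below is one plausible way the three stages could fit together in PyTorch, not the authors' implementation: every module architecture, the specific form of the T-symmetry regularizer, and names like `TSIDM`, `inv_dyn`, and `t_symmetry_loss` are our assumptions for illustration.

```python
import torch
import torch.nn as nn

class TSIDM(nn.Module):
    """Hypothetical T-symmetry enforced inverse dynamics model (TS-IDM).

    Encodes raw states into a compact latent space and predicts the action
    that transitions one latent state to the next. The T-symmetry term below
    additionally couples forward and time-reversed latent dynamics.
    """
    def __init__(self, state_dim, action_dim, latent_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(state_dim, 256), nn.ReLU(),
                                     nn.Linear(256, latent_dim))
        # Inverse dynamics head: (z_t, z_{t+1}) -> a_t
        self.inv_dyn = nn.Sequential(nn.Linear(2 * latent_dim, 256), nn.ReLU(),
                                     nn.Linear(256, action_dim))
        # Forward and reverse latent dynamics used by the T-symmetry term
        self.fwd = nn.Linear(latent_dim + action_dim, latent_dim)
        self.rev = nn.Linear(latent_dim + action_dim, latent_dim)

    def forward(self, s, s_next):
        z, z_next = self.encoder(s), self.encoder(s_next)
        a_pred = self.inv_dyn(torch.cat([z, z_next], dim=-1))
        return a_pred, z, z_next

def t_symmetry_loss(model, z, z_next, a):
    # Forward prediction: (z_t, a_t) -> z_{t+1}; reverse: (z_{t+1}, a_t) -> z_t.
    # Requiring both to hold ties the two directions of time together; this is
    # one plausible reading of the T-symmetry regularization, and the paper's
    # exact formulation may differ.
    z_next_hat = model.fwd(torch.cat([z, a], dim=-1))
    z_hat = model.rev(torch.cat([z_next, a], dim=-1))
    return ((z_next_hat - z_next) ** 2).mean() + ((z_hat - z) ** 2).mean()

# At deployment (again schematic): a guide-policy maps the current latent
# z_t to a goal latent z_goal predicted to maximize reward, and the action
# is read off from the learned inverse dynamics head:
#   z_t = model.encoder(s_t)
#   a_t = model.inv_dyn(torch.cat([z_t, guide_policy(z_t)], dim=-1))
```

Under this reading, "state-stitching" happens because the guide-policy plans directly over latent states rather than actions, so conservative action-level behavior constraints are never needed.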
Submission Number: 53
