Keywords: Offline Reinforcement Learning, Retrieval-Augmented Generation, Goal-Conditioned Policy, Trajectory Stitching
Abstract: Offline reinforcement learning (RL) learns policies from fixed datasets, thereby avoiding costly or unsafe environment interactions. However, its reliance on finite static datasets inherently restricts the ability to generalize beyond the training distribution.
Prior solutions based on synthetic data augmentation often fail to generalize to scenarios that remain uncovered even by the augmented dataset.
To address these challenges, we propose Retrieval High-quAlity Demonstrations (RAD) for decision-making, which introduces a retrieval mechanism into offline RL. Specifically, RAD retrieves high-return and reachable states from the offline dataset as target states, and leverages a generative model to generate sub-trajectories conditioned on these targets for planning. Since the targets are high-return states, once the agent reaches such a target, it can continue to obtain high returns by following the associated high-return actions, thereby improving policy generalization. Extensive experiments confirm that RAD achieves competitive or superior performance compared to baselines across diverse benchmarks. Our code is available at https://anonymous.4open.science/r/RAD_0925_1-690E.
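The abstract describes two steps: retrieving high-return, reachable states as targets, and generating sub-trajectories conditioned on those targets. The sketch below illustrates that idea only in outline; the dataset layout (`states`, `returns_to_go`), the distance-based reachability proxy, and the `generator.sample` interface are illustrative assumptions, not RAD's actual implementation.

```python
import numpy as np

def retrieve_targets(dataset, current_state, k=5, radius=2.0):
    """Retrieve high-return states deemed reachable from the current state.

    Assumed layout (illustrative): dataset["states"] is (N, state_dim),
    dataset["returns_to_go"] is (N,) with one scalar per state.
    """
    states = dataset["states"]
    returns = dataset["returns_to_go"]

    # Crude reachability proxy: Euclidean distance to the current state.
    dists = np.linalg.norm(states - current_state, axis=1)
    reachable_idx = np.where(dists <= radius)[0]

    # Among reachable states, keep the k with the highest return-to-go.
    top = reachable_idx[np.argsort(returns[reachable_idx])[-k:]]
    return states[top]

def plan_subtrajectory(generator, current_state, target_state, horizon=16):
    """Sample a sub-trajectory toward the retrieved target.

    `generator` stands in for any goal-conditioned generative model
    (e.g. a diffusion or sequence model) exposing a `sample` method;
    this signature is hypothetical.
    """
    return generator.sample(start=current_state, goal=target_state,
                            horizon=horizon)
```

Under these assumptions, a planning loop would call `retrieve_targets` at each replanning step and feed one of the returned targets to `plan_subtrajectory`, executing the first action of the sampled sub-trajectory.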
Primary Area: reinforcement learning
Submission Number: 3250