Synthetic Data is Sufficient for Zero-Shot Visual Generalization from Offline Data

27 Sept 2024 (modified: 05 Feb 2025) · Submitted to ICLR 2025 · CC BY 4.0
Keywords: Offline Reinforcement Learning, Generalization, Data Augmentation, Synthetic Data Generation
TL;DR: We propose a practical two-step approach that combines data augmentation and synthetic data generation to address generalization challenges in vision-based offline reinforcement learning.
Abstract: Offline reinforcement learning (RL) offers a promising framework for training agents using pre-collected datasets without further environment interaction. However, policies trained on offline data often struggle to generalize due to limited exposure to diverse states. Visual data introduces additional challenges such as noise, distractions, and spurious correlations, which can misguide the policy and increase the risk of overfitting when the training data is not sufficiently diverse. This makes it difficult to leverage vision-based offline data to train robust agents that generalize to unseen environments. To address this problem, we propose a simple approach: generating additional synthetic data. Our two-step process first augments the originally collected offline data to introduce diversity and improve zero-shot generalization, then uses a diffusion model to generate additional data in latent space. We test our method across both continuous action spaces (Visual D4RL) and discrete action spaces (Procgen), demonstrating that it significantly improves generalization without requiring any algorithmic changes to existing model-free offline RL methods. We show that our method not only increases the diversity of the training data but also significantly reduces the generalization gap at test time while maintaining computational efficiency. We believe this approach could fuel additional progress in generating synthetic data to train more general agents in the future.
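
The snippet below is a minimal, hypothetical sketch of the two-step pipeline described in the abstract: augment the offline image observations, then generate extra synthetic data in latent space with a diffusion model. The names `random_shift`, `encoder`, and `latent_diffusion` are illustrative assumptions, not the paper's actual implementation.

```python
# Step 1: image augmentation of offline observations; Step 2 (sketched in
# comments): latent-space generation with a trained diffusion model.
import torch
import torch.nn.functional as F


def random_shift(obs: torch.Tensor, pad: int = 4) -> torch.Tensor:
    """Randomly shift a batch of images (B, C, H, W) by up to `pad` pixels,
    a common augmentation for vision-based RL."""
    b, c, h, w = obs.shape
    padded = F.pad(obs, (pad, pad, pad, pad), mode="replicate")
    out = torch.empty_like(obs)
    for i in range(b):
        top = torch.randint(0, 2 * pad + 1, (1,)).item()
        left = torch.randint(0, 2 * pad + 1, (1,)).item()
        out[i] = padded[i, :, top:top + h, left:left + w]
    return out


# Step 1: augment the originally collected offline observations.
obs = torch.rand(64, 3, 84, 84)   # stand-in for a batch of offline frames
aug_obs = random_shift(obs)

# Step 2 (sketch): encode the augmented frames to latents, train a diffusion
# model on those latents, and sample additional synthetic latents to mix into
# the training set. `encoder` and `latent_diffusion.sample` are placeholders.
# latents = encoder(aug_obs)                          # (B, latent_dim)
# synthetic_latents = latent_diffusion.sample(n=1024)
# dataset = torch.cat([latents, synthetic_latents])   # train offline RL on both
```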
Supplementary Material: pdf
Primary Area: reinforcement learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 10462