Abstract: Recent work has found that two dataset characteristics, coverage and data quality, are critical for offline reinforcement learning (RL). To improve the policy, model-based offline RL generates synthetic data to expand the dataset's coverage using trained forward and backward dynamics models. However, the quality of this synthetic data is typically ignored, which raises the question of whether augmenting the dataset with high-quality synthetic data benefits offline RL agents. Motivated by this, we propose forward High-quality Imagination and backward Reliable Check (HIRC), an effective data augmentation method that generates high-quality and reliable synthetic data. Specifically, we construct a value-guided forward model to generate high-quality imaginary trajectories, and employ a backward model as a reliability check so that the synthetic data better match the pre-collected offline transitions. In other words, HIRC generates high-quality synthetic data under a reliability constraint, and it can be combined with model-free offline RL methods. Experimental results on the D4RL benchmark demonstrate that the high-quality synthetic data generated by HIRC boost the performance of the base agent TD3_BC. In particular, HIRC with this base agent achieves better scores than recent popular model-free and model-based offline RL methods.
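For intuition only, below is a minimal Python sketch of the forward-imagination-plus-backward-check augmentation loop described in the abstract. All names (`forward_model`, `backward_model`, `value_fn`, `offline_buffer`) and the specific filtering criteria (a value quantile and a cycle-consistency tolerance) are illustrative assumptions, not the paper's exact procedure; the abstract only states that a value-guided forward model imagines trajectories and a backward model checks their reliability against the offline transitions.

```python
import numpy as np

def augment_with_hirc(offline_buffer, forward_model, backward_model, value_fn,
                      policy, horizon=5, value_quantile=0.7, check_tol=0.5):
    """Hypothetical sketch of HIRC-style data augmentation.

    forward_model(s, a) -> (s_next, r): learned forward dynamics.
    backward_model(s_next, a) -> s_prev: learned backward dynamics.
    value_fn(s) -> scalar value estimate used to keep high-quality rollouts.
    Names and thresholds are assumptions for illustration only.
    """
    synthetic = []
    # Start imagination from states sampled out of the offline dataset.
    for start_state in offline_buffer.sample_states(batch_size=256):
        traj = []
        state = start_state
        for _ in range(horizon):
            action = policy(state)
            next_state, reward = forward_model(state, action)
            traj.append((state, action, reward, next_state))
            state = next_state

        # Value-guided filter: keep only trajectories whose imagined states
        # look high quality under the value estimate (assumed criterion).
        values = np.array([value_fn(t[3]) for t in traj])
        if values.mean() < np.quantile(offline_buffer.state_values, value_quantile):
            continue

        # Backward reliable check: reconstruct the predecessor state with the
        # backward model and keep transitions that are cycle-consistent with
        # the state the forward rollout actually visited.
        for (s, a, r, s_next) in traj:
            s_reconstructed = backward_model(s_next, a)
            if np.linalg.norm(s_reconstructed - s) < check_tol:
                synthetic.append((s, a, r, s_next))

    # The accepted transitions are mixed into the buffer consumed by a
    # model-free offline RL agent such as TD3_BC.
    offline_buffer.add_synthetic(synthetic)
    return synthetic
```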