Inhibiting Error Exacerbation in Offline Reinforcement Learning With Data Sparsity

Fan Zhang, Malu Zhang, Wenyu Chen, Siying Wang, Xin Zhang, Jiayin Li, Yang Yang

Published: 01 Mar 2026, Last Modified: 12 Mar 2026
Venue: IEEE Transactions on Neural Networks and Learning Systems
License: CC BY-SA 4.0
Abstract: Offline reinforcement learning (RL) aims to learn effective agents from previously collected datasets, improving the safety and efficiency of RL by avoiding real-time interaction. However, in practical applications, the approximation error on out-of-distribution (OOD) state–action pairs can cause considerable overestimation because the error is exacerbated during training, ultimately degrading performance. In contrast to prior works that address only OOD state–action pairs, we find that all data introduce estimation error whose magnitude is directly related to data sparsity. Consequently, the impact of data sparsity is unavoidable and must be accounted for when inhibiting error exacerbation. In this article, we propose an offline RL approach to inhibit error exacerbation with data sparsity (IEEDS), which includes a novel value estimation method that accounts for the impact of data sparsity on agent training. Specifically, the value estimation phase includes two innovations: 1) replacing the Q-net with a V-net, since the smaller and denser state space makes the data more concentrated and contributes to more accurate value estimation; and 2) introducing state sparsity into training by designing a state-aware-sparsity Markov decision process (MDP), further lessening the impact of sparse states. We theoretically prove the convergence of IEEDS under the state-aware-sparsity MDP. Extensive experiments on offline RL benchmarks demonstrate the superior performance of IEEDS.