Quality-Aware Experience Exploitation in Model-Based Reinforcement Learning

Published: 2024 · Last Modified: 15 May 2025 · IEEE Big Data 2024 · CC BY-SA 4.0
Abstract: In model-based reinforcement learning (MBRL), the quality of simulated experiences is a critical bottleneck for effective policy learning. Existing research has primarily focused on reducing the generation errors of these simulated experiences, but has largely ignored how their varying quality affects policy learning when they are exploited. To bridge this gap, we propose a novel quality-aware experience exploitation scheme, called QA2E, which dynamically exploits simulated experiences based on their assessed quality to enhance the effectiveness of model-based policy learning. Specifically, we develop a weighted Bellman backup approach that dynamically adjusts the influence of each simulated experience on policy learning according to its assessed quality. Since directly measuring this quality is impractical, QA2E estimates it through the epistemic uncertainty derived from the predictions of an ensemble of transition models. Experimental results demonstrate that QA2E significantly improves policy learning performance by exploiting simulated experiences more effectively.
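The abstract describes two mechanisms: estimating experience quality from the disagreement of an ensemble of transition models, and down-weighting low-quality simulated transitions in the Bellman backup. The sketch below illustrates this idea under stated assumptions; the function names (`ensemble_uncertainty`, `quality_weights`, `weighted_bellman_loss`), the exponential uncertainty-to-weight mapping, and the choice to weight the per-sample TD loss are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def ensemble_uncertainty(preds):
    """Epistemic uncertainty as the disagreement (std. dev.) among
    next-state predictions from an ensemble of transition models.

    preds: array of shape (n_models, batch, state_dim).
    Returns a per-transition scalar of shape (batch,)."""
    return preds.std(axis=0).mean(axis=-1)

def quality_weights(uncertainty, temperature=1.0):
    """Map uncertainty to quality weights in (0, 1]: transitions the
    ensemble agrees on get weight near 1, uncertain ones are
    down-weighted. (Assumed mapping; the paper may use another.)"""
    return np.exp(-uncertainty / temperature)

def weighted_bellman_loss(q, rewards, next_q, weights, gamma=0.99):
    """Quality-weighted squared TD error over simulated transitions:
    low-quality (high-uncertainty) experiences contribute less."""
    targets = rewards + gamma * next_q  # one-step Bellman targets
    return np.mean(weights * (q - targets) ** 2)
```

For example, a batch where all ensemble members predict the same next state yields zero uncertainty and full weight 1, while a batch with disagreeing predictions yields weights strictly below 1, shrinking those transitions' contribution to the Bellman loss.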