Keywords: Offline Reinforcement Learning, Dataset Quality, Learning Performance Prediction, Convolutional Neural Networks
Abstract: In this paper, we address the challenge of predicting learning performance in offline Reinforcement Learning (RL). This is a crucial task for ensuring that the learned policy performs reliably in the real world and for avoiding unsafe or costly interactions. We introduce a new approach that uses Convolutional Neural Networks (CNNs) to analyze offline RL datasets represented as images. Our model predicts the performance of policies learned from these datasets under a fixed RL setup, i.e., a chosen algorithm and its hyperparameters. We also examine the model's transferability to scenarios with altered state-space sizes or transition functions. Furthermore, we demonstrate an application of our model to optimizing offline RL datasets: using genetic algorithms, we search over candidate dataset subsets to identify a reduced dataset that improves policy-learning efficiency. This optimized dataset shortens training time while achieving performance comparable or superior to that obtained with the complete dataset.
Submission Number: 23
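As a rough illustration of the predictor described in the abstract, here is a minimal sketch of a CNN that maps an image-encoded offline RL dataset to a scalar performance estimate. The architecture, input size, and channel count are assumptions for illustration only, not the authors' implementation.

```python
# Hypothetical sketch: a small CNN regressor that takes an image-encoded
# offline RL dataset and predicts the return of a policy learned from it.
import torch
import torch.nn as nn

class DatasetPerformancePredictor(nn.Module):
    """Regress expected policy performance from an image-encoded dataset."""

    def __init__(self, in_channels: int = 1):
        super().__init__()
        # Convolutional feature extractor over the dataset image.
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)),  # fixed-size output regardless of input resolution
        )
        # Regression head producing a single performance score.
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 4 * 4, 64),
            nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.features(x))

# Example usage: a batch of 8 single-channel 64x64 "dataset images"
# (the image encoding of transitions is assumed, not specified here).
model = DatasetPerformancePredictor()
dummy_datasets = torch.randn(8, 1, 64, 64)
predicted_returns = model(dummy_datasets)  # shape: (8, 1)
```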