Value-Evolutionary-Based Reinforcement Learning
Primary Area: reinforcement learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Evolutionary Algorithms, Reinforcement Learning
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Abstract: Combining Evolutionary Algorithms (EAs) and Reinforcement Learning (RL) for policy search has been shown to improve RL performance. However, prior research has largely overlooked value-based RL, focusing instead on merging EAs with policy-based RL. This paper introduces Value-Evolutionary-Based Reinforcement Learning (VEB-RL), a framework that combines EAs and RL for policy search with a focus on value-based RL. The framework maintains a population of value functions rather than policies and uses the negative Temporal Difference (TD) error as the fitness metric for evolution. This metric is more sample-efficient for population evaluation than episodic cumulative reward and is closely tied to the accuracy of the value function approximation. In addition, VEB-RL lets the elites of the population interact with the environment to provide high-quality samples for RL optimization, while the RL value function participates in the population's evolution at each generation. Experiments on MinAtar and Atari demonstrate that VEB-RL significantly improves DQN, Rainbow, and SPR.
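To make the fitness metric concrete, below is a minimal sketch (not the authors' implementation) of how a negative-TD-error fitness could be computed for each member of a population of value functions on a shared batch of transitions. The network class, the helper names (`QNet`, `fitness_negative_td_error`, `rank_population`), and the exact TD target (here, each member bootstraps from its own network rather than a separate target network) are assumptions for illustration only.

```python
import torch
import torch.nn as nn

# Hypothetical Q-network; the paper's actual architectures (DQN/Rainbow/SPR) are not shown here.
class QNet(nn.Module):
    def __init__(self, obs_dim, n_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs):
        return self.net(obs)

@torch.no_grad()
def fitness_negative_td_error(q_net, batch, gamma=0.99):
    """Fitness of one population member: negative mean absolute TD error
    on a shared batch of transitions (obs, action, reward, next_obs, done)."""
    obs, action, reward, next_obs, done = batch
    q_sa = q_net(obs).gather(1, action.unsqueeze(1)).squeeze(1)
    target = reward + gamma * (1.0 - done) * q_net(next_obs).max(dim=1).values
    td_error = (target - q_sa).abs().mean()
    return -td_error.item()  # higher fitness corresponds to lower TD error

def rank_population(population, batch, gamma=0.99):
    """Rank population members by fitness on the same replay batch (best first)."""
    scores = [fitness_negative_td_error(q, batch, gamma) for q in population]
    return sorted(range(len(population)), key=lambda i: scores[i], reverse=True)
```

Evaluating the whole population on a single replay batch in this way avoids running full episodes per member, which is the sample-efficiency argument made in the abstract.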
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 8847