Conserving Fitness Evaluation in Evolutionary Algorithms with Reinforcement Learning

ICLR 2026 Conference Submission 15155 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Reinforcement Learning, Evolutionary Algorithms, Robotics
TL;DR: Using reinforcement learning to generate offspring in each generation, instead of creating them randomly.
Abstract: Evolutionary Algorithms (EAs) have been applied successfully to many problems, but the randomness in the application of mutation and recombination operators means that many offspring have relatively low fitness, while those of high fitness closely resemble already-visited candidate solutions; as a result, solution quality improves very little after the early generations of an EA run. We address this issue with the proposed Evolutionary Algorithm using Reinforcement Learning (EARL), which improves efficiency by imposing constraints on the generated offspring. The approach integrates an actor-critic reinforcement learning agent into the EA pipeline. The agent is trained on a rich state representation that captures both local parental information and global population statistics. A carefully designed multi-component reward function balances four objectives: improving fitness over the parents, achieving high population rank, maintaining diversity, and optimizing fitness. The agent is optimized with "simple policy optimization," a recent RL algorithm, ensuring both learning stability and exploration. Experimental evaluations on four benchmark problems show that EARL achieves faster convergence, better best-fitness values, and greater population diversity than a standard EA. We also evaluate EARL on the real-world task of adversarial object generation for robotic grasp training. Our results demonstrate that EARL can turn evolutionary search into a more directed and efficient optimization approach, with practical implications for domains where fitness evaluation is expensive.
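To make the abstract's pipeline concrete, here is a minimal sketch of one EARL-style generation step. Everything in it is an assumption for illustration: the toy sphere objective, the specific state features, the reward weights, and the stand-in Gaussian "actor" are not the authors' implementation, and a real actor-critic agent (trained with simple policy optimization, per the abstract) would replace the `policy_offspring` placeholder.

```python
import numpy as np

rng = np.random.default_rng(0)

def fitness(x):
    """Toy objective (negated sphere); the paper's benchmarks differ."""
    return -np.sum(x ** 2)

def state_features(p1, p2, fits):
    # Local parental information plus global population statistics,
    # as described in the abstract (exact features are assumptions).
    return np.concatenate([p1, p2,
                           [fitness(p1), fitness(p2)],
                           [fits.mean(), fits.std(), fits.max()]])

def reward(child_f, parent_fs, rank_frac, diversity, w=(1.0, 0.5, 0.5, 0.5)):
    # Four components from the abstract: improvement over the parents,
    # population rank, diversity, and fitness itself. Weights are illustrative.
    return (w[0] * (child_f - max(parent_fs))
            + w[1] * rank_frac
            + w[2] * diversity
            + w[3] * child_f)

def policy_offspring(state, p1, p2, sigma=0.1):
    # Stand-in for the actor network: a random blend of the parents plus
    # Gaussian noise. A trained actor would instead map `state` to the
    # offspring (or to mutation parameters) and be updated from the rewards.
    alpha = rng.uniform()
    return alpha * p1 + (1 - alpha) * p2 + rng.normal(0, sigma, size=p1.shape)

# One generation over a small population.
pop = rng.normal(size=(16, 5))
fits = np.array([fitness(x) for x in pop])
children, rewards = [], []
for _ in range(len(pop)):
    i, j = rng.choice(len(pop), size=2, replace=False)   # pick two parents
    s = state_features(pop[i], pop[j], fits)
    child = policy_offspring(s, pop[i], pop[j])
    cf = fitness(child)
    rank_frac = (fits < cf).mean()                       # fraction of population beaten
    diversity = np.linalg.norm(pop - child, axis=1).mean()
    children.append(child)
    rewards.append(reward(cf, (fits[i], fits[j]), rank_frac, diversity))
# `rewards` would drive the actor-critic update before the next generation.
```

The point of the sketch is the control flow: each offspring is produced by a policy conditioned on parental and population-level state, and each fitness evaluation also yields a reward signal, so the variation operator itself is what gets trained.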
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 15155