Generating Diverse Critics for Conditioned Policy Distillation

Published: 01 Jan 2024, Last Modified: 02 Oct 2024, GECCO Companion 2024, CC BY-SA 4.0
Abstract: Quality Diversity Reinforcement Learning (QD-RL) enables the creation of a large set of behaviorally diverse yet high-performing policies. However, these techniques often require maintaining a large number of parameters across individual neural network controllers, which contain substantial redundancy. In this work, we outline and test an approach that uses quality diversity optimization to produce a large archive of small Q-functions, which then serve as critics for training a larger, task-conditioned policy. We verify the feasibility of this approach in a maze navigation domain and find that it is a promising technique for producing conditioned policies.
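The abstract does not specify the maze, the QD algorithm, the behavior descriptor, or the distillation loss, so the following is only a minimal sketch of the data flow under explicitly hypothetical choices. It assumes a toy 5x5 grid maze, MAP-Elites as the QD optimizer, small tabular Q-functions whose greedy rollout's final cell serves as the behavior descriptor, and fitness equal to minus the steps needed to reach that cell. The tables are evolved directly rather than trained with temporal-difference updates, which is a further simplification. The conditioned policy is a lookup table over (cell, goal) pairs trained to maximize the matching critic's expected value, standing in for the larger neural network the abstract describes. All names (`MAZE`, `map_elites`, `distill`, `act`, and so on) are illustrative and do not come from the paper.

```python
import math
import random

# Hypothetical toy maze (not the paper's domain): '.' is open, '#' is a wall.
MAZE = ["..#..",
        "..#..",
        ".....",
        ".##..",
        "....."]
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
START = (0, 0)
HORIZON = 12  # rollout length (assumed)
CELLS = [(r, c) for r, row in enumerate(MAZE) for c, ch in enumerate(row) if ch == "."]


def step(cell, a):
    """Deterministic move; hitting a wall or the border leaves the agent in place."""
    r, c = cell[0] + ACTIONS[a][0], cell[1] + ACTIONS[a][1]
    if 0 <= r < len(MAZE) and 0 <= c < len(MAZE[0]) and MAZE[r][c] == ".":
        return (r, c)
    return cell


def greedy(q, cell):
    """Greedy action of a tabular Q-function q: cell -> list of action values."""
    return max(range(len(ACTIONS)), key=lambda a: q[cell][a])


def rollout(q):
    """Follow q greedily from START; return (final cell, steps to first reach it)."""
    cell, path = START, [START]
    for _ in range(HORIZON):
        cell = step(cell, greedy(q, cell))
        path.append(cell)
    return cell, path.index(cell)


def map_elites(iterations=3000, n_init=50, sigma=0.5, seed=0):
    """Hypothetical MAP-Elites: one small Q-table per final-cell niche.

    Fitness is minus the steps needed to reach the final cell, so each niche
    keeps the most direct critic found for that location.
    """
    rng = random.Random(seed)
    archive = {}  # final cell -> (fitness, q)
    for i in range(iterations):
        if i < n_init:  # random initial Q-tables
            q = {cell: [rng.gauss(0.0, 1.0) for _ in ACTIONS] for cell in CELLS}
        else:  # Gaussian mutation of a uniformly chosen elite
            _, parent = archive[rng.choice(list(archive))]
            q = {cell: [v + rng.gauss(0.0, sigma) for v in vals] for cell, vals in parent.items()}
        niche, steps = rollout(q)
        if niche not in archive or -steps > archive[niche][0]:
            archive[niche] = (-steps, q)
    return archive


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def distill(archive, steps=100, lr=0.5):
    """Train one goal-conditioned policy by ascending E_{a~pi(.|s,g)}[Q_g(s, a)].

    The lookup table over (cell, goal) is a stand-in for a conditioned network;
    the critic for goal g is the archive entry stored in niche g.
    """
    logits = {(cell, goal): [0.0] * len(ACTIONS) for goal in archive for cell in CELLS}
    for _ in range(steps):
        for (cell, goal), theta in logits.items():
            q_values = archive[goal][1][cell]
            pi = softmax(theta)
            v = sum(p * qv for p, qv in zip(pi, q_values))
            for a in range(len(ACTIONS)):
                # Gradient of sum_b pi_b * Q_b with respect to logit a is pi_a * (Q_a - V).
                theta[a] += lr * pi[a] * (q_values[a] - v)
    return logits


def act(logits, cell, goal):
    """Greedy action of the distilled policy for a given goal."""
    theta = logits[(cell, goal)]
    return max(range(len(ACTIONS)), key=lambda a: theta[a])


if __name__ == "__main__":
    archive = map_elites()
    policy = distill(archive)
    for goal, (fitness, _) in sorted(archive.items()):
        cell = START
        for _ in range(HORIZON):
            cell = step(cell, act(policy, cell, goal))
        print(f"goal={goal} critic_fitness={fitness} reached={cell == goal}")
```

Because the logits start equal and the update `pi_a * (Q_a - V)` always raises the best action's logit relative to every other, the distilled policy's greedy action matches each critic's greedy action, so conditioning on a niche reproduces that critic's path. The table here shares no parameters across goals; in a neural version, shared weights are where the redundancy noted in the abstract could be exploited.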