Quality Diversity through Human Feedback

Published: 28 Oct 2023, Last Modified: 14 Dec 2023ALOE 2023 SpotlightEveryoneRevisionsBibTeX
Keywords: Quality diversity, human feedback, constrastive learning, reinforcement learning, image generation, robotics
TL;DR: This paper introduces Quality Diversity through Human Feedback (QDHF), which employs human feedback for deriving diversity metrics, expanding the applicability of QD algorithms.
Abstract: Reinforcement Learning from Human Feedback (RLHF) has shown potential in qualitative tasks where clear objectives are lacking. However, its effectiveness is not fully realized when it is conceptualized merely as a tool to optimize average human preferences, especially in generative tasks that demand diverse model responses. Meanwhile, Quality Diversity (QD) algorithms excel at identifying diverse and high-quality solutions but often rely on manually crafted diversity metrics. This paper introduces Quality Diversity through Human Feedback (QDHF), a novel approach integrating human feedback into the QD framework. QDHF infers diversity metrics from human judgments of similarity among solutions, thereby enhancing the applicability and effectiveness of QD algorithms. Our empirical studies show that QDHF significantly outperforms state-of-the-art methods in automatic diversity discovery and matches the efficacy of using manually crafted metrics for QD on standard benchmarks in robotics and reinforcement learning. Notably, in a latent space illumination task, QDHF substantially enhances the diversity in images generated by a diffusion model and was more favorably received in user studies. We conclude by analyzing QDHF's scalability and the quality of its derived diversity metrics, emphasizing its potential to improve exploration and diversity in complex, open-ended optimization tasks. Source code is available on GitHub: https://github.com/ld-ing/qdhf.
Submission Number: 34
Loading