CUDA: Capturing Uncertainty and Diversity in Preference Feedback Augmentation

Published: 10 Jun 2025, Last Modified: 30 Jun 2025 · MoFA Oral · CC BY 4.0
Keywords: Reinforcement Learning, Preference-based RL, Data Augmentation
Abstract: Preference-based Reinforcement Learning (PbRL) effectively addresses reward design challenges in RL and facilitates human-AI alignment by enabling agents to learn human intentions. However, optimizing PbRL critically depends on abundant, diverse, and accurate human feedback, which is costly and time-consuming to acquire. While existing feedback augmentation methods aim to leverage sparse human preferences, they often neglect diversity, primarily generating high-confidence feedback for trajectory pairs with extreme differences. This limitation restricts the diversity of the augmented dataset, leading to an incomplete representation of human preferences. To overcome this, we introduce Capturing Uncertainty and Diversity in preference feedback Augmentation (CUDA), a novel approach that comprehensively considers both uncertainty and diversity. CUDA enhances augmentation by employing ensemble-based uncertainty estimation to filter feedback and bucket-based categorization to extract feedback from diverse clusters. These two mechanisms enable CUDA to obtain diverse and accurate augmented feedback. We evaluate CUDA on MetaWorld and DMControl offline datasets, demonstrating significant performance improvements over various offline PbRL algorithms and existing augmentation methods across diverse scenarios.
Submission Number: 61
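The abstract describes two mechanisms: ensemble-based uncertainty filtering and bucket-based categorization for diversity. Below is a minimal sketch of how these two steps might fit together; the function names, thresholds, and the margin-based bucketing criterion are illustrative assumptions, not the paper's actual implementation.

```python
# Illustrative sketch only: ensemble-based uncertainty filtering followed by
# bucket-based selection for diversity. All names, thresholds, and the
# bucketing criterion are assumptions, not CUDA's published implementation.
import numpy as np

def augment_preferences(ensemble_probs, uncertainty_threshold=0.1,
                        num_buckets=10, per_bucket=50, seed=0):
    """Select augmented preference labels for candidate trajectory pairs.

    ensemble_probs: (num_models, num_pairs) array where entry [m, i] is
        model m's predicted probability that segment A of pair i is preferred.
    Returns indices of selected pairs and their soft preference labels.
    """
    mean_prob = ensemble_probs.mean(axis=0)    # consensus preference
    uncertainty = ensemble_probs.std(axis=0)   # disagreement across the ensemble

    # (1) Uncertainty filtering: keep only pairs the ensemble agrees on.
    confident = np.where(uncertainty < uncertainty_threshold)[0]

    # (2) Bucket-based diversity: group confident pairs by preference margin
    # and sample evenly from each bucket, so the augmented set is not
    # dominated by "easy" pairs with extreme differences.
    margins = np.abs(mean_prob[confident] - 0.5)
    bucket_ids = np.minimum((margins * 2 * num_buckets).astype(int),
                            num_buckets - 1)

    rng = np.random.default_rng(seed)
    selected = []
    for b in range(num_buckets):
        members = confident[bucket_ids == b]
        if len(members) > 0:
            take = min(per_bucket, len(members))
            selected.extend(rng.choice(members, size=take, replace=False))

    selected = np.array(selected, dtype=int)
    return selected, mean_prob[selected]


if __name__ == "__main__":
    # Toy example: 5 reward models, 1000 candidate trajectory pairs.
    probs = np.random.default_rng(1).uniform(0.0, 1.0, size=(5, 1000))
    idx, labels = augment_preferences(probs)
    print(f"kept {len(idx)} pairs out of 1000")
```

Under these assumptions, the uncertainty threshold controls label accuracy while the even per-bucket sampling keeps the augmented feedback spread across preference margins rather than concentrated at the extremes.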