Mixing corrupted preferences for robust and feedback-efficient preference-based reinforcement learning

Published: 01 Jan 2025, Last Modified: 20 May 2025. Knowl. Based Syst. 2025. License: CC BY-SA 4.0.
Highlights:
• Enhanced robustness against human label noise by using mixup augmentation (a sketch follows the highlights).
• Improved feedback efficiency with only a limited number of feedback instances.
• Mitigated overconfidence in the preference predictor, an issue that has been largely neglected so far.
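The first highlight refers to applying mixup augmentation to (possibly corrupted) preference queries. Below is a minimal, hedged sketch of one way this could look under a Bradley-Terry style reward predictor, the standard setup in preference-based RL; all names (reward_model, seg_a, seg_b, labels, alpha) are hypothetical placeholders and the details may differ from the paper's actual method.

```python
# Hedged sketch (assumptions, not the paper's exact formulation): mixup applied to
# preference queries and their labels before training a Bradley-Terry reward model.
# Assumes reward_model maps segments of shape (batch, seg_len, obs_dim) to
# per-step rewards of shape (batch, seg_len).
import torch
import torch.nn.functional as F

def bradley_terry_logit(reward_model, seg_a, seg_b):
    # Summed predicted reward over each trajectory segment; their difference is
    # the logit of preferring seg_b over seg_a under the Bradley-Terry model.
    r_a = reward_model(seg_a).sum(dim=1)
    r_b = reward_model(seg_b).sum(dim=1)
    return r_b - r_a

def mixup_preference_loss(reward_model, seg_a, seg_b, labels, alpha=0.4):
    # labels: soft preference labels in [0, 1], possibly corrupted by the annotator.
    batch = seg_a.size(0)
    lam = torch.distributions.Beta(alpha, alpha).sample((batch,)).to(seg_a.device)
    perm = torch.randperm(batch, device=seg_a.device)

    lam_x = lam.view(batch, 1, 1)  # broadcast over (time, feature) dimensions
    mixed_a = lam_x * seg_a + (1 - lam_x) * seg_a[perm]
    mixed_b = lam_x * seg_b + (1 - lam_x) * seg_b[perm]
    mixed_y = lam * labels + (1 - lam) * labels[perm]

    logits = bradley_terry_logit(reward_model, mixed_a, mixed_b)
    # Training against mixed soft labels smooths the predictor and curbs overconfidence.
    return F.binary_cross_entropy_with_logits(logits, mixed_y)
```

In use, seg_a and seg_b would be the two trajectory segments shown to the annotator and labels the noisy human preferences; the reward model trained with this loss then supplies the reward signal to the RL agent.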