Mixing corrupted preferences for robust and feedback-efficient preference-based reinforcement learning
Highlights
• Enhanced robustness against human label noise by using mixup augmentation.
• Improved feedback efficiency, requiring only a limited number of feedback instances.
• Mitigated overconfidence in the preference predictor, an issue that has so far been neglected.
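The highlights name mixup augmentation as the mechanism for handling corrupted preference labels. As a rough illustration of that idea (not the authors' exact method), the sketch below convexly combines pairs of preference-labeled trajectory segments and their labels, producing soft labels that both expand a limited feedback budget and temper overconfident fits to possibly mislabeled examples. The function name `mixup_preferences`, the Beta parameter `alpha`, and the tensor shapes are illustrative assumptions.

```python
import torch

def mixup_preferences(seg_a, seg_b, labels, alpha=0.5):
    """Mixup over preference examples (illustrative sketch).

    seg_a, seg_b: (B, T, obs_dim) trajectory segment pairs shown to the labeler.
    labels:       (B,) preference labels in [0, 1], where 1.0 means seg_a
                  was preferred.
    Returns mixed segment pairs and soft labels in [0, 1].
    """
    # Sample a single mixing coefficient from Beta(alpha, alpha).
    lam = torch.distributions.Beta(alpha, alpha).sample()
    # Pair each example with a randomly permuted partner.
    perm = torch.randperm(labels.size(0))
    mixed_a = lam * seg_a + (1 - lam) * seg_a[perm]
    mixed_b = lam * seg_b + (1 - lam) * seg_b[perm]
    # Soft labels: a noisy hard label is diluted by its partner's label.
    mixed_y = lam * labels + (1 - lam) * labels[perm]
    return mixed_a, mixed_b, mixed_y

if __name__ == "__main__":
    B, T, D = 8, 50, 4  # assumed batch size, segment length, obs dim
    a, b = torch.randn(B, T, D), torch.randn(B, T, D)
    y = torch.randint(0, 2, (B,)).float()
    ma, mb, my = mixup_preferences(a, b, y)
    print(my)  # soft labels instead of hard 0/1 preferences
```

In typical preference-based RL pipelines, such soft labels would replace the hard labels in the cross-entropy loss of the Bradley-Terry-style preference predictor, which is how mixing can dampen overconfident predictions on noisy feedback.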