One Goal, Many Challenges: Robust Preference Optimization Amid Content-Aware and Multi-Source Noise

Published: 06 Mar 2025, Last Modified: 15 Mar 2025, SCSL @ ICLR 2025, CC BY 4.0
Track: regular paper (up to 6 pages)
Keywords: Reinforcement Learning from Human Feedback, Preference Alignment, Backdoor Attack, Robust Preference Optimization
TL;DR: This paper presents Content-Aware Noise-Resilient Preference Optimization (CNRPO), a novel framework that uses multi-objective optimization to mitigate content-dependent noise in preference learning.
Abstract: Large Language Models (LLMs) have made significant strides in generating human-like responses, largely due to preference alignment techniques. However, these methods often assume unbiased human feedback, which is rarely the case in real-world scenarios. This paper introduces Content-Aware Noise-Resilient Preference Optimization (CNRPO), a novel framework that addresses multiple sources of content-dependent noise in preference learning. CNRPO employs a multi-objective optimization approach to separate true preferences from content-aware noise, effectively mitigating its impact. We leverage backdoor attack mechanisms to efficiently learn and control various noise sources within a single model. Theoretical analysis and extensive experiments on several synthetic noisy datasets demonstrate that CNRPO significantly improves alignment with primary human preferences while controlling for secondary noise sources and biases, such as response length and harmfulness.
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Presenter: ~Amirabbas_Afzali2
Format: Yes, the presenting author will definitely attend in person because they are attending ICLR for other, complementary reasons.
Funding: Yes, the presenting author of this submission falls under ICLR’s funding aims, and funding would significantly impact their ability to attend the workshop in person.
Submission Number: 28