Keywords: Large Language Models, Direct Preference Optimization
Abstract: Noise in preference data significantly impedes the robust alignment of large language models (LLMs) with human values. Existing methods that rely on global noise assumptions or static pre-processing heuristics are often insufficient, as they fail to address the instance-specific and dynamic nature of preference noise. To overcome these limitations, we introduce Dynamic Preference Calibration, a novel framework that meta-learns to generate adaptive soft labels directly from noisy data. Our approach employs a lightweight meta-learner that maps a perplexity difference (PPLDiff) signal to a calibrated soft label. Crucially, the power of our dynamic approach stems from computing this PPLDiff signal online, using the main, evolving LLM itself. This creates a symbiotic loop in which the main model's improving understanding continuously informs and refines the calibration strategy, allowing the two to co-evolve. Guided by a small, clean meta-dataset, the meta-learner is optimized to produce labels that maximize alignment performance. Extensive experiments on benchmark datasets demonstrate that our method establishes a new state of the art for noisy preference alignment, significantly outperforming strong baselines. It maintains high performance and stability even under extreme noise levels of up to 40% label flips, highlighting the promise of meta-learning for building fundamentally more robust and reliable alignment techniques.
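To make the pipeline described in the abstract concrete, below is a minimal sketch in PyTorch of its three ingredients: computing a PPLDiff signal online from the current policy, mapping it through a lightweight meta-learner to a soft label, and using that label to interpolate a DPO-style pairwise loss. All names (`SoftLabelMetaLearner`, `ppl_diff`, `soft_label_dpo_loss`) and architectural details are hypothetical assumptions for illustration, not the paper's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SoftLabelMetaLearner(nn.Module):
    """Tiny MLP mapping a scalar PPLDiff signal to a calibrated soft label in (0, 1)."""

    def __init__(self, hidden: int = 16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(1, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, ppl_diff_signal: torch.Tensor) -> torch.Tensor:
        # Sigmoid keeps the label a valid mixing weight between the two responses.
        return torch.sigmoid(self.net(ppl_diff_signal.unsqueeze(-1))).squeeze(-1)


def ppl_diff(policy_logps_chosen, policy_logps_rejected,
             len_chosen, len_rejected):
    # Length-normalized negative log-likelihood approximates log-perplexity.
    # The signal is the gap between the rejected and chosen responses under
    # the *current* policy, so it evolves as the main model trains.
    nll_chosen = -policy_logps_chosen / len_chosen
    nll_rejected = -policy_logps_rejected / len_rejected
    return nll_rejected - nll_chosen


def soft_label_dpo_loss(policy_chosen, policy_rejected,
                        ref_chosen, ref_rejected,
                        soft_label, beta: float = 0.1):
    # Standard DPO logits; the soft label interpolates between treating the
    # pair as correctly ordered (label -> 1) and flipped (label -> 0).
    logits = beta * ((policy_chosen - ref_chosen)
                     - (policy_rejected - ref_rejected))
    return -(soft_label * F.logsigmoid(logits)
             + (1.0 - soft_label) * F.logsigmoid(-logits)).mean()
```

In the full framework, the meta-learner's parameters would additionally be optimized in an outer loop against the small clean meta-dataset (e.g., via a bilevel one-step-lookahead update), which this sketch omits for brevity.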
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 14982