Abstract: Noisy Preferences (NPs) pose a significant challenge in aligning Large Language Models (LLMs), as incorrect preference labels can substantially degrade alignment quality. Existing strategies for mitigating NPs often face two key limitations: (1) they apply global-level adjustments, resulting in imprecise instance-level noise handling, and (2) they rely on heuristic rules, limiting their capacity to adapt to the alignment task. To address these limitations, this paper proposes Meta-Align, a novel framework that pioneers a perplexity-aware meta-learning strategy for adaptive sample reweighting, with Perplexity Difference (PPLDiff) serving as a fine-grained, instance-level noise signal. Unlike traditional methods that employ static rules, Meta-Align trains an adaptive weighting function via meta-learning: the function dynamically assigns each sample a weight based on its PPLDiff, guided by performance on a small, clean meta-dataset. This design enables precise instance-level noise modulation while the weighting strategy itself is optimized adaptively. Comprehensive experiments on benchmark datasets demonstrate that Meta-Align substantially outperforms state-of-the-art robust alignment methods, effectively down-weighting potentially noisy preferences while emphasizing reliable ones.
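The reweighting scheme the abstract describes can be illustrated on a toy problem: a learnable weighting function maps each sample's PPLDiff score to a weight, and the function's parameters are tuned so that one weighted training step minimizes loss on a small clean meta-set. Everything below is an illustrative assumption, not the paper's implementation: the synthetic targets, the scalar sigmoid weighting form, and the finite-difference meta-gradients stand in for the actual preference model and bilevel optimization.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative synthetic data (an assumption, not the paper's setup):
# clean samples have target +1 and low PPLDiff; noisy (label-flipped)
# samples have target -1 and high PPLDiff.
n_clean, n_noisy = 80, 20
y = np.concatenate([np.full(n_clean, 1.0), np.full(n_noisy, -1.0)])
ppldiff = np.concatenate([rng.normal(-1.0, 0.3, n_clean),   # reliable
                          rng.normal(+1.0, 0.3, n_noisy)])  # noisy
meta_y = np.full(16, 1.0)  # small, clean meta-set

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def inner_step(theta, a, b, lr=0.1):
    """One weighted gradient step on the noisy training set."""
    w = sigmoid(a * ppldiff + b)                     # instance weights from PPLDiff
    grad = np.sum(w * 2.0 * (theta - y)) / np.sum(w)  # grad of weighted MSE
    return theta - lr * grad

def meta_loss(theta, a, b):
    """Meta-objective: clean-set loss after one inner step with weights (a, b)."""
    theta_1 = inner_step(theta, a, b)
    return np.mean((theta_1 - meta_y) ** 2)

theta, a, b = 0.0, 0.0, 0.0
eps, meta_lr = 1e-4, 0.5
for _ in range(200):
    # Finite-difference meta-gradients for the weighting-function parameters.
    ga = (meta_loss(theta, a + eps, b) - meta_loss(theta, a - eps, b)) / (2 * eps)
    gb = (meta_loss(theta, a, b + eps) - meta_loss(theta, a, b - eps)) / (2 * eps)
    a -= meta_lr * ga
    b -= meta_lr * gb
    theta = inner_step(theta, a, b)  # model update under current weights

weights = sigmoid(a * ppldiff + b)
# High-PPLDiff (noisy) samples end up down-weighted relative to clean ones.
```

The meta-set never constrains the weights directly; it only scores the model after a weighted update, which is what drives the weighting function to suppress the samples that hurt clean-set performance.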
Paper Type: Long
Research Area: Machine Learning for NLP
Research Area Keywords: Meta-learning, Preference learning, Alignment, Noisy data, Large language models
Contribution Types: Model analysis & interpretability, NLP engineering experiment
Languages Studied: English
Keywords: LLM Alignment, Preference Learning, Label Noise, Meta-learning
Submission Number: 402