Abstract: Noisy Preferences (NPs) pose a significant challenge in aligning Large Language Models (LLMs), as incorrect preference labels can substantially degrade alignment quality. Existing strategies for mitigating NPs often face two key limitations: (1) they apply global-level adjustments, resulting in imprecise instance-level noise handling, and (2) they rely on heuristic rules, limiting their capacity to adapt to the alignment task. To address these limitations, this paper proposes Meta-Align, a novel framework that pioneers a perplexity-aware meta-learning strategy for adaptive sample reweighting, with Perplexity Difference (PPLDiff) serving as a fine-grained, instance-level noise signal. Unlike traditional methods that employ static rules, Meta-Align trains an adaptive weighting function via meta-learning: the function dynamically assigns each sample a weight based on its PPLDiff, guided by performance on a small, clean meta-dataset. This design enables precise instance-level noise modulation while the weighting strategy itself is optimized adaptively. Comprehensive experiments on benchmark datasets demonstrate that Meta-Align substantially outperforms state-of-the-art robust alignment methods, effectively down-weighting potentially noisy preferences while emphasizing reliable ones.
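The reweighting scheme the abstract describes can be illustrated on a toy problem: a learnable weighting function maps each sample's PPLDiff score to a weight, and the function's parameters are tuned so that one weighted training step minimizes loss on a small clean meta-set. Everything below is an illustrative assumption, not the paper's implementation: the synthetic targets, the scalar sigmoid weighting form, and the finite-difference meta-gradients stand in for the actual preference model and bilevel optimization.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative synthetic data (an assumption, not the paper's setup):
# clean samples have target +1 and low PPLDiff; noisy (label-flipped)
# samples have target -1 and high PPLDiff.
n_clean, n_noisy = 80, 20
y = np.concatenate([np.full(n_clean, 1.0), np.full(n_noisy, -1.0)])
ppldiff = np.concatenate([rng.normal(-1.0, 0.3, n_clean),   # reliable
                          rng.normal(+1.0, 0.3, n_noisy)])  # noisy
meta_y = np.full(16, 1.0)  # small, clean meta-set

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def inner_step(theta, a, b, lr=0.1):
    """One weighted gradient step on the noisy training set."""
    w = sigmoid(a * ppldiff + b)                     # instance weights from PPLDiff
    grad = np.sum(w * 2.0 * (theta - y)) / np.sum(w)  # grad of weighted MSE
    return theta - lr * grad

def meta_loss(theta, a, b):
    """Meta-objective: clean-set loss after one inner step with weights (a, b)."""
    theta_1 = inner_step(theta, a, b)
    return np.mean((theta_1 - meta_y) ** 2)

theta, a, b = 0.0, 0.0, 0.0
eps, meta_lr = 1e-4, 0.5
for _ in range(200):
    # Finite-difference meta-gradients for the weighting-function parameters.
    ga = (meta_loss(theta, a + eps, b) - meta_loss(theta, a - eps, b)) / (2 * eps)
    gb = (meta_loss(theta, a, b + eps) - meta_loss(theta, a, b - eps)) / (2 * eps)
    a -= meta_lr * ga
    b -= meta_lr * gb
    theta = inner_step(theta, a, b)  # model update under current weights

weights = sigmoid(a * ppldiff + b)
# High-PPLDiff (noisy) samples end up down-weighted relative to clean ones.
```

The meta-set never constrains the weights directly; it only scores the model after a weighted update, which is what drives the weighting function to suppress the samples that hurt clean-set performance.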
Paper Type: Long
Research Area: Machine Learning for NLP
Research Area Keywords: Meta-learning, Preference learning, Alignment, Noisy data, Large language models
Contribution Types: Model analysis & interpretability, NLP engineering experiment
Languages Studied: English
Keywords: LLM Alignment, Preference Learning, Label Noise, Meta-learning
Submission Number: 402