MSPO: Meta Soft Preference Optimization for Robust LLM Alignment

ACL ARR 2025 May Submission 332 Authors

11 May 2025 (modified: 03 Jul 2025) · ACL ARR 2025 May Submission · CC BY 4.0
Abstract: Noise in preference data significantly impedes the alignment of large language models (LLMs) with human preferences, and existing methods struggle with two key challenges: reliably identifying noisy preferences and accurately representing preference intensity. To address these challenges, we introduce Meta Soft Preference Optimization (MSPO), a framework that employs a meta-learner to produce adaptive soft preference labels for the alignment task. The meta-learner takes the initial preference indications together with noise-indicative signals, primarily perplexity differences (PPLDiff) between paired responses, and is optimized on a small, clean meta-dataset to improve downstream LLM alignment performance. Extensive experiments demonstrate that MSPO effectively mitigates the adverse effects of noisy preferences, significantly improving the robustness of LLM alignment across various noisy environments and outperforming existing baselines.
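The abstract describes the pipeline only at a high level. Below is a minimal, self-contained sketch of how the soft-label mechanism could look, assuming a DPO-style objective. Everything named here (MetaLabelNet, ppl_diff, soft_dpo_loss, the two-layer MLP, and the length-normalized PPLDiff proxy) is an illustrative assumption, not the authors' implementation, and the outer bilevel update of the meta-learner on the clean meta-dataset is omitted.

```python
# Sketch of the soft-label idea from the abstract; names and architecture
# are illustrative assumptions, NOT the paper's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


def ppl_diff(logp_chosen: torch.Tensor, logp_rejected: torch.Tensor,
             len_chosen: torch.Tensor, len_rejected: torch.Tensor) -> torch.Tensor:
    """PPLDiff proxy: difference of length-normalized negative log-likelihoods.

    Perplexity is exp(-mean log-likelihood); comparing the exponents directly
    is a monotone stand-in (an assumption, not the paper's exact definition).
    A large positive value means the chosen response looks less likely than
    the rejected one, a possible sign of a noisy preference label.
    """
    nll_chosen = -logp_chosen / len_chosen
    nll_rejected = -logp_rejected / len_rejected
    return nll_chosen - nll_rejected


class MetaLabelNet(nn.Module):
    """Hypothetical meta-learner: (initial label, PPLDiff) -> soft label in (0, 1)."""

    def __init__(self, hidden: int = 16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, initial_label: torch.Tensor, diff: torch.Tensor) -> torch.Tensor:
        x = torch.stack([initial_label, diff], dim=-1)  # (batch, 2)
        return torch.sigmoid(self.net(x)).squeeze(-1)   # (batch,)


def soft_dpo_loss(logratio_chosen: torch.Tensor, logratio_rejected: torch.Tensor,
                  soft_label: torch.Tensor, beta: float = 0.1) -> torch.Tensor:
    """DPO-style loss with a soft target in place of a hard 0/1 preference."""
    logits = beta * (logratio_chosen - logratio_rejected)
    return F.binary_cross_entropy_with_logits(logits, soft_label)


if __name__ == "__main__":
    torch.manual_seed(0)
    batch = 4
    meta = MetaLabelNet()
    diff = ppl_diff(-torch.rand(batch) * 30, -torch.rand(batch) * 30,
                    torch.full((batch,), 20.0), torch.full((batch,), 20.0))
    # Initial hard labels all say "chosen preferred" (1.0); the meta-learner
    # softens them based on the noise-indicative PPLDiff signal.
    labels = meta(torch.ones(batch), diff)
    loss = soft_dpo_loss(torch.randn(batch), torch.randn(batch), labels)
    print(labels, loss.item())
```

In the full method, the meta-learner's parameters would themselves be updated so that a policy trained with its soft labels performs well on the small clean meta-dataset, a bilevel loop not shown above. The appeal of soft labels over hard filtering is that likely-noisy pairs are downweighted rather than discarded, preserving gradient signal from ambiguous examples.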
Paper Type: Long
Research Area: Interpretability and Analysis of Models for NLP
Research Area Keywords: Robust Preference Alignment, LLM Alignment, Meta-Learning
Contribution Types: Model analysis & interpretability, Theory
Languages Studied: English
Keywords: LLM Alignment, Soft Preference Optimization, Noise Robustness
Submission Number: 332