MSPO: Meta Soft Preference Optimization for Robust LLM Alignment

ACL ARR 2025 May Submission 332 Authors

11 May 2025 (modified: 03 Jul 2025) · ACL ARR 2025 May Submission · CC BY 4.0
Abstract: Noise in preference data significantly impedes the alignment of large language models (LLMs) with human preferences, and existing methods struggle with two key challenges: reliably identifying noisy preferences and accurately representing preference intensity. To address these challenges, we introduce Meta Soft Preference Optimization (MSPO), a framework that employs a meta-learner to produce adaptive soft preference labels for the alignment task. The meta-learner takes the initial preference indications together with noise-indicative signals, primarily perplexity differences (PPLDiff) between paired responses, and is optimized on a small, clean meta-dataset to improve downstream LLM alignment performance. Extensive experiments demonstrate that MSPO effectively mitigates the adverse effects of noisy preferences, significantly improving the robustness of LLM alignment across various noisy environments and outperforming existing baselines.
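The abstract describes the pipeline only at a high level. Below is a minimal, self-contained sketch of how the soft-label mechanism could look, assuming a DPO-style objective. Everything named here (MetaLabelNet, ppl_diff, soft_dpo_loss, the two-layer MLP, and the length-normalized PPLDiff proxy) is an illustrative assumption, not the authors' implementation, and the outer bilevel update of the meta-learner on the clean meta-dataset is omitted.

```python
# Sketch of the soft-label idea from the abstract; names and architecture
# are illustrative assumptions, NOT the paper's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


def ppl_diff(logp_chosen: torch.Tensor, logp_rejected: torch.Tensor,
             len_chosen: torch.Tensor, len_rejected: torch.Tensor) -> torch.Tensor:
    """PPLDiff proxy: difference of length-normalized negative log-likelihoods.

    Perplexity is exp(-mean log-likelihood); comparing the exponents directly
    is a monotone stand-in (an assumption, not the paper's exact definition).
    A large positive value means the chosen response looks less likely than
    the rejected one, a possible sign of a noisy preference label.
    """
    nll_chosen = -logp_chosen / len_chosen
    nll_rejected = -logp_rejected / len_rejected
    return nll_chosen - nll_rejected


class MetaLabelNet(nn.Module):
    """Hypothetical meta-learner: (initial label, PPLDiff) -> soft label in (0, 1)."""

    def __init__(self, hidden: int = 16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, initial_label: torch.Tensor, diff: torch.Tensor) -> torch.Tensor:
        x = torch.stack([initial_label, diff], dim=-1)  # (batch, 2)
        return torch.sigmoid(self.net(x)).squeeze(-1)   # (batch,)


def soft_dpo_loss(logratio_chosen: torch.Tensor, logratio_rejected: torch.Tensor,
                  soft_label: torch.Tensor, beta: float = 0.1) -> torch.Tensor:
    """DPO-style loss with a soft target in place of a hard 0/1 preference."""
    logits = beta * (logratio_chosen - logratio_rejected)
    return F.binary_cross_entropy_with_logits(logits, soft_label)


if __name__ == "__main__":
    torch.manual_seed(0)
    batch = 4
    meta = MetaLabelNet()
    diff = ppl_diff(-torch.rand(batch) * 30, -torch.rand(batch) * 30,
                    torch.full((batch,), 20.0), torch.full((batch,), 20.0))
    # Initial hard labels all say "chosen preferred" (1.0); the meta-learner
    # softens them based on the noise-indicative PPLDiff signal.
    labels = meta(torch.ones(batch), diff)
    loss = soft_dpo_loss(torch.randn(batch), torch.randn(batch), labels)
    print(labels, loss.item())
```

In the full method, the meta-learner's parameters would themselves be updated so that a policy trained with its soft labels performs well on the small clean meta-dataset, a bilevel loop not shown above. The appeal of soft labels over hard filtering is that likely-noisy pairs are downweighted rather than discarded, preserving gradient signal from ambiguous examples.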
Paper Type: Long
Research Area: Interpretability and Analysis of Models for NLP
Research Area Keywords: Robust Preference Alignment, LLM Alignment, Meta-Learning
Contribution Types: Model analysis & interpretability, Theory
Languages Studied: English
Keywords: LLM Alignment, Soft Preference Optimization, Noise Robustness
Submission Number: 332