Keywords: fine-tuning, preference learning, optimization
Abstract: Preference learning in Large Language Models (LLMs) has advanced significantly, yet existing methods remain limited by modest performance gains, high computational costs, hyperparameter sensitivity, and insufficient modeling of global token-level relationships. We introduce **PLOT**, which enhances **P**reference **L**earning in fine-tuning-based alignment through a token-level loss derived from **O**ptimal **T**ransport. By formulating preference learning as an **Optimal Transport Problem**, PLOT aligns model outputs with human preferences while preserving the LLM's original output distribution, ensuring stability and robustness. Furthermore, PLOT leverages token embeddings to capture semantic relationships between tokens, enabling globally informed optimization. Experiments across two preference categories, **Human Values** and **Logic & Problem Solving**, spanning seven sub-preferences demonstrate that PLOT consistently improves alignment performance while maintaining fluency and coherence. These results substantiate optimal transport as a principled methodology for preference learning, establishing a theoretically grounded framework that offers new insights into preference learning for LLMs.
Paper Type: Long
Research Area: Safety and Alignment in LLMs
Research Area Keywords: fine-tuning, preference learning
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Reproduction study, Theory
Languages Studied: English
Submission Number: 10309
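Since the full paper is not reproduced on this page, the following is only a minimal, hypothetical sketch of the kind of token-level optimal-transport loss the abstract describes, not PLOT's actual objective: cosine distances between token embeddings serve as the ground cost, and an entropy-regularized (Sinkhorn) solver yields a differentiable transport cost. All function names and hyperparameters here (`sinkhorn`, `ot_token_loss`, `epsilon`, `n_iters`) are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def sinkhorn(cost, a, b, epsilon=0.1, n_iters=50):
    """Entropy-regularized OT plan via Sinkhorn iterations.
    cost: (n, m) token-to-token ground-cost matrix
    a, b: (n,) and (m,) marginals (here: uniform over tokens)
    """
    K = torch.exp(-cost / epsilon)  # Gibbs kernel
    u = torch.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u + 1e-8)   # column scaling
        u = a / (K @ v + 1e-8)     # row scaling
    return u[:, None] * K * v[None, :]  # transport plan

def ot_token_loss(emb_out, emb_pref):
    """Transport cost between two token-embedding sequences,
    with cosine distance as the ground cost (hypothetical)."""
    x = F.normalize(emb_out, dim=-1)
    y = F.normalize(emb_pref, dim=-1)
    cost = 1.0 - x @ y.T  # (n, m) cosine distances
    n, m = cost.shape
    a = torch.full((n,), 1.0 / n)
    b = torch.full((m,), 1.0 / m)
    plan = sinkhorn(cost, a, b)
    return (plan * cost).sum()  # scalar transport cost

# Toy usage: random embeddings stand in for model / preferred tokens.
emb_out = torch.randn(12, 64, requires_grad=True)
emb_pref = torch.randn(10, 64)
loss = ot_token_loss(emb_out, emb_pref)
loss.backward()  # gradients flow through the Sinkhorn plan
```

Entropic regularization is a common choice in this setting because the resulting plan is smooth and differentiable, so the transport cost can be used directly as a fine-tuning loss; whether PLOT uses Sinkhorn or an exact solver is not stated in the abstract.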