Semi-Supervised Preference Learning for Multi-modal Large Models via Risk Analysis

18 Sept 2025 (modified: 11 Feb 2026) | Submitted to ICLR 2026 | CC BY 4.0
Keywords: Preference learning; Reinforcement Learning; Semi-Supervised
Abstract: Ensuring that multi-modal large models possess reasoning capabilities aligned with human preferences is of paramount importance. Currently, the most effective approach is to fine-tune these models using reward models optimized towards human-aligned objectives. However, optimizing reward models typically requires large-scale human-annotated datasets, a requirement that becomes a significant bottleneck for downstream tasks with limited labeled samples. To address this limitation, we propose a Semi-supervised Preference learning approach based on Risk Analysis, denoted by SPRA, which can accurately assess the alignment of large model outputs with human preferences using limited labeled data. SPRA measures preference with a risk model, whose construction consists of three main steps: (1) extract risk features that encode human priors from a limited set of labeled samples; (2) construct a risk model based on the risk features; (3) train the risk model. SPRA then uses the resulting risk model to rank model responses, with lower-risk responses prioritized as preferred outputs. By explicitly incorporating human priors into its modeling framework, SPRA achieves not only high interpretability but also the flexibility to adapt to diverse human preference distributions by adjusting the priors. This contrasts with traditional single-preference predictors, which lack such adaptability. Moreover, the SPRA risk model is parameter-efficient, containing only thousands of parameters, which significantly reduces computational overhead and simplifies reward optimization. Our empirical study on real benchmark datasets validates the efficacy of SPRA.
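The ranking step described in the abstract (score each response with a risk model, then prefer lower-risk responses) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the two toy risk features, the logistic form of the risk score, the hand-set weights, and the function names `risk_score` and `rank_by_risk` are all hypothetical. In SPRA the features would be extracted from labeled samples and the weights would be trained.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical binary risk features (1 = the risky pattern is present).
# The abstract does not specify the real features; these are placeholders.
def too_short(response: str) -> int:
    return int(len(response.split()) < 3)

def hedges_heavily(response: str) -> int:
    return int(response.lower().count("maybe") >= 2)

FEATURES: List[Callable[[str], int]] = [too_short, hedges_heavily]

def risk_score(response: str, weights: Sequence[float], bias: float = 0.0) -> float:
    """Assumed risk model: a logistic function of weighted risk features."""
    z = bias + sum(w * f(response) for w, f in zip(weights, FEATURES))
    return 1.0 / (1.0 + math.exp(-z))

def rank_by_risk(responses: Sequence[str], weights: Sequence[float]) -> List[str]:
    """Return responses ordered from lowest to highest risk (most preferred first)."""
    return sorted(responses, key=lambda r: risk_score(r, weights))

if __name__ == "__main__":
    candidates = [
        "No.",
        "Maybe it is a cat, maybe a dog.",
        "The image shows a cat sitting on a red sofa.",
    ]
    # Hand-set weights standing in for trained parameters.
    for response in rank_by_risk(candidates, weights=[2.0, 1.0]):
        print(round(risk_score(response, [2.0, 1.0]), 3), response)
```

A model of this shape has one weight per feature, which is consistent in spirit with the abstract's claim of a small parameter count, although the actual SPRA architecture may differ.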
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 11924