Existing efforts to align multimodal large language models (MLLMs) with human preferences have only achieved progress in narrow areas, such as hallucination reduction, and remain limited in practical applicability and generalizability. To address these limitations, we introduce MM-RLHF, a dataset containing 120k fine-grained, human-annotated preference comparison pairs. This dataset represents a substantial advancement over existing resources, offering superior size, diversity, annotation granularity, and quality. Leveraging this dataset, we propose several key innovations to improve both the quality of reward models and the efficiency of alignment algorithms. Notably, we introduce the Critique-Based Reward Model, which generates critiques of model outputs before assigning scores, offering enhanced interpretability and more informative feedback compared to traditional scalar reward mechanisms. Additionally, we propose Dynamic Reward Scaling, a method that adjusts the loss weight of each sample according to the reward signal, thereby optimizing the use of high-quality comparison pairs. Our approach is rigorously evaluated across 10 distinct dimensions encompassing 27 benchmarks, with results demonstrating significant and consistent improvements in model performance (Figure 1).
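To make these two ideas concrete, below is a minimal PyTorch-style sketch. It is illustrative only: the critique-then-score flow and the tanh-bounded weighting function are assumed forms consistent with the description above, not the paper's exact formulation, and `mllm.generate` / `mllm.score_head` are hypothetical placeholders for an MLLM's generation call and a learned scalar reward head.

```python
import torch
import torch.nn.functional as F

def critique_based_reward(mllm, image, prompt, response):
    """Critique-then-score rewarding (sketch). `mllm.generate` and
    `mllm.score_head` are hypothetical placeholders, not a real API."""
    # Stage 1: the model writes a free-form critique of the candidate response.
    critique = mllm.generate(
        f"Question: {prompt}\nAnswer: {response}\nCritique the answer above:",
        image=image,
    )
    # Stage 2: a scalar reward is produced conditioned on the critique,
    # so the score is explained by the intermediate text.
    reward = mllm.score_head(image, prompt, response, critique)
    return critique, reward

def weighted_dpo_loss(logp_chosen, logp_rejected,
                      ref_logp_chosen, ref_logp_rejected,
                      reward_margin, beta=0.1, k=0.5):
    """DPO-style loss with a per-sample weight driven by the reward-model
    margin, in the spirit of Dynamic Reward Scaling. The tanh weighting
    is an assumed form, bounded in [1 - k, 1 + k]."""
    # Standard DPO logits: policy log-ratio minus reference log-ratio.
    logits = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    per_pair = -F.logsigmoid(logits)               # per-pair preference loss
    weights = 1.0 + k * torch.tanh(reward_margin)  # confident pairs weigh more
    return (weights * per_pair).mean()
```

Under this reading, pairs the reward model separates with a large margin (high-confidence comparisons) contribute more to the gradient, which matches the stated goal of making better use of high-quality comparison pairs.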
We are proud to open-source MM-RLHF, a comprehensive project for aligning Multimodal Large Language Models (MLLMs) with human preferences. This release includes:

- The MM-RLHF dataset: 120k fine-grained, human-annotated preference comparison pairs.
- The Critique-Based Reward Model, which critiques model outputs before scoring them.
- Dynamic Reward Scaling, an alignment algorithm that weights each training sample by its reward signal.
Our dataset and algorithms enable consistent performance improvements for open-source MLLMs across 10 evaluation dimensions and 27 benchmarks.