Omni-Reward: Towards Generalist Omni-Modal Reward Modeling with Free-Form Preferences

04 Apr 2025 (modified: 30 Oct 2025) · Submitted to NeurIPS 2025 Datasets and Benchmarks Track · CC BY 4.0
Keywords: Omni-Modal Models, Reward Models, Alignment
TL;DR: We propose Omni-Reward, a step toward generalist omni-modal reward modeling with free-form preferences.
Abstract: Reward models (RMs) play a critical role in aligning AI behaviors with human preferences, yet they face two fundamental challenges: (1) **Modality Imbalance**, where most existing RMs focus on the text and image modalities, offering limited support for video, audio, and other modalities; and (2) **Preference Rigidity**, where training on fixed binary preference pairs fails to capture the complexity and diversity of personalized preferences. To address these challenges, we propose Omni-Reward, a step toward generalist omni-modal reward modeling with support for free-form preferences, consisting of: (1) **Evaluation**: We introduce Omni-RewardBench, the first omni-modal RM benchmark with free-form preferences, covering nine tasks across five modalities (text, image, video, audio, and 3D); (2) **Data**: We construct Omni-RewardData, a multimodal preference dataset comprising 248K general preference pairs and 69K instruction-tuning pairs for training generalist omni-modal RMs; and (3) **Model**: We propose Omni-RewardModel, which includes both discriminative and generative RMs and achieves strong performance on Omni-RewardBench as well as other widely used RM benchmarks.
Croissant File: json
Dataset URL: https://huggingface.co/datasets/HongbangYuan/OmniRewardBench
Code URL: https://github.com/HongbangYuan/OmniReward
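For readers who want to inspect the benchmark directly, below is a minimal sketch of loading it from the Hugging Face Hub with the `datasets` library. The default configuration and split names are assumptions (not stated in the submission); consult the dataset card at the Dataset URL above for the actual schema.

```python
# Minimal sketch: load Omni-RewardBench from the Hugging Face Hub.
# Assumes the dataset loads with its default configuration; if the repo
# defines multiple configs, pass the config name as the second argument.
from datasets import load_dataset

bench = load_dataset("HongbangYuan/OmniRewardBench")

# Inspect the available splits and one example record.
print(bench)
first_split = list(bench.keys())[0]
print(next(iter(bench[first_split])))
```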
Supplementary Material: zip
Primary Area: Datasets & Benchmarks for applications in language modeling and vision language modeling
Flagged For Ethics Review: true
Submission Number: 5