Keywords: multi-reward; text-to-image; optimal transport; Pareto optimal
Abstract: Text-to-image generation models have achieved remarkable progress in preference optimization, yet achieving robust alignment across diverse reward models remains a significant challenge. Existing multi-reward fusion approaches rely on weighted summation, which is costly to tune and insufficient for balancing conflicting objectives. More critically, optimization with reward models is highly susceptible to reward hacking. Our theoretical analysis suggests that using a unified global upper bound as the optimization target may induce reward hacking on certain samples, and that optimization with weak reward models is especially prone to exacerbate this risk. To address this issue, we propose a Pareto frontier-guided optimal transport framework, which constructs a frontier for each prompt as the optimization target and maps generated samples within the same batch to their corresponding frontiers. Based on the characteristics of different reward models, we further design online and offline optimization strategies tailored to their distinct requirements. Finally, we introduce the Joint Domination Rate (JDR) and Joint Collapse Rate (JCR) as more principled metrics for evaluating multi-reward optimization. Experimental results demonstrate that, compared with strong baselines, our method achieves a 10% performance improvement, effectively mitigating reward hacking while enhancing multi-reward alignment.
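The framework's central object is the per-prompt Pareto frontier over multiple reward scores. As a rough illustration only (not the authors' implementation, whose details are not given in the abstract), the sketch below extracts the non-dominated set from a batch of per-sample reward vectors; the function name and array layout are assumptions made for this example.

```python
import numpy as np

def pareto_frontier(rewards: np.ndarray) -> np.ndarray:
    """Return a boolean mask of non-dominated samples (illustrative sketch).

    rewards: (n_samples, n_rewards) array; higher is better for every reward.
    A sample is dominated if another sample scores >= on all rewards and
    strictly > on at least one.
    """
    n = rewards.shape[0]
    non_dominated = np.ones(n, dtype=bool)
    for i in range(n):
        if not non_dominated[i]:
            # Anything dominated by i is also dominated by whatever
            # dominated i, so i can be skipped safely.
            continue
        geq = (rewards >= rewards[i]).all(axis=1)  # at least as good everywhere
        gt = (rewards > rewards[i]).any(axis=1)    # strictly better somewhere
        if (geq & gt).any():
            non_dominated[i] = False
    return non_dominated

# Example: 4 generated samples scored by 2 reward models for one prompt.
scores = np.array([[0.9, 0.2],
                   [0.5, 0.5],
                   [0.4, 0.4],   # dominated by [0.5, 0.5]
                   [0.1, 0.8]])
print(pareto_frontier(scores))   # [ True  True False  True]
```

The non-dominated samples form the frontier that, per the abstract, serves as the per-prompt optimization target to which batch samples are transported.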
Primary Area: generative models
Submission Number: 1979