Keywords: unified multimodal large language models, alignment
Abstract: Unified Multi-Modal Large Language Models (U-MLLMs) have demonstrated remarkable capabilities in text-to-image (T2I) generation, yet their safety alignment remains under-explored. As these models become increasingly powerful, the potential for generating toxic or harmful content grows correspondingly. Current T2I alignment methods primarily focus on enhancing image quality while neglecting safety considerations. Moreover, the reward signals employed in existing approaches are typically sparse, providing only a single score per image, which limits the granularity of feedback. This paper introduces a novel approach that integrates dense rewards into a Group Relative Policy Optimization (GRPO) framework to improve image quality, and incorporates safety-specific reward signals to enhance safety alignment. Our method transforms dense rewards into token-level weights that modulate the training process, enabling fine-grained optimization that suppresses problematic regions while focusing learning on well-aligned image regions. Experiments demonstrate strong performance: our method achieves competitive quality metrics (WISE: 0.50) while reducing unsafe content generation by 59.4% on the MMDT benchmark. This work advances both the quality and safety of U-MLLMs, demonstrating a comprehensive approach to U-MLLM alignment.
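The abstract describes converting dense rewards into token-level weights within a GRPO objective but gives no implementation details. Below is a minimal, hypothetical sketch of what such a weighting scheme could look like; the function name, tensor shapes, group-relative normalization, and the softmax-based token weighting are illustrative assumptions, not the paper's actual method.

```python
import torch

def token_weighted_grpo_loss(logprobs, old_logprobs, token_rewards, group_ids,
                             clip_eps=0.2, eps=1e-6):
    """Hypothetical GRPO-style clipped objective with dense, token-level rewards.

    logprobs, old_logprobs: (B, T) per-token log-probs under current / sampling policy.
    token_rewards: (B, T) dense rewards, e.g. per image-token quality or safety scores.
    group_ids: (B,) index of the prompt group each sampled image belongs to.
    """
    # Sequence-level return obtained by averaging the dense signal over tokens.
    seq_rewards = token_rewards.mean(dim=1)                          # (B,)

    # Group-relative advantage: normalize each return against its prompt group,
    # as in standard GRPO.
    adv = torch.zeros_like(seq_rewards)
    for g in group_ids.unique():
        m = group_ids == g
        r = seq_rewards[m]
        adv[m] = (r - r.mean()) / (r.std() + eps)

    # Token-level weights from the dense reward: up-weight well-aligned tokens,
    # down-weight problematic ones (assumed normalization, mean weight ~1).
    weights = torch.softmax(token_rewards, dim=1) * token_rewards.size(1)  # (B, T)

    # PPO-style clipped ratio, modulated per token by the dense weights.
    ratio = torch.exp(logprobs - old_logprobs)                       # (B, T)
    unclipped = ratio * adv.unsqueeze(1)
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * adv.unsqueeze(1)
    per_token = -torch.min(unclipped, clipped) * weights
    return per_token.mean()
```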
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 21526