UniCTokens-R1: Boosting Unified Personalization via Reinforcement Learning

Zijun Shen; Ruichuan An; Sihan Yang; Ziyu Guo; Gaole Dai; Hao Liang; Ming Lu; Renrui Zhang; Wentao Zhang

19 Sept 2025 (modified: 20 Nov 2025) | ICLR 2026 Conference Withdrawn Submission | CC BY 4.0

Keywords: Vision Language Models; Personalization

Abstract: The rapid development of Unified Models demonstrates their potential for personalized understanding and generation tasks. However, existing methods either focus on single tasks or rely on complex training processes to achieve cross-task information sharing, which limits both the model's ability to fully capture user information and its broader real-world applicability. In this work, we propose UniCTokens-R1, an end-to-end reinforcement learning framework that facilitates mutual enhancement of understanding and generation. Specifically, the model performs both tasks in a single stage, leveraging the detailed semantic information obtained from the understanding task to assist generation, and subsequently using the generated results as feedback to improve understanding. We adopt an optimization method, UniCTask-GRPO, that integrates ensembled rewards to optimize both tasks simultaneously. We also propose a novel training strategy that dynamically adjusts the number of generated samples to accelerate convergence. To better model real-world user requests, we expand the existing UnifyBench from two perspectives: denser descriptions and additional user information. Experiments demonstrate that UniCTokens-R1 achieves state-of-the-art results on UnifyBench++, showcasing the model's cross-task information reasoning capabilities.
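
Illustrative sketch (not from the submission): the abstract does not specify how UniCTask-GRPO combines its rewards, so the following only shows the generic GRPO-style idea of scoring a group of sampled outputs and normalizing each reward against its group. The reward names (`understanding`, `generation`), the equal weights, and the helper functions are hypothetical assumptions. The dynamic adjustment of the number of samples is not sketched, since its schedule is not described here.

```python
from statistics import mean, pstdev


def ensemble_reward(rewards, weights):
    """Combine per-criterion rewards for one sample into a scalar.

    A weighted sum is an assumption; the submission's actual ensembling
    rule is not given in the abstract.
    """
    return sum(weights[name] * value for name, value in rewards.items())


def group_relative_advantages(scalar_rewards, eps=1e-8):
    """GRPO-style advantages: center and scale each reward by the
    mean and standard deviation of its own sample group."""
    mu = mean(scalar_rewards)
    sigma = pstdev(scalar_rewards)
    return [(r - mu) / (sigma + eps) for r in scalar_rewards]


# Hypothetical weights and rewards for a group of three sampled outputs.
weights = {"understanding": 0.5, "generation": 0.5}
group = [
    {"understanding": 1.0, "generation": 0.2},
    {"understanding": 0.0, "generation": 0.8},
    {"understanding": 1.0, "generation": 0.9},
]

scalars = [ensemble_reward(r, weights) for r in group]
advantages = group_relative_advantages(scalars)
```

Under these assumptions, samples that score above the group mean receive positive advantages and the rest receive negative ones, so a single scalar signal can reflect both tasks at once.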

Primary Area: applications to computer vision, audio, language, and other modalities

Submission Number: 19343
