USO: Unified Style and Subject-Driven Generation via Disentangled and Reward Learning

16 Sept 2025 (modified: 14 Nov 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: Image Generation, Image Customization, Unified Image Customization
Abstract: Existing literature typically treats style-driven and subject-driven generation as two disjoint tasks: the former prioritizes stylistic similarity, whereas the latter focuses on subject consistency, creating an apparent antagonism. We argue that both objectives can be achieved simultaneously within a unified framework, as they fundamentally concern the disentanglement and re-composition of content and style, a longstanding theme in both tasks. To this end, we introduce **USO**, a Unified Style-Subject Optimized customization model that leverages the complementary nature of these objectives, enabling them to mutually reinforce each other within a cohesive paradigm. Specifically, we first propose a subject-for-style data curation framework that leverages a state-of-the-art subject model to generate high-quality triplet data comprising content images, style images, and their corresponding stylized content images. Building on this foundation, USO introduces a style-for-subject approach for content-style disentangled learning, which simultaneously aligns style features and content features to construct a cohesive customization model. Finally, a style reward-learning paradigm, termed SRL, reinforces the model's ability to extract the desired style or content features from the reference image, further enhancing performance on both tasks. Extensive experiments demonstrate that USO achieves state-of-the-art performance among open-source models along both dimensions of subject consistency and style similarity.
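The abstract describes SRL as rewarding the model for extracting the desired style or content features from a reference image. The paper's actual objective is not specified here; as a purely illustrative toy, one could imagine a scalar reward that combines style similarity to the style reference with content similarity to the content reference. The sketch below assumes hypothetical embeddings from frozen style/content encoders; all names (`style_reward`, the weights `w_style`/`w_content`) are invented for illustration and are not from the paper.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def style_reward(gen_style, ref_style, gen_content, ref_content,
                 w_style: float = 1.0, w_content: float = 1.0) -> float:
    """Hypothetical reward: high when the generated image's style embedding
    matches the style reference AND its content embedding matches the
    content reference (the disentangle-and-recompose goal)."""
    return (w_style * cosine(gen_style, ref_style)
            + w_content * cosine(gen_content, ref_content))

# Toy embeddings standing in for encoder features.
rng = np.random.default_rng(0)
ref_s = rng.normal(size=8)
ref_c = rng.normal(size=8)

good = style_reward(ref_s, ref_s, ref_c, ref_c)    # perfectly aligned -> 2.0
bad = style_reward(-ref_s, ref_s, -ref_c, ref_c)   # anti-aligned -> -2.0
print(good > bad)  # True
```

In a reward-learning setup, such a scalar would be maximized during fine-tuning so that generations track both references at once; the real SRL objective and encoders differ from this toy.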
Primary Area: generative models
Submission Number: 6944