Keywords: Diffusion Model, Image Generation
TL;DR: A training-free style-aligned image generation method for Flux.
Abstract: Diffusion-based generative models struggle to maintain high style consistency across images generated from text descriptions.
Although several style-aligned image generation methods have been proposed to address this issue, they exhibit suboptimal performance and are primarily built on the U-Net architecture, limiting their compatibility with DiT diffusion models such as Flux, which has emerged as a predominant model in image generation.
To address these limitations, we propose AlignedGen, a novel training-free style-aligned image generation method for DiT models that significantly enhances style consistency across generated images.
Specifically, AlignedGen incorporates two key components to achieve this: Shifted Position Embedding (ShiftPE) and Advanced Attention Sharing (AAS).
ShiftPE alleviates the degradation of text controllability that prior methods exhibit when applied to DiT models by assigning non-overlapping position indices, while AAS comprises three specialized techniques that unleash the full potential of DiT for style-aligned generation.
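As a rough illustration of the non-overlapping position-index idea, the sketch below assigns each image in a batch a disjoint block of 2D position indices by shifting along one axis. The per-image row offset is an assumption for illustration; it is not necessarily the exact shift scheme used by AlignedGen.

```python
import torch

def shifted_position_ids(num_images: int, height: int, width: int) -> torch.Tensor:
    """Illustrative sketch: give each image in the batch a disjoint block of
    2D position indices by shifting rows per image, so indices never overlap
    across images when attention is shared. The shift scheme is hypothetical."""
    ids = []
    for i in range(num_images):
        ys = torch.arange(height) + i * height  # shift rows per image
        xs = torch.arange(width)
        grid_y, grid_x = torch.meshgrid(ys, xs, indexing="ij")
        ids.append(torch.stack([grid_y, grid_x], dim=-1).reshape(-1, 2))
    return torch.stack(ids)  # (num_images, height*width, 2)

pos = shifted_position_ids(num_images=4, height=64, width=64)
print(pos.shape)  # torch.Size([4, 4096, 2])
```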
Furthermore, to broaden the applicability of our method, we present an efficient query, key, and value feature extraction algorithm, enabling it to seamlessly incorporate external images as style references.
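For intuition, the following is a minimal sketch of the generic attention-sharing pattern that such extracted features enable: the generated image's queries attend over its own keys/values concatenated with keys/values cached from a reference-image pass. This is a common formulation in style-aligned generation, not AlignedGen's exact AAS or extraction algorithm; all names and shapes are assumptions.

```python
import torch
import torch.nn.functional as F

def shared_attention(q, k, v, ref_k=None, ref_v=None):
    """Illustrative attention sharing: concatenate cached reference keys/values
    with the target's own before attention. Shapes: (batch, heads, tokens, dim).
    Generic sketch, not the paper's exact method."""
    if ref_k is not None and ref_v is not None:
        k = torch.cat([k, ref_k], dim=2)
        v = torch.cat([v, ref_v], dim=2)
    return F.scaled_dot_product_attention(q, k, v)

# Hypothetical usage: ref_k / ref_v would be extracted once from the external
# style-reference image and reused at every denoising step of the generation.
q = torch.randn(1, 8, 4096, 64)
k = torch.randn(1, 8, 4096, 64)
v = torch.randn(1, 8, 4096, 64)
ref_k = torch.randn(1, 8, 4096, 64)
ref_v = torch.randn(1, 8, 4096, 64)
out = shared_attention(q, k, v, ref_k, ref_v)
print(out.shape)  # torch.Size([1, 8, 4096, 64])
```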
Extensive experimental results validate that our method effectively enhances style consistency across generated images while maintaining favorable text controllability. Code: https://github.com/Jiexuanz/AlignedGen.
Primary Area: Applications (e.g., vision, language, speech and audio, Creative AI)
Submission Number: 16253