AnimeDiff: Customized Image Generation of Anime Characters Using Diffusion Model

Published: 01 Jan 2024, Last Modified: 09 Apr 2025 · IEEE Trans. Multim. 2024 · CC BY-SA 4.0
Abstract: Due to the unprecedented power of text-to-image diffusion models, customizing these models to generate new concepts has gained increasing attention. Existing works have achieved some success on real-world concepts, but fail on anime characters. We empirically find that this degradation stems from the newly introduced identifier text tokens, which are optimized to distinguish different characters. In this paper, we propose AnimeDiff, which focuses on customized image generation of anime characters. AnimeDiff directly binds anime characters to their names and keeps the embeddings of text tokens unchanged. Furthermore, when composing multiple characters in a single image, the model tends to confuse the attributes of those characters. To address this issue, AnimeDiff incorporates a Cut-and-Paste data augmentation strategy that produces multi-character training images by cutting multiple characters out and pasting them onto background images. Experiments demonstrate the superiority of AnimeDiff over existing methods.
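The Cut-and-Paste augmentation described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name, parameters, and the use of Pillow with alpha-matted character cutouts are all assumptions for the sake of the example.

```python
import random
from PIL import Image

def cut_and_paste(background, characters, min_scale=0.3, max_scale=0.6):
    """Compose a multi-character training image by pasting character
    cutouts (RGBA images with transparent backgrounds) onto a background
    at random positions and scales. Hypothetical sketch of the paper's
    Cut-and-Paste augmentation, not the original code."""
    canvas = background.convert("RGB").copy()
    bw, bh = canvas.size
    for char in characters:
        char = char.convert("RGBA")
        # Randomly scale the character relative to the background width.
        scale = random.uniform(min_scale, max_scale)
        w = max(1, int(bw * scale))
        h = max(1, int(char.height * w / char.width))
        char = char.resize((w, h))
        # Random placement, keeping the character inside the canvas.
        x = random.randint(0, max(0, bw - w))
        y = random.randint(0, max(0, bh - h))
        # The alpha channel serves as the paste mask, so only the
        # character pixels (not the transparent cutout border) are copied.
        canvas.paste(char, (x, y), char)
    return canvas
```

Pairing each composed image with a caption that lists the pasted characters' names would then give the model multi-character supervision without requiring real multi-character artwork.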