Keywords: personalized generation, animal subjects, diffusion models, ControlNet, generative AI
Abstract: Personalized animal image generation is challenging because animals exhibit rich appearance cues and large morphological variability. Existing approaches often suffer from feature misalignment across domains, which leads to identity drift. We present AnimalBooth, a framework that requires no fine-tuning at inference time and strengthens identity preservation with an Animal-Net and an adaptive attention module, mitigating cross-domain alignment errors. We further introduce a frequency-controlled feature integration module that applies Discrete Cosine Transform (DCT) filtering in the latent space to guide the diffusion process, enabling a coarse-to-fine progression from global structure to detailed texture. To advance research in this area, we curate AnimalBench, a high-resolution dataset for animal personalization. Extensive experiments show that AnimalBooth consistently outperforms strong baselines on multiple benchmarks while being more efficient, improving both identity fidelity and perceptual quality. The code and dataset will be made publicly available.
Submission Number: 13
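The abstract describes DCT filtering in the latent space that moves from global structure to fine texture. The following is a minimal sketch of that general idea only, not the authors' implementation: it low-pass filters a latent tensor with an orthonormal DCT and widens the retained frequency band as denoising progresses. The function names (`dct_matrix`, `dct_lowpass`, `cutoff_for_step`), the square low-frequency mask, and the linear schedule are all assumptions made for illustration.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n)[:, None]  # frequency index
    i = np.arange(n)[None, :]  # spatial index
    mat = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    mat[0, :] = np.sqrt(1.0 / n)  # DC row uses a different scale
    return mat


def dct_lowpass(latent: np.ndarray, keep: int) -> np.ndarray:
    """Keep only the lowest keep x keep DCT coefficients of each channel.

    latent has shape (channels, height, width). The square mask is an
    assumed design choice; the paper's filter shape is not specified here.
    """
    _, h, w = latent.shape
    dh, dw = dct_matrix(h), dct_matrix(w)
    coeffs = dh @ latent @ dw.T          # 2D DCT per channel
    mask = np.zeros((h, w))
    mask[:keep, :keep] = 1.0             # low frequencies sit at the top-left
    return dh.T @ (coeffs * mask) @ dw   # inverse 2D DCT


def cutoff_for_step(step: int, num_steps: int, size: int, min_keep: int = 1) -> int:
    """Hypothetical linear schedule: later steps keep more frequencies."""
    progress = (step + 1) / num_steps
    return max(min_keep, int(np.ceil(progress * size)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z = rng.standard_normal((4, 8, 8))   # stand-in for a diffusion latent
    for step in range(4):
        keep = cutoff_for_step(step, 4, z.shape[-1])
        guided = dct_lowpass(z, keep)
        print(step, keep, float(np.abs(guided - z).mean()))
```

With `keep` equal to the latent size the filter is the identity, and with `keep=1` each channel collapses to its mean, so the schedule sweeps from coarse structure to full detail.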