Keywords: Image Generation, Text-to-Image, Diffusion, Out-of-Distribution, Active Learning
Abstract: Recent large-scale text-to-image generative models have attained unprecedented performance, and *adaptor* modules such as LoRA and DreamBooth have been developed to extend this performance to previously unseen concept tokens. However, we empirically find that this workflow often fails to accurately depict *out-of-distribution* concepts, a failure closely tied to the low quality of the training data. To resolve this, we present a framework called Controllable Adaptor Towards Out-of-Distribution Concepts (CATOD). Our framework follows the active learning paradigm, alternating between high-quality data accumulation and adaptor training to enable a finer-grained enhancement of generative results. The *aesthetics* score and the *concept-matching* score are the two major factors affecting the quality of synthetic results. A key component of CATOD is a weighted scoring system that automatically balances these two scores, for which we also provide a comprehensive theoretical analysis. Based on this scoring system, CATOD determines how to select data and schedule adaptor training. Extensive results show that CATOD significantly outperforms prior approaches, with an 11.10 boost in the CLIP score and a 33.08% decrease in the CMMD metric.
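The abstract's weighted scoring idea can be illustrated with a minimal sketch. All function names, the linear combination form, and the fixed weight `lam` below are illustrative assumptions; CATOD balances the two scores automatically, which this toy version does not model.

```python
# Hypothetical sketch of scoring-based data selection: combine an aesthetics
# score and a concept-matching score, then keep the top-k candidates.
# The linear weighting and fixed lam are assumptions for illustration only.

def weighted_score(aesthetic: float, concept_match: float, lam: float = 0.5) -> float:
    """Combine the two quality scores with a balancing weight lam in [0, 1]."""
    return lam * aesthetic + (1.0 - lam) * concept_match

def select_top_k(candidates, k: int, lam: float = 0.5):
    """Pick the k candidates with the highest combined score.

    Each candidate is a tuple (sample_id, aesthetic_score, concept_match_score).
    """
    ranked = sorted(
        candidates,
        key=lambda c: weighted_score(c[1], c[2], lam),
        reverse=True,
    )
    return [c[0] for c in ranked[:k]]

# Example: keep the 2 best of 4 synthetic samples.
pool = [("a", 0.9, 0.2), ("b", 0.5, 0.8), ("c", 0.7, 0.7), ("d", 0.1, 0.3)]
print(select_top_k(pool, k=2))  # → ['c', 'b']
```

In an active-learning loop, the selected samples would be added to the training set before the next round of adaptor training.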
Supplementary Material: zip
Primary Area: Generative models
Submission Number: 4503