Abstract: It has recently been pointed out that powerful diffusion models perform poorly and unstably when generating unseen or unknown concept tokens. Several solutions, such as LoRA and DreamBooth, have been proposed to mitigate this problem. In this work, however, we first identify that these studies are uniformly limited by a preset dataset and a fixed number of concept tokens, which is generally impractical in a production setting, where the concept tokens, together with the dataset, are dynamic and change continually. We therefore propose CaCOM to cope with the research challenges that arise in this realistic setup, such as catastrophic forgetting. In brief, CaCOM carefully selects both the training data and the memory bank based on the data stream continuously arriving in the wild. We regard CaCOM as (i) a pioneering attempt to bring customization closer to the production setting and (ii) a provably viable extension to existing customization schemes. Through extensive experiments, we show that CaCOM can easily be adapted to any customization module while consistently enhancing it.