Abstract: Recent research on text-to-image diffusion models has explored personalization methods that learn novel concepts from a few user-provided images for target image generation. Continual learning extends personalization to multiple concepts through training on long sequences, enabling more practical applications. However, continual learning introduces a trade-off: stability for previously learned concepts degrades as the number of concepts grows, while preserving the performance of past concepts reduces plasticity for new ones. In this study, we propose a continual learning approach for personalization that maintains high-quality image generation across multiple concepts while learning new targets. Our method leverages deep generative replay with latent variables to balance past and new concepts, demonstrating that replay in the latent space effectively suppresses performance degradation. Furthermore, by integrating a sub-model with each layer of the latent diffusion model through the conditional input of latent variables, the proposed method achieves robust personalization across diverse concepts. Our experiments demonstrate that the proposed method, which is optimized for many concepts, achieves performance comparable to state-of-the-art methods specialized for a few concepts. In addition, in the continual learning setting, the proposed method effectively mitigates performance degradation for previously learned concepts, indicating its potential for realistic image generation applications involving long-tail scenarios.
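The abstract gives only a high-level description of the method, so the following is a minimal, hypothetical sketch of generative replay in latent space for continual personalization of a latent diffusion model. It is not the authors' implementation: the names `TinyUNet`, `ddpm_step`, and `train_step` are invented for illustration, the per-layer sub-model conditioning mentioned in the abstract is omitted, and the replay term is approximated distillation-style by matching the frozen previous model's noise prediction on randomly drawn latents instead of fully sampling replay images.

```python
import torch
import torch.nn.functional as F

def ddpm_step(unet, z_t, t, cond):
    """Epsilon-prediction: estimate the noise that was added to latent z_t."""
    return unet(z_t, t, cond)

def train_step(unet, prev_unet, alphas_bar, new_latents, new_cond, past_cond,
               replay_weight=1.0):
    """One continual step: new-concept denoising loss + latent replay loss.

    unet        -- model being fine-tuned on the new concept
    prev_unet   -- frozen copy of the model before this concept was added
    alphas_bar  -- cumulative DDPM noise schedule, shape (T,)
    new_latents -- VAE latents of the new concept's images, shape (B, 4, H, W)
    """
    B = new_latents.shape[0]
    T = alphas_bar.shape[0]
    device = new_latents.device

    # Plasticity term: standard diffusion loss on the new concept's latents.
    t = torch.randint(0, T, (B,), device=device)
    a = alphas_bar[t].view(B, 1, 1, 1)
    noise = torch.randn_like(new_latents)
    z_t = a.sqrt() * new_latents + (1.0 - a).sqrt() * noise
    loss_new = F.mse_loss(ddpm_step(unet, z_t, t, new_cond), noise)

    # Stability term: latent-space replay. Instead of storing past images,
    # draw random latents and keep the current model's prediction close to
    # the frozen previous model's prediction under the past concept's
    # conditioning (a distillation-style stand-in for full generative replay).
    z_r = torch.randn_like(new_latents)
    t_r = torch.randint(0, T, (B,), device=device)
    with torch.no_grad():
        target = ddpm_step(prev_unet, z_r, t_r, past_cond)
    loss_replay = F.mse_loss(ddpm_step(unet, z_r, t_r, past_cond), target)

    return loss_new + replay_weight * loss_replay

if __name__ == "__main__":
    class TinyUNet(torch.nn.Module):
        """Stand-in for a latent diffusion UNet (ignores t and cond)."""
        def __init__(self):
            super().__init__()
            self.net = torch.nn.Conv2d(4, 4, kernel_size=3, padding=1)
        def forward(self, z, t, cond):
            return self.net(z)

    unet, prev_unet = TinyUNet(), TinyUNet()
    prev_unet.load_state_dict(unet.state_dict())  # frozen pre-update snapshot
    for p in prev_unet.parameters():
        p.requires_grad_(False)

    betas = torch.linspace(1e-4, 0.02, 1000)
    alphas_bar = torch.cumprod(1.0 - betas, dim=0)
    latents = torch.randn(2, 4, 8, 8)             # pretend VAE-encoded images
    loss = train_step(unet, prev_unet, alphas_bar, latents, None, None)
    loss.backward()
    print(f"combined loss: {loss.item():.4f}")
```

Per the abstract, the actual method differs in at least two ways: replay samples are generated by the previous model in latent space rather than drawn at random, and a sub-model injects the latent-variable conditioning into each layer of the diffusion model. The `replay_weight` term above is where the stability-plasticity trade-off described in the abstract would be tuned.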
External IDs: dblp:journals/access/MatsudaTMOH26