Abstract: This paper presents a personalized text-to-image diffusion model for multi-object editing that improves the visual fidelity of the target image and its editing ability through a segmentation-based restriction and continual learning. Multi-object personalization tends to destabilize, especially as the number of targets grows and the target concepts become similar. The proposed method introduces a segmentation guide into continual learning to improve performance on multiple objects: the guide separates the concepts by restricting the regions of the target objects during both training and inference. The method learns these concepts through continual learning with Elastic Weight Consolidation, and produces multiple target objects with separated concepts while maintaining visual fidelity. Experimental results demonstrate that the proposed method successfully maintains visual fidelity for multiple target objects.
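The abstract mentions continual learning with Elastic Weight Consolidation (EWC). As background, EWC adds a quadratic penalty that discourages parameters important to previously learned concepts from drifting, weighted by the (diagonal) Fisher information. The sketch below illustrates only the standard EWC penalty term, not the paper's specific training pipeline; all names (`ewc_penalty`, `lam`) are illustrative assumptions.

```python
import numpy as np

def ewc_penalty(params, old_params, fisher, lam=1.0):
    """Standard EWC regularizer: (lam / 2) * sum_i F_i * (theta_i - theta*_i)^2.

    params     -- current model parameters (flattened array)
    old_params -- parameters after learning the previous concept(s)
    fisher     -- diagonal Fisher information, one weight per parameter
    lam        -- strength of the consolidation penalty
    """
    return 0.5 * lam * float(np.sum(fisher * (params - old_params) ** 2))

# Parameters important to an earlier concept (large Fisher value)
# are penalized more strongly for the same amount of drift.
theta      = np.array([1.0, 2.0])
theta_star = np.array([1.0, 1.0])
fisher     = np.array([2.0, 2.0])
print(ewc_penalty(theta, theta_star, fisher))  # → 1.0
```

In a full training loop, this penalty would be added to the usual diffusion objective so that learning a new target concept does not overwrite the previously consolidated ones.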