Keywords: multi-subject personalization, text-to-image, diffusion models
TL;DR: We present a novel technique for personalization with diffusion models based on selective U-Net influence, and explore the contribution of U-Net blocks to the multi-subject personalization task.
Abstract: Diffusion models have shown an exceptional capability to personalize subjects from very few reference images. However, state-of-the-art personalization techniques based on diffusion models suffer from major limitations: the generated images exhibit distortion, identity mixing, and repetition of subjects. The U-Net blocks in diffusion models are known to capture information about diverse attributes such as color, style, layout, and objects. In this work, we present a novel personalization technique based on selective U-Net influence (SelUT), in which we control the influence of the trained U-Net blocks during inference with the text-conditioned diffusion model. Furthermore, we present an ensemble selection technique that selects the best image generated with SelUT using the Characteristic Objects Method (COMET), with quantitative evaluation metrics as the criteria. We observe that our approach helps address these limitations and shows significant improvement over state-of-the-art techniques in both quantitative and qualitative evaluation.
Submission Number: 75