Diff-Props: Is Semantics Preserved within a Diffusion Model?

Published: 01 Jan 2024. Last Modified: 04 Nov 2025. KES 2024. License: CC BY-SA 4.0.
Abstract: The ambition to create increasingly realistic images has driven researchers to develop ever more powerful models, capable of generalizing and generating high-resolution images, even in a multimodal setup (e.g., from textual input). Among the most recent generative networks, Stable Diffusion Models (SDMs) have achieved state-of-the-art results, showing great generative capabilities but also a high degree of complexity, both in terms of training and interpretability. Indeed, the impressive generalization capability of pre-trained SDMs has pushed researchers to exploit their internal representations for downstream tasks (e.g., classification and segmentation). Understanding how well the model preserves semantic information is fundamental to improving its performance. Our approach, namely Diff-Props, analyses the features extracted from the U-Net within the Stable Diffusion Model to unveil how Stable Diffusion retains the semantic information of an image in a pre-trained setup. Exploiting a set of different distance metrics, Diff-Props analyses how features at different depths contribute to preserving the meaning of the objects in the image.
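To make the pipeline the abstract describes concrete, here is a minimal sketch of the general idea: extract intermediate U-Net features from a pre-trained Stable Diffusion model at several depths and compare them with a distance metric. This is not the authors' released code; the model checkpoint, the choice of hooking the up-blocks, the noising timestep, mean pooling, and cosine distance are all illustrative assumptions.

```python
import torch
from diffusers import StableDiffusionPipeline
from torchvision import transforms
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
# Illustrative checkpoint choice; any pre-trained SD 1.x pipeline works the same way.
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float32
).to(device)

# Capture the output of each U-Net up-block: one feature map per decoder depth.
features = {}
def make_hook(depth):
    def hook(module, inputs, output):
        out = output[0] if isinstance(output, tuple) else output
        features[depth] = out.detach()
    return hook
for d, block in enumerate(pipe.unet.up_blocks):
    block.register_forward_hook(make_hook(d))

preprocess = transforms.Compose([
    transforms.Resize((512, 512)),
    transforms.ToTensor(),
    transforms.Normalize([0.5], [0.5]),  # map pixels to [-1, 1] as the VAE expects
])

@torch.no_grad()
def unet_features(image_path, t=100, seed=0):
    """Encode an image to latents, add noise at timestep t, run one U-Net
    denoising pass, and return the hooked per-depth feature maps."""
    x = preprocess(Image.open(image_path).convert("RGB"))[None].to(device)
    latents = pipe.vae.encode(x).latent_dist.sample() * pipe.vae.config.scaling_factor
    torch.manual_seed(seed)
    noise = torch.randn_like(latents)
    timestep = torch.tensor([t], device=device)
    noisy = pipe.scheduler.add_noise(latents, noise, timestep)
    # Unconditional (empty-prompt) text embedding to drive cross-attention.
    tokens = pipe.tokenizer(
        "", padding="max_length",
        max_length=pipe.tokenizer.model_max_length,
        return_tensors="pt",
    ).input_ids.to(device)
    emb = pipe.text_encoder(tokens)[0]
    features.clear()
    pipe.unet(noisy, timestep, encoder_hidden_states=emb)
    return {d: f.clone() for d, f in features.items()}

def cosine_distance_per_depth(feats_a, feats_b):
    """Spatially pool each depth's feature map, then report one cosine
    distance per depth; smaller values mean better-preserved semantics."""
    return {
        d: 1.0 - torch.nn.functional.cosine_similarity(
            feats_a[d].mean(dim=(2, 3)).flatten(),
            feats_b[d].mean(dim=(2, 3)).flatten(),
            dim=0,
        ).item()
        for d in feats_a
    }

# Usage: compare two images of the same object class at every decoder depth.
# dists = cosine_distance_per_depth(unet_features("cat1.png"), unet_features("cat2.png"))
```

Cosine distance stands in here for the "set of different distance metrics" the abstract mentions; swapping in other metrics (e.g., Euclidean distance on the pooled features) only changes the final function.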