Track: tiny / short paper (up to 4 pages)
Keywords: image generation, latent diffusion, stable diffusion, anomaly, position embedding, representation analysis
TL;DR: Stable Diffusion U-Net representations contain a learned positional embedding, high-similarity corner artifacts, and high-norm anomalies, any of which may harm downstream task performance.
Abstract: Text-conditioned image diffusion models have demonstrated remarkable capabilities in synthesizing realistic images, spurring growing interest in using their internal representations for various downstream tasks. To better understand the robustness of these representations, we analyze popular Stable Diffusion models using representational similarity and norms. Our findings reveal three phenomena: (1) the presence of a learned positional embedding in intermediate representations, (2) high-similarity corner artifacts, and (3) anomalous high-norm artifacts. These findings underscore the need to further investigate the properties of diffusion model representations, particularly before considering them for downstream tasks that require robust features of high spatial fidelity.
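As a rough illustration of the two measurements named in the abstract (per-location norms and representational similarity), the following is a minimal sketch on a synthetic feature map. It is not the authors' code: the function names `token_norms` and `cosine_similarity_to`, the `(C, H, W)` layout, and the random data with one planted outlier are all assumptions made for illustration. In practice the feature map would come from an intermediate U-Net layer.

```python
import numpy as np

def token_norms(features):
    """L2 norm of the channel vector at each spatial location.

    features: array of shape (C, H, W). Returns an (H, W) array.
    """
    return np.linalg.norm(features, axis=0)

def cosine_similarity_to(features, row, col, eps=1e-8):
    """Cosine similarity between the token at (row, col) and every token.

    Returns an (H, W) map; eps guards against division by zero.
    """
    c, h, w = features.shape
    tokens = features.reshape(c, h * w)
    unit = tokens / (np.linalg.norm(tokens, axis=0, keepdims=True) + eps)
    ref = unit[:, row * w + col]
    return (ref @ unit).reshape(h, w)

# Synthetic stand-in for an intermediate representation (assumption).
rng = np.random.default_rng(0)
feats = rng.normal(size=(8, 4, 4))
feats[:, 2, 3] = 50.0  # plant one artificial high-norm token

norms = token_norms(feats)
sim = cosine_similarity_to(feats, 0, 0)  # similarity to the top-left corner

# Location of the largest norm; here it is the planted token.
peak = tuple(int(i) for i in np.unravel_index(norms.argmax(), norms.shape))
```

Plotting `norms` and `sim` as heat maps is one simple way to spot the kinds of patterns the abstract describes: isolated high-norm locations stand out in the first map, and corner tokens that resemble one another stand out in the second.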
Submission Number: 7