Latent Diffusion U-Net Representations Contain Positional Embeddings and Anomalies

Published: 06 Mar 2025, Last Modified: 13 Apr 2025 · ICLR 2025 DeLTa Workshop Poster · Readers: Everyone · License: CC BY 4.0
Track: tiny / short paper (up to 4 pages)
Keywords: image generation, latent diffusion, stable diffusion, anomaly, position embedding, representation analysis
TL;DR: Stable diffusion U-Net representations contain positional embeddings and high-norm or high-similarity anomalies that may harm downstream task performance.
Abstract: Text-conditioned image diffusion models have demonstrated remarkable capabilities in synthesizing realistic images, spurring growing interest in using their internal representations for various downstream tasks. To better understand the robustness of these representations, we analyze popular Stable Diffusion models using representational similarity and norms. Our findings reveal three phenomena: (1) the presence of a learned positional embedding in intermediate representations, (2) high-similarity corner artifacts, and (3) anomalous high-norm artifacts. These findings underscore the need to further investigate the properties of diffusion model representations, particularly before considering them for downstream tasks that require robust features of high spatial fidelity.
Submission Number: 7
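
The abstract describes probing intermediate U-Net representations with per-position norms and representational similarity. Below is a minimal sketch of that kind of probe, assuming the `diffusers` and `torch` packages; the checkpoint id, hooked block (`pipe.unet.mid_block`), prompt, and step count are illustrative assumptions, not details taken from the paper.

```python
# Sketch: extract an intermediate Stable Diffusion U-Net feature map via a
# forward hook, then inspect per-position norms and cosine similarity.
# High, isolated norm spikes would correspond to high-norm anomalies;
# structure that persists across prompts would suggest a positional component.
import torch
from diffusers import StableDiffusionPipeline

# Checkpoint chosen for illustration only; any Stable Diffusion pipeline works.
pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
).to("cuda")

features = []

def hook(module, inputs, output):
    # Some blocks return tuples; keep only the main feature map (B, C, H, W).
    out = output[0] if isinstance(output, tuple) else output
    features.append(out.detach().float().cpu())

handle = pipe.unet.mid_block.register_forward_hook(hook)
pipe("a photo of a cat", num_inference_steps=20)
handle.remove()

feat = features[-1][0]             # (C, H, W) from the last denoising step
C, H, W = feat.shape
tokens = feat.reshape(C, H * W).T  # (H*W, C) per-position feature vectors

# Per-position L2 norm map: candidate high-norm artifacts stand out as spikes.
norm_map = tokens.norm(dim=1).reshape(H, W)

# Cosine similarity of each position to the mean feature: corner or
# position-dependent patterns hint at positional embeddings / corner artifacts.
unit = torch.nn.functional.normalize(tokens, dim=1)
mean_dir = torch.nn.functional.normalize(tokens.mean(0), dim=0)
sim_map = (unit @ mean_dir).reshape(H, W)

print("norm map:\n", norm_map)
print("similarity-to-mean map:\n", sim_map)
```

Repeating this over many prompts and averaging the similarity maps would separate content-dependent structure from the prompt-invariant, position-dependent structure the paper attributes to a learned positional embedding.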