Unraveling the Latent Hierarchical Structure of Language and Images via Diffusion Models

Published: 10 Oct 2024 · Last Modified: 09 Nov 2024 · SciForDL Oral · CC BY 4.0
TL;DR: Hierarchical compositionality of the data predicts a diverging length scale at a phase transition observed in diffusion models.
Abstract: High-dimensional data must be highly structured to be learnable. Although the compositional and hierarchical nature of data is often put forward to explain learnability, quantitative measurements establishing these properties are scarce. Likewise, accessing the latent variables underlying such a data structure remains a challenge. Forward-backward experiments in diffusion-based models, where a datum is noised and then denoised, are a promising tool to achieve these goals. We predict in simple hierarchical models that in this process, changes in the data occur in correlated chunks, with a length scale that diverges at a noise level where a phase transition is known to take place. Remarkably, we confirm this prediction in both text and image datasets using state-of-the-art diffusion models. Our results suggest that forward-backward experiments are informative about the nature of latent variables, and that the effect of changing deeper ones is revealed near the transition.
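
For concreteness, below is a minimal sketch of such a forward-backward experiment on images. It assumes the Hugging Face diffusers API and the public google/ddpm-cifar10-32 checkpoint, which are illustrative stand-ins, not the models or datasets used in the paper: an input is noised up to a chosen timestep t*, denoised with the learned reverse process, and compared with the original to see which parts changed.

```python
import torch
from diffusers import DDPMPipeline

# Public demo checkpoint; the paper's own models are not assumed here.
pipe = DDPMPipeline.from_pretrained("google/ddpm-cifar10-32")
unet, scheduler = pipe.unet, pipe.scheduler
scheduler.set_timesteps(scheduler.config.num_train_timesteps)  # full reverse chain

@torch.no_grad()
def forward_backward(x0: torch.Tensor, t_star: int, seed: int = 0) -> torch.Tensor:
    """Noise x0 up to timestep t_star, then denoise back to t = 0."""
    gen = torch.Generator().manual_seed(seed)
    # Forward process: sample x_{t*} ~ q(x_t | x_0) in closed form.
    noise = torch.randn(x0.shape, generator=gen)
    x = scheduler.add_noise(x0, noise, torch.tensor([t_star]))
    # Backward process: run the learned reverse chain from t_star down to 0.
    for t in scheduler.timesteps[scheduler.timesteps <= t_star]:
        eps = unet(x, t).sample  # predicted noise at timestep t
        x = scheduler.step(eps, t, x, generator=gen).prev_sample
    return x

# Example usage on a batch of images x0 in [-1, 1], shape (B, 3, 32, 32):
# x_fb = forward_backward(x0, t_star=500)
# change_map = (x_fb - x0).abs().mean(dim=1)  # which regions were resampled
```

Sweeping t_star and measuring spatial correlations in the change map is one way to probe the chunked changes and the diverging length scale that the abstract describes near the phase transition.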
Submission Number: 16