Mechanisms of Projective Composition of Diffusion Models

Published: 01 May 2025 · Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY-NC-ND 4.0
TL;DR: We theoretically study compositions of diffusion models.
Abstract: We study the theoretical foundations of composition in diffusion models, with a particular focus on out-of-distribution extrapolation and length-generalization. Prior work has shown that composing distributions via linear score combination can achieve promising results, including length-generalization in some cases (Du et al., 2023; Liu et al., 2022). However, our theoretical understanding of how and why such compositions work remains incomplete. In fact, it is not even entirely clear what it means for composition to "work". This paper starts to address these fundamental gaps. We begin by precisely defining one possible desired result of composition, which we call *projective composition*. Then, we investigate: (1) when linear score combinations provably achieve projective composition, (2) whether reverse-diffusion sampling can generate the desired composition, and (3) the conditions under which composition fails. We connect our theoretical analysis to prior empirical observations where composition has either worked or failed, for reasons that were unclear at the time. Finally, we propose a simple heuristic to help predict the success or failure of new compositions.
Lay Summary: Diffusion composition enables generation of images combining multiple concepts (like "dog wearing hat") by adding together outputs from separate concept models ("dog", "wearing hat"). This simple approach sometimes works remarkably well, even for complex combinations, but can unexpectedly fail in other cases. Our research precisely defines what "successful composition" means mathematically and analyzes when linear combination of model outputs achieves the desired result. We identify conditions where this approach succeeds or fails. This theoretical framework explains previously puzzling observations about compositional generation and provides clear principles for developing more reliable diffusion models that can predictably blend concepts in ways that match our expectations.
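The linear score combination described above can be sketched in a few lines. The example below is a hypothetical illustration, not code from the paper: each "concept" is modeled by an isotropic Gaussian, whose score is available in closed form, and composition simply sums the per-concept scores (at unit weights this corresponds to the product of the component densities).

```python
import numpy as np

def gaussian_score(mu):
    """Score of an isotropic Gaussian N(mu, I): grad log p(x) = -(x - mu)."""
    return lambda x: -(x - mu)

def composed_score(scores, x, weights=None):
    """Linear score combination: a weighted sum of per-concept scores."""
    if weights is None:
        weights = [1.0] * len(scores)
    return sum(w * s(x) for w, s in zip(weights, scores))

# Two toy "concept" distributions (names are illustrative only)
s_concept_a = gaussian_score(np.array([1.0, 0.0]))
s_concept_b = gaussian_score(np.array([0.0, 1.0]))

# At the mode of the product density, the combined score vanishes
x = np.array([0.5, 0.5])
print(composed_score([s_concept_a, s_concept_b], x))  # → [0. 0.]
```

In a real diffusion sampler, the composed score would replace the single model's score at every reverse-diffusion step; the paper's analysis concerns when this procedure yields the intended ("projective") composition and when it fails.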
Primary Area: Deep Learning->Generative Models and Autoencoders
Keywords: diffusion models, composition, theory
Submission Number: 1670