Keywords: Learning Dynamics, Diffusion Models, Compositional Generalization, Capability, Behavior
TL;DR: Diffusion Models can learn compositional generalization capabilities long before eliciting such behavior.
Abstract: Understanding how multimodal models generalize out of distribution is a fundamental challenge in machine learning. Compositional generalization explains this by assuming the model learns concepts and how to compose them. In this work, we train diffusion models on a compositional task using synthetic data of objects of different sizes and colors. We introduce a concept space as a framework to understand the learning dynamics of compositional generalization. In this framework, we identify $\textit{concept signal}$ as a driver of compositional generalization. Next, we find that diffusion models can acquire the $\textit{capability}$ to compositionally generalize long before they elicit this $\textit{behavior}$. Additionally, we find that the time of capability learning can be pinpointed from the concept space learning dynamics. Finally, we suggest $\textit{embedding disentanglement}$ as another metric to probe the capability of a model. Overall, we make a step toward understanding the emergence of compositional capabilities in diffusion models.
Student Paper: Yes
Submission Number: 84