Unsupervised Decomposition with Recombination-Consistent Diffusion Models
Keywords: Compositional Generation, Recombined Latents, Diffusion Models, Discriminators, Robotics
TL;DR: We introduce a data-driven feedback mechanism that refines the factorized latent spaces of generative models for images and videos.
Abstract: Decomposing complex data into factorized representations can reveal reusable components and enable synthesizing new samples via component recombination. We study this problem in diffusion-based models that learn factorized latent spaces without factor-level supervision. Existing compositional diffusion methods optimize reconstruction under architectural bottlenecks and rely on recombination quality emerging as a byproduct; the training objective never directly evaluates recombined samples. We introduce a recombination-consistency objective: a discriminator distinguishes single-source generations from generations produced by recombining latent factors across sources, providing a fully unsupervised signal that directly regularizes recombination outcomes. The objective is architecture-agnostic and composes with existing factorized decoders without modifying inference. Across CelebA-HQ, Virtual KITTI, CLEVR, and Falcor3D, our method improves recombination quality over reimplementations of prior baselines, achieving lower FID and stronger disentanglement (MIG, MCC). Applied to robotic videos on LIBERO, recombining learned action components yields diverse rollouts that significantly increase state-space coverage for exploration.
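Illustrative sketch (not the authors' implementation): the recombination-consistency idea in the abstract can be made concrete with a small NumPy example. It assumes each sample is encoded as a fixed number of latent factors, that recombination swaps whole factors between samples in a batch, and that the discriminator is trained with a standard binary cross-entropy on logits. The function names, the tensor layout, and the non-saturating generator-side loss are all hypothetical choices made for illustration; the abstract does not specify them.

```python
import numpy as np

def recombine_factors(z, rng):
    """Mix latent factors across sources (hypothetical helper).

    z has shape (batch, num_factors, dim). For each factor slot k, an
    independent permutation of the batch decides which source sample
    supplies factor k of each recombined sample.
    """
    batch, num_factors, _ = z.shape
    # src[b, k] = index of the sample that supplies factor k of output b
    src = np.stack([rng.permutation(batch) for _ in range(num_factors)], axis=1)
    z_mix = z[src, np.arange(num_factors)[None, :]]
    return z_mix, src

def bce_with_logits(logits, targets):
    """Numerically stable binary cross-entropy on raw logits."""
    return np.mean(
        np.maximum(logits, 0) - logits * targets + np.log1p(np.exp(-np.abs(logits)))
    )

def discriminator_loss(logits_single, logits_recombined):
    """Discriminator side: single-source generations are labeled 1,
    recombined generations are labeled 0 (labeling is an assumption)."""
    return (
        bce_with_logits(logits_single, np.ones_like(logits_single))
        + bce_with_logits(logits_recombined, np.zeros_like(logits_recombined))
    )

def consistency_loss(logits_recombined):
    """Generator side (assumed non-saturating form): recombined generations
    should be scored as if they came from a single source."""
    return bce_with_logits(logits_recombined, np.ones_like(logits_recombined))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z = rng.normal(size=(4, 3, 2))  # 4 samples, 3 factors, 2 dims per factor
    z_mix, src = recombine_factors(z, rng)
    print(z_mix.shape, src.shape)
```

In a full training loop, the logits would come from a discriminator applied to images or videos decoded from `z` and `z_mix`; here they are left as inputs so the sketch stays independent of any particular decoder architecture.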
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 170