JEDI: The Force of Jensen-Shannon Divergence in Disentangling Diffusion Models

Eric Tillmann Bill; Enis Simsar; Thomas Hofmann

JEDI: The Force of Jensen-Shannon Divergence in Disentangling Diffusion Models

Eric Tillmann Bill, Enis Simsar, Thomas Hofmann

Published: 10 Jun 2025, Last Modified: 11 Jul 2025PUT at ICML 2025 PosterEveryoneRevisionsBibTeXCC BY 4.0

Keywords: Image Generation, Diffusion Models, Test-Time Adaptation, Subject Disentanglement

Abstract: We introduce JEDI, a test-time adaptation method that enhances subject separation and compositional alignment in diffusion models without requiring retraining or external supervision. JEDI operates by minimizing semantic entanglement in attention maps using a novel Jensen-Shannon divergence based objective. To improve efficiency, we leverage adversarial optimization, reducing the number of updating steps required. JEDI is model-agnostic and applicable to architectures such as Stable Diffusion 1.5 and 3.5, consistently improving prompt alignment and disentanglement in complex scenes. Additionally, JEDI provides a lightweight, CLIP-free disentanglement score derived from internal attention distributions, offering a principled benchmark for compositional alignment under test-time conditions. Code and results are available at https://ericbill21.github.io/JEDI/.

Submission Number: 1

Loading