Improving Compositional Generation with Diffusion Models Using Lift Scores

Chenning Yu, Sicun Gao

Published: 01 May 2025, Last Modified: 18 Jun 2025. Venue: ICML 2025 poster. License: CC BY 4.0

TL;DR: We introduce a training-free resampling criterion for compositional generation with diffusion models that is computationally efficient and requires no additional modules.
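The resampling criterion can be stated compactly. The block below is a sketch of our reading, not a verbatim statement from the paper: the lift of condition c for a sample x is taken as the pointwise log-ratio below, it is approximated with the denoising errors of an epsilon-predicting diffusion model (the weighting w(t), the null condition written as \varnothing, and the noise schedule \bar\alpha_t are assumptions), and an AND-style composition accepts a sample only if every condition has positive lift.

```latex
% Lift of condition c for sample x (equivalently, via Bayes' rule)
\operatorname{lift}(x \mid c) = \log p(x \mid c) - \log p(x) = \log \frac{p(c \mid x)}{p(c)}

% ELBO-based approximation with an epsilon-prediction model (w(t) assumed)
\operatorname{lift}(x \mid c) \approx \mathbb{E}_{t,\epsilon}\!\left[ w(t)\left( \lVert \epsilon - \epsilon_\theta(x_t, t, \varnothing) \rVert^2 - \lVert \epsilon - \epsilon_\theta(x_t, t, c) \rVert^2 \right) \right],
\qquad x_t = \sqrt{\bar\alpha_t}\, x + \sqrt{1 - \bar\alpha_t}\, \epsilon

% Assumed AND composition over conditions c_1, ..., c_k
\text{accept } x \iff \min_{i} \operatorname{lift}(x \mid c_i) > 0
```

A positive lift means the conditional denoiser explains x better than the unconditional one, which is why the sign alone can serve as a per-condition check.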

Abstract: We introduce a novel resampling criterion based on lift scores for improving compositional generation in diffusion models. Using lift scores, we evaluate whether a generated sample aligns with each individual condition and then compose the results to determine whether the composed prompt is satisfied. Our key insight is that lift scores can be efficiently approximated using only the original diffusion model, requiring no additional training or external modules. We develop an optimized variant that achieves relatively lower computational overhead during inference while maintaining effectiveness. Through extensive experiments, we demonstrate that lift scores significantly improve condition alignment for compositional generation across 2D synthetic data, CLEVR position tasks, and text-to-image synthesis. Our code is available at github.com/rainorangelemon/complift.
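To make "approximated using only the original diffusion model" concrete, here is a minimal NumPy sketch, assuming an epsilon-prediction model, uniform time weighting, a cosine noise schedule, and an AND rule that requires every lift to be positive. It is not the authors' implementation from the linked repository. All names (`alpha_bar`, `gaussian_eps_model`, `lift_score`, `accept_and`) are hypothetical, and the toy Gaussian denoisers stand in for the conditional and unconditional predictions of a trained model.

```python
import numpy as np


def alpha_bar(t):
    """Hypothetical cosine schedule: signal fraction at time t in (0, 1)."""
    return np.cos(0.5 * np.pi * t) ** 2


def gaussian_eps_model(mean, var):
    """Exact epsilon-predictor for toy data x0 ~ N(mean, var * I).

    Stand-in (assumption) for a trained model eps_theta(x_t, t, c).
    """
    def eps(x_t, t):
        ab = alpha_bar(t)[:, None]                 # shape (n, 1)
        a, b = np.sqrt(ab), np.sqrt(1.0 - ab)
        # E[eps | x_t] for Gaussian data under x_t = a * x0 + b * eps
        return b * (x_t - a * mean) / (ab * var + b ** 2)
    return eps


def lift_score(x0, eps_cond, eps_uncond, n=1024, seed=0):
    """Monte Carlo lift estimate: unconditional minus conditional denoising error."""
    rng = np.random.default_rng(seed)
    t = rng.uniform(0.02, 0.98, size=n)            # avoid schedule endpoints
    eps = rng.standard_normal((n, x0.shape[0]))
    ab = alpha_bar(t)[:, None]
    x_t = np.sqrt(ab) * x0 + np.sqrt(1.0 - ab) * eps   # forward noising
    # Same (t, eps) draws for both errors: common random numbers cut variance
    err_u = np.sum((eps - eps_uncond(x_t, t)) ** 2, axis=1)
    err_c = np.sum((eps - eps_cond(x_t, t)) ** 2, axis=1)
    return float(np.mean(err_u - err_c))           # > 0: x0 fits condition c


def accept_and(x0, cond_models, eps_uncond, **kw):
    """Assumed AND composition: accept only if every condition has positive lift."""
    return min(lift_score(x0, m, eps_uncond, **kw) for m in cond_models) > 0


if __name__ == "__main__":
    uncond = gaussian_eps_model(np.zeros(2), 9.0)          # broad toy prior
    cond_a = gaussian_eps_model(np.array([1.5, 0.0]), 1.0)
    cond_b = gaussian_eps_model(np.array([0.0, 1.5]), 1.0)
    print(accept_and(np.array([1.5, 1.5]), [cond_a, cond_b], uncond))
    print(accept_and(np.array([-1.5, -1.5]), [cond_a, cond_b], uncond))
```

In a resampling loop, one would draw candidate samples from the diffusion model and keep only those for which `accept_and` returns `True`. Reusing the same noise draws for the conditional and unconditional errors is a design choice that makes the sign of the difference far less noisy than two independent estimates would be.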

Lay Summary: Modern AI image generation tools can create pictures from written prompts like “an elephant with glasses.” But when a prompt combines multiple conditions, the results often miss important details. We developed a method to better check whether each part of a prompt is actually being followed. Think of it as a checklist: the method evaluates whether the final image matches each condition before deciding that the whole prompt is satisfied. Our method uses a simple trick called “lift scores” that works directly with existing models, with no retraining or extra tools needed. We also made an efficient version that runs faster without losing accuracy. In experiments ranging from simple shapes to complex scenes and real images, our method showed clear improvements in following the prompt correctly. This could help make AI-generated content more accurate, controllable, and reliable, especially when combining multiple ideas into one request.

Link To Code: https://rainorangelemon.github.io/complift/

Primary Area: Applications->Computer Vision

Keywords: Diffusion Models, Training-free, Rejection Sampling

Submission Number: 7498
