TILT: Test-Time Reward Alignment via Distribution Tilting for Compositional Generation
Keywords: Text-to-image generation, Compositional Generation, Diffusion Model, Reward Alignment
Abstract: As text-to-image generation models grow more powerful, test-time methods that modify the sampling trajectory to produce images more faithful to complex compositional prompts have become increasingly important. We present TILT, a training-free framework for compositional text-to-image generation via test-time reward alignment. We interpret compositional failures as overlap modes between the joint and single-concept distributions, and define a pure-mode reward that favors samples in which all concepts are jointly present while remaining close to the pretrained model. This yields a KL-constrained objective with a closed-form tilted target distribution and principled guidance steps for diffusion sampling. Our framework also recovers CO3 as a special case, giving theoretical grounding to prior methods for compositional generation that rely on heuristic correctors. Experiments on prompts from T2ICompBench show that our method improves compositional alignment while preserving image quality relative to prior baselines.
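For readers unfamiliar with distribution tilting, the following is a sketch of the generic KL-constrained objective and its closed-form solution. The abstract does not give the paper's exact formulation, so the symbols here are assumptions chosen for illustration: $p_{\mathrm{ref}}$ is the pretrained model's distribution, $r$ is a reward (standing in for the pure-mode reward), and $\beta > 0$ is a temperature controlling closeness to the pretrained model.

```latex
% Generic KL-regularized reward maximization (illustrative notation, not the paper's).
\begin{align}
  p^{*} &= \arg\max_{p} \;
      \mathbb{E}_{x \sim p}\bigl[ r(x) \bigr]
      - \beta \, \mathrm{KL}\bigl( p \,\|\, p_{\mathrm{ref}} \bigr), \\
  % The maximizer is the reference distribution exponentially tilted by the reward.
  p^{*}(x) &= \frac{1}{Z} \, p_{\mathrm{ref}}(x) \,
      \exp\!\bigl( r(x) / \beta \bigr),
  \qquad
  Z = \mathbb{E}_{x \sim p_{\mathrm{ref}}}\bigl[ \exp\!\bigl( r(x) / \beta \bigr) \bigr], \\
  % Taking the gradient of the log removes Z and gives an additive score correction.
  \nabla_{x} \log p^{*}(x) &= \nabla_{x} \log p_{\mathrm{ref}}(x)
      + \frac{1}{\beta} \, \nabla_{x} r(x).
\end{align}
```

Under these assumptions, the last line suggests why a tilted target lends itself to guidance: the score of the tilted distribution is the pretrained score plus a scaled reward gradient. How the paper applies this at intermediate noise levels of the diffusion process is not stated in the abstract.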
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 235