Keywords: Text-to-Image Diffusion Models; Seed Selection
Abstract: Text-to-image diffusion models can synthesize high-quality images, yet the outcome is notoriously sensitive to the random seed: different initial seeds often yield large variations in image quality and prompt--image alignment. We revisit this ``seed effect'' and show that early-stage attention dynamics over prompt core tokens (the content-bearing words) strongly predict final generation quality. Building on this observation, we introduce Attention-Driven Seed Selection (ADSS), a training-free, plug-and-play method that tracks cross-attention to core tokens during sampling in order to rank and select seeds for a fixed prompt. ADSS requires no finetuning or changes to the latents, and it ranks the entire seed pool globally rather than applying a fixed threshold. Because it operates purely at inference time, ADSS can also serve as a lightweight preselection step before existing seed-optimization pipelines, yielding additional gains without extra training or code changes. Extensive experiments on three benchmarks show consistent improvements in prompt faithfulness and visual quality across Stable Diffusion variants, as measured by human preference and alignment metrics. Our results highlight ADSS as a simple and effective route to more controllable generation, leveraging attention to prompt core tokens for robust seed preselection.
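To make the seed-ranking idea concrete, the sketch below scores each candidate seed by the average cross-attention mass its early denoising steps place on the core tokens, then sorts the pool by that score. This is a minimal illustration of the mechanism described in the abstract, not the authors' implementation: the array shapes, the `early_steps` cutoff, the helper names (`adss_seed_score`, `rank_seeds`), and the toy random attention maps are assumptions made for the example; in practice the attention maps would be collected from the sampler's cross-attention layers.

```python
import numpy as np

def adss_seed_score(attn_maps, core_token_ids, early_steps=10):
    """Score one seed from its cross-attention history (illustrative heuristic).

    attn_maps: array of shape (num_steps, num_tokens, num_pixels) giving the
        cross-attention assigned to each prompt token at each denoising step.
    core_token_ids: indices of the content-bearing (core) prompt tokens.
    Returns the average fraction of attention mass placed on core tokens during
    the first `early_steps` steps; a higher value is assumed to predict a
    better final generation.
    """
    early = attn_maps[:early_steps]                     # keep only early steps
    per_token_mass = early.sum(axis=2)                  # (steps, tokens): total mass per token
    per_token_mass = per_token_mass / per_token_mass.sum(axis=1, keepdims=True)
    return float(per_token_mass[:, core_token_ids].sum(axis=1).mean())

def rank_seeds(attn_by_seed, core_token_ids, early_steps=10):
    """Globally rank a pool of candidate seeds by their ADSS-style score."""
    scores = {seed: adss_seed_score(maps, core_token_ids, early_steps)
              for seed, maps in attn_by_seed.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Toy usage: random attention maps stand in for real sampler traces
# (50 steps, 12 prompt tokens, 64x64 latent resolution).
rng = np.random.default_rng(0)
attn_by_seed = {seed: rng.random((50, 12, 64 * 64)) for seed in [11, 42, 77]}
print("seed ranking:", rank_seeds(attn_by_seed, core_token_ids=[2, 5, 7]))
```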
Primary Area: generative models
Submission Number: 16298