Keywords: image editing, diffusion model, generative model
TL;DR: Group Relative Attention Guidance for Image Editing
Abstract: Diffusion Transformers (DiT) have become the backbone of modern instruction-driven image editing. Yet their capabilities remain limited by a trade-off between reference consistency and responsiveness to editing instructions. We observe a significant bias at fixed embedding indices that acts as a weight in the computation of attention scores. Leveraging this phenomenon, we propose Group Relative Attention Guidance (GRAG), which treats the mean of a group of feature vectors as a bias. By modulating the relative deviations of tokens from this bias, GRAG enhances tokens aligned with the bias direction, thereby producing more accurate editing results. Experiments on recent baselines demonstrate that GRAG effectively improves the editing performance of existing models.
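The core operation described in the abstract can be illustrated with a minimal sketch. This is an assumption-based illustration, not the paper's implementation: the function name `grag_modulate` and the `scale` parameter are hypothetical, and the exact way GRAG injects the modulated features into attention is not specified in the abstract.

```python
import numpy as np

def grag_modulate(tokens: np.ndarray, scale: float = 1.0) -> np.ndarray:
    """Hypothetical sketch of Group Relative Attention Guidance (GRAG).

    tokens: array of shape (num_tokens, dim), a group of feature vectors
    entering the attention-score computation.

    Per the abstract, the group mean is treated as a bias, and each token's
    relative deviation from that bias is modulated; scale != 1 changes how
    strongly tokens depart from the bias direction.
    """
    bias = tokens.mean(axis=0, keepdims=True)   # group-mean "bias" vector
    deviation = tokens - bias                   # relative deviation per token
    return bias + scale * deviation             # modulated feature vectors

# Toy usage: 4 tokens of dimension 8; scale=1.0 leaves features unchanged.
rng = np.random.default_rng(0)
feats = rng.standard_normal((4, 8))
out = grag_modulate(feats, scale=1.0)
```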
Primary Area: generative models
Submission Number: 1857