SphereEdit: Geometric Control for Composable Diffusion-Based Image Editing

Published: 24 Sept 2025, Last Modified: 25 Nov 2025 · NEGEL 2025 Oral · CC BY 4.0
Keywords: Hyperspherical representations, diffusion models, image editing, cross-attention attribution, token-aware spatial masks, training-free editing
TL;DR: SphereEdit: a training-free method that uses hyperspherical attribute directions and cross-attention masks to achieve localized, identity-preserving, composable diffusion edits.
Abstract: Despite significant advances in diffusion models, achieving precise, composable image editing without task-specific training remains a challenge. Existing approaches often rely on iterative optimization or linear latent operations, which are slow, brittle, and prone to entangling attributes (e.g., lipstick altering skin tone). We introduce SphereEdit, a training-free framework that leverages the hyperspherical geometry of CLIP embeddings to enable interpretable, fine-grained control. We model semantic attributes as unit-norm directions on the sphere and show that this representation supports clean composition via angular controls. At inference, SphereEdit uses these spherical directions to modulate cross-attention, producing spatially localized edits across diverse domains without optimization or fine-tuning. Experiments demonstrate sharper, more disentangled edits. SphereEdit provides a geometrically grounded, plug-and-play framework for controllable and composable diffusion editing.
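The abstract describes attributes as unit-norm directions on the CLIP hypersphere, composed via angular controls. Below is a minimal sketch of what such geodesic edits could look like; the function names, the difference-of-embeddings construction of a direction, and the edit angles are illustrative assumptions, not the paper's actual implementation.

```python
# Illustrative sketch of angular edits on the unit hypersphere (assumed, not the paper's code).
import numpy as np

def unit(v: np.ndarray) -> np.ndarray:
    """Project a vector onto the unit hypersphere."""
    return v / np.linalg.norm(v)

def attribute_direction(emb_with: np.ndarray, emb_without: np.ndarray) -> np.ndarray:
    """Hypothetical attribute direction: normalized difference of CLIP embeddings
    for prompts with and without the attribute (e.g., 'lipstick')."""
    return unit(emb_with - emb_without)

def angular_edit(x: np.ndarray, direction: np.ndarray, angle: float) -> np.ndarray:
    """Move a unit-norm embedding x along the geodesic toward `direction` by `angle` radians.
    The angle serves as an interpretable edit strength; the result stays on the sphere."""
    x = unit(x)
    # The component of `direction` orthogonal to x spans the geodesic's tangent.
    tangent = unit(direction - np.dot(direction, x) * x)
    return np.cos(angle) * x + np.sin(angle) * tangent

# Composing two attribute edits is two successive angular moves.
rng = np.random.default_rng(0)
x = unit(rng.normal(size=512))            # stand-in for a CLIP embedding
d_lipstick = unit(rng.normal(size=512))   # stand-in attribute directions
d_smile = unit(rng.normal(size=512))

edited = angular_edit(angular_edit(x, d_lipstick, 0.2), d_smile, 0.1)
print(np.linalg.norm(edited))             # remains on the unit sphere (≈ 1.0)
```

In this reading, composability follows from the geometry: each edit is a rotation in a tangent plane, so norms are preserved and strengths are expressed as angles rather than unconstrained offsets.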
Submission Number: 42