CORAL: Correspondence Alignment For Controllable Person Image Generation

14 Sept 2025 (modified: 12 Nov 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: virtual try-on, dense matching
Abstract: Virtual Try-On (VTON) aims to outfit a person with a specific garment from paired person and garment images. Recent diffusion-based approaches show promising results but still struggle to preserve fine-grained details such as logos, patterns, and textures. We suggest these failures come from inaccurate query–key matching in attention maps. To analyze this, we introduce a correspondence evaluation framework that extracts dense correspondences from attention maps and evaluates them with pseudo ground-truth matches. Using this framework, we analyze a simple DiT-based baseline and observe that its attention maps in most layers fail to capture reliable semantic correspondences. We then propose CORAL, a lightweight regularization strategy with two components: correspondence loss, which corrects where each query attends by aligning it with reliable external matches, and entropy loss, which sharpens attention for more confident matching. CORAL improves person–garment alignment in our baseline and can be applied to other diffusion-based pipelines without architectural changes.
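
The abstract does not give the exact form of the two losses, so the following is only an illustrative sketch of one plausible reading, not the authors' implementation. It assumes a row-normalized attention map (one row per person-side query, one column per garment-side key), takes the hard correspondence of a query to be its highest-weight key, models the correspondence loss as a cross-entropy against a pseudo ground-truth key index, and models the entropy loss as the mean Shannon entropy of the attention rows. All function names are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over one row of query-key logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def attention_to_matches(attn):
    """Hard correspondence: each query is matched to its highest-weight key."""
    return [max(range(len(row)), key=row.__getitem__) for row in attn]

def correspondence_loss(attn, target_keys, eps=1e-12):
    """Assumed form: cross-entropy between each attention row and the
    pseudo ground-truth key index supplied by an external matcher."""
    nll = [-math.log(row[k] + eps) for row, k in zip(attn, target_keys)]
    return sum(nll) / len(nll)

def entropy_loss(attn, eps=1e-12):
    """Assumed form: mean Shannon entropy of the rows; lower means sharper."""
    ents = [-sum(p * math.log(p + eps) for p in row) for row in attn]
    return sum(ents) / len(ents)

# Toy example: 2 queries attending over 3 keys.
attn = [softmax([4.0, 0.0, 0.0]),   # peaked on key 0
        softmax([0.0, 0.0, 0.0])]   # uniform, i.e. unconfident
pseudo_gt = [0, 2]                  # hypothetical external matches

matches = attention_to_matches(attn)
total = correspondence_loss(attn, pseudo_gt) + entropy_loss(attn)
```

In this toy case the first query already attends to its pseudo ground-truth key, so it contributes little to either term, while the uniform second row is penalized by both. In a real pipeline the two terms would presumably be weighted and added to the diffusion training objective; the abstract does not state the weighting.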
Primary Area: generative models
Submission Number: 5169