Keywords: stable diffusion; OCR-aware; image editing; garment generation;
Abstract: With the rapid advancement of diffusion models in image generation and editing, multimodal garment image editing has emerged as an important research direction in intelligent fashion design. However, visual text rendering on garments often suffers from issues such as illegible characters, blurred boundaries, and spelling errors. To address these challenges, we propose GlyphFashion, a diffusion-based multimodal framework for fine-grained, text-aware garment image editing. The framework introduces a unified text-aware conditioning module and integrates sketch priors, color cues, and region-mask context through ControlNet-based conditional branches, enabling precise geometric constraints and context-aware text generation throughout the denoising process. Furthermore, to alleviate the scarcity of high-quality text annotations in existing garment editing datasets, we construct an OCR-aware fine-grained editing dataset based on IGPair. Experimental results demonstrate that, compared with existing methods, our approach significantly improves the consistency between generated text and visual appearance, while remaining stable in challenging scenarios such as complex textures and small-scale printed text.
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 36