Abstract: Advances in generative models have led to a growing number of garment design methods. A generative model that produces garment images from text and sketches can give designers valuable visual references and creative inspiration. However, existing multimodal garment design methods lack precise control over how closely the generated results follow both the sketch and the text. In this paper, we propose the Multimodal Enhancement and Fusion Network for Garment Design (MEF-GD). Our model feeds image conditions into Stable Diffusion via ControlNet. On one hand, directly inputting image conditions can lead to feature forgetting, the phenomenon in deep neural networks where previously learned feature representations are lost. To address this issue, we propose a multiple feature injection module that enhances image condition features more effectively. On the other hand, ControlNet fuses control features into Stable Diffusion through pointwise addition, which ignores the interaction between multimodal features and biases the fused features towards the control features at the expense of the Stable Diffusion features. To address this limitation, we introduce content-guided attention for more effective feature fusion and better expression of text features. Additionally, existing datasets often contain vague textual descriptions of garments, which makes it difficult for a model trained on them to learn an accurate alignment between generated images and their textual descriptions. To address this issue, we design a multimodal large model text optimization module to improve the quality and clarity of the generated text. Compared to existing multimodal garment design methods, MEF-GD achieves more effective alignment with both textual and sketch-based inputs when generating garment images. Compared to MGD, MEF-GD achieves a decrease of 2.44 in FID and an increase of 0.83 in CLIP Score on the Multi-VITON-HD dataset. The code will be available at https://github.com/fengyun691340/MEF-GD
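The contrast the abstract draws between pointwise addition and attention-based fusion can be illustrated with a toy sketch. The code below is not the MEF-GD content-guided attention module, whose design the abstract does not specify; it is a generic scaled dot-product cross-attention, and the names `pointwise_add` and `attention_fuse`, along with the tiny feature values, are hypothetical and chosen only for illustration.

```python
import math


def pointwise_add(sd_feat, ctrl_feat):
    """ControlNet-style fusion: sum the two feature maps position by position."""
    return [
        [s + c for s, c in zip(s_tok, c_tok)]
        for s_tok, c_tok in zip(sd_feat, ctrl_feat)
    ]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(sd_feat, ctrl_feat):
    """Generic cross-attention (an assumption, not the paper's module):
    each Stable Diffusion token queries all control tokens, and the
    attended control feature is added back as a residual."""
    dim = len(sd_feat[0])
    scale = 1.0 / math.sqrt(dim)
    fused = []
    for q in sd_feat:
        # Similarity between this query token and every control token.
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in ctrl_feat]
        weights = softmax(scores)
        # Weighted average of control tokens, so the update depends on content.
        attended = [
            sum(w * v[j] for w, v in zip(weights, ctrl_feat)) for j in range(dim)
        ]
        fused.append([qi + ai for qi, ai in zip(q, attended)])
    return fused


# Two tokens with two channels each; values are made up for the example.
sd = [[1.0, 0.0], [0.0, 1.0]]
ctrl = [[10.0, 0.0], [0.0, 10.0]]

added = pointwise_add(sd, ctrl)
fused = attention_fuse(sd, ctrl)
print(added)
print(fused)
```

In the pointwise case, each position receives its control feature regardless of what the Stable Diffusion feature contains. In the attention case, the contribution of each control token is weighted by its similarity to the query, which is one common way to let the two feature streams interact.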
External IDs: dblp:journals/tcsv/SongZZTZKL26