Abstract: Virtual try-on has gained popularity in the fashion and e-commerce industries because it enables customers to try on clothing virtually before making online purchases. However, existing virtual try-on techniques struggle with complex poses and distortions, which often produce visible misalignments or artifacts. To overcome these challenges, we propose DiffusionVTON, a virtual try-on framework that employs denoising diffusion models and an Enhanced Garment Guide decoder. Our approach requires only pose keypoints, target model images, and clothing images, reducing additional input requirements and mitigating the effects of potentially inaccurate intermediate predictions. The Enhanced Garment Guide decoder improves the try-on results by injecting additional garment information into each layer of the decoder, enhancing image quality and preserving clothing details. Experimental results on the VITON and MPV datasets show that our approach surpasses current methods in image quality and fidelity, providing users with realistic and precise virtual try-on experiences.