Abstract: Virtual try-on (VTON) aims to generate an image of the source person wearing a reference garment, given the source person image and the garment image. The main challenge lies in accurately generating warped garment details that fit the person while preserving the original patterns of the clothing. However, most methods struggle to directly capture the complex spatial transformations involved, especially when warping features must be inferred from the source clothing, which often results in noticeable visual artifacts. To address this, this paper proposes a novel adaptive latent diffusion model (ALDM) that performs warping guidance before generating the target image, consisting of two modules: a prior warping module (PWM) and an adaptive alignment module (AAM). Specifically, we first present the PWM, which extracts warping-guided features and aligns them with the target person's pose. This extraction task is much simpler than directly generating the target image, as the PWM focuses solely on it. We then devise the AAM, which takes the warping-guided features and the reference clothing features mined in the previous stage and establishes an adaptive alignment between them. Lastly, extensive results on two large-scale datasets and a user study demonstrate the photorealism of our proposed ALDM under challenging scenarios. The code and model will be available at https://github.com/gaogao2002/ALDM.
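To make the two-stage design concrete, below is a minimal, hypothetical PyTorch sketch of how a PWM → AAM conditioning pipeline could be wired. None of the class names, shapes, or layer choices come from the paper (the actual implementation is at the GitHub link above); the PWM is assumed here to be a pose-conditioned convolutional encoder and the AAM a cross-attention block, purely for illustration.

```python
import torch
import torch.nn as nn

class PriorWarpingModule(nn.Module):
    """Hypothetical PWM: predicts warping-guided garment features
    aligned to the target person's pose (e.g. keypoint heatmaps)."""
    def __init__(self, dim=256, pose_channels=18):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3 + pose_channels, dim, 3, padding=1), nn.ReLU(),
            nn.Conv2d(dim, dim, 3, padding=1),
        )

    def forward(self, garment, pose):
        # Concatenate the garment image with the pose representation
        return self.encoder(torch.cat([garment, pose], dim=1))

class AdaptiveAlignmentModule(nn.Module):
    """Hypothetical AAM: cross-attention that aligns warping-guided
    features (queries) with reference clothing features (keys/values)."""
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, warp_feat, ref_feat):
        b, c, h, w = warp_feat.shape
        q = warp_feat.flatten(2).transpose(1, 2)   # (B, HW, C)
        kv = ref_feat.flatten(2).transpose(1, 2)   # (B, HW, C)
        aligned, _ = self.attn(q, kv, kv)
        return aligned.transpose(1, 2).reshape(b, c, h, w)

# Illustrative usage: the AAM output would condition the latent
# diffusion U-Net that generates the final try-on image.
garment = torch.randn(1, 3, 64, 48)        # reference garment image
pose = torch.randn(1, 18, 64, 48)          # target pose heatmaps
ref_feat = torch.randn(1, 256, 64, 48)     # placeholder reference clothing features
warp_feat = PriorWarpingModule()(garment, pose)
cond = AdaptiveAlignmentModule()(warp_feat, ref_feat)
```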
External IDs: dblp:conf/icmcs/GaoRSWH24