Abstract: Image-based virtual try-on aims to transfer a garment onto a person while preserving both the person's and the garment's attributes. However, existing methods for this task require an image of the target garment in isolation, which is often unavailable in practice. To address this issue, we propose a novel user-friendly virtual try-on network (UF-VTON), which requires only a person image and an image of another person wearing the target garment to generate a result of the first person wearing that garment. Specifically, we adopt a knowledge distillation scheme to construct a new triplet dataset for supervised learning, propose a new three-step pipeline (coarse synthesis, clothing alignment, and refinement synthesis) for the try-on task, and utilize an end-to-end training strategy to further refine the results. In particular, we design a new synthesis network that includes both CNN blocks and swin-transformer blocks to capture local and global information and generate highly realistic try-on images. Qualitative and quantitative experiments show that our method achieves state-of-the-art virtual try-on performance.
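The abstract does not give implementation details for the hybrid synthesis network, so the following is only a minimal sketch of the general idea of pairing a CNN block (local detail) with a Swin-style windowed self-attention block (global context). All module names (`ConvBlock`, `WindowAttentionBlock`, `HybridSynthesisBlock`), the channel count, the window size, and the use of plain (non-shifted) windows without relative position bias are illustrative assumptions, not the authors' code.

```python
# Sketch only: a CNN branch followed by windowed self-attention,
# in the spirit of mixing CNN and swin-transformer blocks.
import torch
import torch.nn as nn


class ConvBlock(nn.Module):
    """Local feature extraction with a residual 3x3 convolution."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.GELU(),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)


class WindowAttentionBlock(nn.Module):
    """Self-attention over non-overlapping windows (simplified Swin-style
    block: no shifted windows, no relative position bias)."""
    def __init__(self, channels, window=8, heads=4):
        super().__init__()
        self.window = window
        self.norm = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)

    def forward(self, x):
        b, c, h, w = x.shape
        ws = self.window  # assumes h and w are divisible by ws
        # Partition the feature map into (ws x ws) windows of shape
        # (num_windows, ws*ws, c) for token-wise attention.
        t = x.view(b, c, h // ws, ws, w // ws, ws)
        t = t.permute(0, 2, 4, 3, 5, 1).reshape(-1, ws * ws, c)
        t_norm = self.norm(t)  # pre-norm residual attention
        attn_out, _ = self.attn(t_norm, t_norm, t_norm)
        t = t + attn_out
        # Merge windows back into the (b, c, h, w) feature map.
        t = t.view(b, h // ws, w // ws, ws, ws, c)
        return t.permute(0, 5, 1, 3, 2, 4).reshape(b, c, h, w)


class HybridSynthesisBlock(nn.Module):
    """One stage applying the CNN block, then window attention,
    so both local and global information shape the features."""
    def __init__(self, channels):
        super().__init__()
        self.local = ConvBlock(channels)
        self.global_mix = WindowAttentionBlock(channels)

    def forward(self, x):
        return self.global_mix(self.local(x))


if __name__ == "__main__":
    feat = torch.randn(1, 64, 64, 64)     # hypothetical person/garment features
    out = HybridSynthesisBlock(64)(feat)
    print(out.shape)                       # torch.Size([1, 64, 64, 64])
```

Stacking such blocks in an encoder-decoder is one plausible way to realize the paper's stated goal of capturing both global and local information in the synthesis network; the actual UF-VTON architecture may differ.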