Keywords: Virtual Try-on
Abstract: We propose EVTER, an end-to-end virtual try-on model that incorporates additional reference images. Most existing virtual try-on models rely on complex inputs, such as clothing-agnostic person images, human pose, DensePose, and body keypoints, making them labor-intensive and impractical for real-world applications. In contrast, EVTER addresses these challenges by adopting an end-to-end training strategy, allowing for simple inference with only the source image and target clothing as inputs. The model generates try-on images without requiring masking. Moreover, to enhance try-on quality, our model can utilize additional reference images, inspired by how humans typically select clothing. To enable this capability, we built a dataset with supplementary reference images for training. We evaluate our model on popular benchmarks, and the results validate the effectiveness of our proposed approach.
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 4403