Keywords: Virtual Try-on
Abstract: We propose EVTER, an end-to-end virtual try-on model that incorporates additional reference images. Most existing virtual try-on models rely on complex inputs, such as clothing-agnostic person images, human pose, DensePose, and body keypoints, making them labor-intensive and impractical for real-world applications. In contrast, EVTER addresses these challenges by adopting an end-to-end training strategy, allowing for simple inference with only the source image and target clothing as inputs. The model generates try-on images without requiring masking. Moreover, to enhance try-on quality, our model can utilize additional reference images, inspired by how humans typically select clothing. To enable this capability, we built a dataset with supplementary reference images for training. We evaluate our model on popular benchmarks, and the results validate the effectiveness of our proposed approach.
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 4403