Two-Way Garment Transfer: Unified Diffusion Framework for Dressing and Undressing Synthesis

Angang Zhang; Fang Deng; Junyan Li; CHEN Hao; Zhongjian Chen

20 Sept 2025 (modified: 11 Feb 2026). Submitted to ICLR 2026. License: CC BY 4.0

Keywords: Diffusion Models, Image Generation, Virtual Try-On, Virtual Try-Off

TL;DR: This work proposes the first unified framework for joint clothing-centric image synthesis that handles both mask-guided virtual try-on and mask-free virtual try-off. Extensive experiments validate the model's effectiveness.

Abstract: While recent advances in virtual try-on (VTON) have achieved realistic garment transfer onto human subjects, the inverse task, virtual try-off (VTOFF), which aims to reconstruct canonical garment templates from images of dressed people, remains largely unexplored and lacks systematic investigation. Existing works predominantly treat the two as isolated tasks: VTON focuses on dressing a person in a garment, while VTOFF addresses garment extraction, so the complementary symmetry between them is neglected. To bridge this gap, we propose the Two-Way Garment Transfer Model (TWGTM), to the best of our knowledge the first unified framework for joint clothing-centric image synthesis that addresses both mask-guided VTON and mask-free VTOFF through bidirectional feature disentanglement. Specifically, the framework applies dual-conditioned guidance, drawing on both the latent-space and the pixel-space representations of the reference images, to bridge the two tasks. In addition, to resolve the inherent asymmetry in mask dependency between mask-guided VTON and mask-free VTOFF, we devise a phased training paradigm that progressively closes this modality gap. Extensive qualitative and quantitative experiments on the DressCode and VITON-HD datasets validate the effectiveness and competitiveness of the proposed approach.
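
The abstract leaves open how a single denoiser can accept a mask for try-on but no mask for try-off. One common pattern in inpainting-style diffusion models, offered here only as an assumed illustration and not as the TWGTM implementation, is to keep one fixed input layout and fill the mask channel with zeros when no mask exists. The function `build_denoiser_input`, the 4×64×48 latent size, the rectangular mask, and the concatenation order are all hypothetical. The sketch also omits the pixel-space conditioning branch, whose design the abstract does not describe.

```python
import numpy as np


def build_denoiser_input(noisy_latent, ref_latent, mask=None):
    """Hypothetical sketch, not the TWGTM implementation.

    noisy_latent, ref_latent: (C, H, W) arrays; mask: (1, H, W) array or None.
    Returns a (2C + 1, H, W) array with the same layout for both tasks.
    """
    _, h, w = noisy_latent.shape
    if ref_latent.shape != noisy_latent.shape:
        raise ValueError("reference latent must match the noisy latent's shape")
    if mask is None:
        # Mask-free direction (VTOFF): nothing is marked, so the mask channel is all zeros.
        mask = np.zeros((1, h, w), dtype=noisy_latent.dtype)
    elif mask.shape != (1, h, w):
        raise ValueError(f"mask must have shape (1, {h}, {w}), got {mask.shape}")
    # Channel-wise concatenation keeps the input width fixed at 2C + 1 channels.
    return np.concatenate(
        [noisy_latent, ref_latent, mask.astype(noisy_latent.dtype)], axis=0
    )


rng = np.random.default_rng(0)
C, H, W = 4, 64, 48  # hypothetical latent size
z_t = rng.standard_normal((C, H, W)).astype(np.float32)
ref = rng.standard_normal((C, H, W)).astype(np.float32)

# Hypothetical try-on mask: a rectangle standing in for the garment region.
vton_mask = np.zeros((1, H, W), dtype=np.float32)
vton_mask[:, 16:48, 8:40] = 1.0

x_on = build_denoiser_input(z_t, ref, vton_mask)  # try-on: mask-guided
x_off = build_denoiser_input(z_t, ref)  # try-off: mask-free
print(x_on.shape, x_off.shape)  # (9, 64, 48) (9, 64, 48)
```

Keeping the channel count fixed lets both directions share one set of weights, with the zero channel signalling only that no region was marked. How training moves the model between the two regimes, such as the phased paradigm described in the abstract, is outside the scope of this sketch.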

Supplementary Material: zip

Primary Area: applications to computer vision, audio, language, and other modalities

Submission Number: 24948
