Mitigating Occlusions in Virtual Try-On via A Simple-Yet-Effective Mask-Free Framework

Chenghu Du; Shengwu Xiong; Junyin Wang; Yi Rong; Shili Xiong

Published: 18 Sept 2025, Last Modified: 29 Oct 2025, NeurIPS 2025 poster, CC BY 4.0

Keywords: virtual try-on, de-occlusion, diffusion models

TL;DR: This paper investigates occlusion issues in virtual try-on (VTON) tasks and proposes a novel mask-free framework that effectively addresses inherent and acquired occlusions through background pre-replacement and covering-and-eliminating operations.

Abstract: This paper investigates occlusion problems in virtual try-on (VTON) tasks. According to how they affect the try-on results, the occlusion issues of existing VTON methods can be grouped into two categories: (1) Inherent Occlusions, which are ghosting artifacts of the clothing from the reference input images that persist in the try-on results; and (2) Acquired Occlusions, in which the spatial structures of the generated human body parts are disrupted and appear unrealistic. We analyze the causes of these two types of occlusions and, based on this analysis, propose a novel mask-free VTON framework that deals with them effectively. In this framework, we develop two simple-yet-powerful operations: (1) the background pre-replacement operation prevents the model from confusing the target clothing information with the human body or the image background, thereby mitigating inherent occlusions; and (2) the covering-and-eliminating operation enhances the model's ability to understand and model human semantic structures, leading to more realistic human body generation and thus reducing acquired occlusions. Moreover, our method generalizes well and can be applied in in-the-wild scenarios, and the proposed operations can be easily integrated into different generative network architectures (e.g., GANs and diffusion models) in a plug-and-play manner. Extensive experiments on three VTON datasets validate the effectiveness and generalization ability of our method. Both qualitative and quantitative results demonstrate that our method outperforms recently proposed VTON methods.
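The abstract does not describe how the two operations are implemented. Purely as a hypothetical illustration (not the authors' code), the sketch below shows one generic way training-time operations of this kind could look in NumPy. `replace_background` composites the person onto a new background with a standard alpha blend, assuming a foreground mask is available during training only (consistent with a framework that is mask-free at inference). `cover_random_patch` pastes a patch at a random location and keeps the uncovered image as the target that a model would learn to restore. The function names, the mask input, and the patch-pasting scheme are all assumptions.

```python
import numpy as np


def replace_background(person, fg_mask, new_bg):
    """Hypothetical helper: composite the person foreground onto a new background.

    person, new_bg: float arrays of shape (H, W, 3) in [0, 1].
    fg_mask: float array of shape (H, W) in [0, 1], where 1 marks person pixels.
    Assumption: such a mask exists at training time only.
    """
    m = fg_mask[..., None]  # broadcast the mask over the colour channels
    return m * person + (1.0 - m) * new_bg  # alpha blend: m*x + (1 - m)*b


def cover_random_patch(image, patch, rng):
    """Hypothetical helper: paste `patch` at a random location.

    Returns (covered, target). The uncovered image is kept as the target,
    so a model trained on covered -> target pairs must learn to remove the cover.
    """
    h, w = image.shape[:2]
    ph, pw = patch.shape[:2]
    top = rng.integers(0, h - ph + 1)   # upper bound is exclusive
    left = rng.integers(0, w - pw + 1)
    covered = image.copy()              # leave the original image untouched
    covered[top:top + ph, left:left + pw] = patch
    return covered, image


rng = np.random.default_rng(0)
person = rng.random((64, 48, 3))        # stand-in for a person image
fg_mask = np.zeros((64, 48))
fg_mask[8:56, 12:36] = 1.0              # stand-in for a person mask
new_bg = np.full((64, 48, 3), 0.5)      # flat grey replacement background

composited = replace_background(person, fg_mask, new_bg)
covered, target = cover_random_patch(composited, np.zeros((16, 16, 3)), rng)
```

Whether the paper's pre-replacement uses masks, random backgrounds, or another mechanism, and what the covering content is, are not stated in the abstract; the sketch only illustrates the general idea of compositing and cover-then-restore training pairs.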

Primary Area: Deep learning (e.g., architectures, generative models, optimization for deep networks, foundation models, LLMs)

Submission Number: 5367