Keywords: Virtual Try-On, Video Diffusion Models, Generative Models
TL;DR: We propose VFR, which generates minute-scale long virtual try-on videos at high resolution.
Abstract: This paper proposes the Virtual Fitting Room (VFR), a novel video generative model that produces arbitrarily long virtual try-on videos. Our VFR casts long video generation as an auto-regressive, segment-by-segment process, eliminating the need for resource-intensive generation and lengthy video data while retaining the flexibility to generate videos of arbitrary length. The key challenges of this task are twofold: ensuring local smoothness between adjacent segments and maintaining global temporal consistency across different segments. To address these challenges, our VFR framework ensures smoothness through a prefix video condition and enforces consistency with an anchor video, a 360°-view video that comprehensively captures the person's whole-body appearance. Our VFR generates minute-scale virtual try-on videos with both local smoothness and global temporal consistency under various motions, making it a pioneering work in long virtual try-on video generation. Project Page: https://immortalco.github.io/VirtualFittingRoom/.
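To make the abstract's auto-regressive scheme concrete, below is a minimal sketch of a segment-by-segment generation loop conditioned on a prefix and an anchor video. All names here (generate_anchor, generate_segment, prefix_len, etc.) are illustrative assumptions, not the authors' actual API or implementation.

```python
# Hypothetical sketch of segment-by-segment long-video generation with
# prefix and anchor conditioning, as described in the abstract.

def generate_long_tryon_video(person_image, garment_image, num_segments,
                              generate_anchor, generate_segment,
                              segment_len=32, prefix_len=8):
    """Auto-regressively compose an arbitrarily long try-on video.

    Each new segment is conditioned on (a) the last `prefix_len` frames of
    the previous segment for local smoothness, and (b) a 360°-view anchor
    video for global appearance consistency.
    """
    # Global condition: a short 360°-view video of the dressed person.
    anchor_video = generate_anchor(person_image, garment_image)

    video = []
    prefix = None  # the first segment has no prefix condition
    for _ in range(num_segments):
        segment = generate_segment(
            garment=garment_image,
            anchor=anchor_video,   # enforces global temporal consistency
            prefix=prefix,         # enforces smoothness across the boundary
            length=segment_len,
        )
        # Skip frames that overlap with the conditioning prefix.
        video.extend(segment if prefix is None else segment[prefix_len:])
        prefix = segment[-prefix_len:]
    return video
```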
Supplementary Material: zip
Primary Area: Deep learning (e.g., architectures, generative models, optimization for deep networks, foundation models, LLMs)
Submission Number: 24398