<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>iTryOn: Project Page</title>
    <style>
        /* --- General Styles --- */
        body {
            font-family: -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, "Helvetica Neue", Arial, sans-serif;
            line-height: 1.6; margin: 0; background-color: #f8f9fa; color: #333;
        }
        .container { max-width: 1200px; margin: 0 auto; padding: 20px; }
        h1, h2, h3 { text-align: center; color: #212529; }
        h1 { font-size: 2.5em; margin-bottom: 10px; }
        h2 { font-size: 2em; margin-top: 60px; margin-bottom: 30px; border-bottom: 2px solid #dee2e6; padding-bottom: 10px; }
        p { font-size: 1.1em; text-align: justify; }
        a { color: #007bff; text-decoration: none; }
        a:hover { text-decoration: underline; }

        /* --- Header Section --- */
        .authors, .conference { text-align: center; margin-bottom: 20px; }
        .authors { font-size: 1.2em; }
        .conference { font-style: italic; color: #6c757d; }
        .links { text-align: center; margin-bottom: 40px; }
        .links a { display: inline-block; margin: 0 15px; padding: 10px 20px; background-color: #007bff; color: white; border-radius: 5px; font-size: 1.1em; transition: background-color 0.3s; }
        .links a:hover { background-color: #0056b3; text-decoration: none; }

        /* --- Image Styles --- */
        .full-width-image {
            width: 100%;
            height: auto;
            border-radius: 8px;
            box-shadow: 0 4px 12px rgba(0,0,0,0.1);
            margin-bottom: 20px;
        }
        .image-caption {
            text-align: center;
            font-style: italic;
            color: #6c757d;
            margin-top: -10px; /* Adjust as needed */
            margin-bottom: 40px;
        }

        /* --- Multi-item Carousel Styles --- */
        .carousel-container { position: relative; }
        .slides-wrapper {
            overflow: hidden;
            background: #fff;
            padding: 15px 0;
            border-radius: 8px;
            box-shadow: 0 4px 8px rgba(0,0,0,0.1);
        }
        .carousel-slides {
            display: flex;
            transition: transform 0.5s ease-in-out;
        }
        .carousel-item {
            flex-shrink: 0;
            box-sizing: border-box;
            padding: 0 10px; /* Spacing between items */
            text-align: center;
        }
        video { width: 100%; height: auto; border-radius: 5px; }
        .caption { margin-top: 10px; font-size: 1em; color: #495057; font-weight: bold; }
        .carousel-nav {
            position: absolute; top: 50%; width: 100%; display: flex;
            justify-content: space-between; transform: translateY(-50%); pointer-events: none;
        }
        .carousel-nav button {
            background-color: rgba(0, 0, 0, 0.5); border: none; color: white;
            width: 40px; height: 40px; cursor: pointer; border-radius: 50%;
            font-size: 18px; line-height: 1; pointer-events: auto; transition: background-color 0.3s;
        }
        .carousel-nav button:hover { background-color: rgba(0, 0, 0, 0.8); }
        .prev-btn { margin-left: -50px; }
        .next-btn { margin-right: -50px; }
        .carousel-dots { text-align: center; margin-top: 15px; }
        .dot {
            cursor: pointer; height: 12px; width: 12px; margin: 0 5px;
            background-color: #bbb; border-radius: 50%; display: inline-block;
            transition: background-color 0.3s;
        }
        .dot.active { background-color: #007bff; }

        /* --- Responsive Design --- */
        @media (max-width: 768px) {
            h1 { font-size: 2em; }
            h2 { font-size: 1.75em; }
            .links a { display: block; margin: 10px 0; }
            .prev-btn { margin-left: 5px; }
            .next-btn { margin-right: 5px; }
        }
    </style>
</head>
<body>

    <div class="container">

        <!-- ================== TITLE, AUTHORS, LINKS ================== -->
        <h1>iTryOn: Mastering Interactive Video Virtual Try-On with Spatial-Semantic Guidance</h1>
        <div class="authors">Anonymous Authors</div>
        <div class="conference">Under review as a conference paper at ICML 2026</div>
        <!-- <div class="links">
            <a href="#">[Paper PDF]</a>
            <a href="#">[Code]</a>
            <a href="#">[Dataset]</a>
        </div> -->

        <!-- ================== ABSTRACT ================== -->
        <div class="abstract">
            <h2>Abstract</h2>
            <p>Video Virtual Try-On (VVT) aims to seamlessly replace a garment on a person in a video with a new one. While existing methods have made significant strides in maintaining temporal consistency, they are predominantly confined to non-interactive scenarios where models merely showcase garments. This limitation overlooks a crucial aspect of real-world apparel presentation: active human-garment interaction. To bridge this gap, we introduce and formalize a new, challenging task: Interactive Video Virtual Try-On (Interactive VVT), where subjects in the video actively engage with their clothing (e.g., pulling a hem or unzipping a jacket). This task introduces unique challenges beyond simple texture preservation, including: (1) resolving the semantic ambiguity of interactions from standard pose information, and (2) learning complex garment deformations from video where interactive moments are sparse and brief. To address these challenges, we propose <strong>iTryOn</strong>, a novel framework built upon a large-scale video diffusion Transformer. iTryOn pioneers a multi-level interaction injection mechanism to guide the generation of complex dynamics. At the spatial level, we introduce a garment-agnostic 3D hand prior to provide fine-grained guidance for precise hand-garment contact, effectively resolving spatial ambiguity. At the semantic level, iTryOn leverages global captions for overall context and time-stamped action captions for localized interactions, synchronized via our novel Action-aware Rotational Position Embedding (A-RoPE). Furthermore, we design an action-aware constraint loss to stabilize training and focus the learning process on these critical interactive frames. To facilitate research and evaluation, we construct VVT-Interact, the first large-scale dataset for this task.
Extensive experiments demonstrate that iTryOn not only achieves state-of-the-art performance on traditional VVT benchmarks but also establishes a commanding lead in the new interactive setting, marking a significant step towards more dynamic and controllable virtual try-on experiences.</p>
        </div>

        <!-- ================== NEW: QUICKVIEW IMAGE ================== -->
        <img src="quickview.png" alt="iTryOn tackles the challenges of Interactive Virtual Try-On, moving beyond traditional non-interactive methods." class="full-width-image" style="margin-top: 40px;">

        <!-- ================== TEASER RESULTS (Multi-item Carousel) ================== -->
        <h2>Interactive Virtual Try-On Results</h2>
        <div id="teaser-carousel" class="carousel-container">
            <div class="slides-wrapper">
                <div class="carousel-slides">
                    <div class="carousel-item"><video src="teaser/018_converted.mp4" autoplay loop muted playsinline></video></div>
                    <div class="carousel-item"><video src="teaser/001_converted.mp4" autoplay loop muted playsinline></video></div>
                    <div class="carousel-item"><video src="teaser/002_converted.mp4" autoplay loop muted playsinline></video></div>
                    <div class="carousel-item"><video src="teaser/003_converted.mp4" autoplay loop muted playsinline></video></div>
                    <div class="carousel-item"><video src="teaser/004_converted.mp4" autoplay loop muted playsinline></video></div>
                    <div class="carousel-item"><video src="teaser/005_converted.mp4" autoplay loop muted playsinline></video></div>
                    <div class="carousel-item"><video src="teaser/006_converted.mp4" autoplay loop muted playsinline></video></div>
                    <div class="carousel-item"><video src="teaser/007_converted.mp4" autoplay loop muted playsinline></video></div>
                    <div class="carousel-item"><video src="teaser/008_converted.mp4" autoplay loop muted playsinline></video></div>
                    <div class="carousel-item"><video src="teaser/009_converted.mp4" autoplay loop muted playsinline></video></div>
                    <div class="carousel-item"><video src="teaser/010_converted.mp4" autoplay loop muted playsinline></video></div>
                    <div class="carousel-item"><video src="teaser/011_converted.mp4" autoplay loop muted playsinline></video></div>
                    <div class="carousel-item"><video src="teaser/012_converted.mp4" autoplay loop muted playsinline></video></div>
                    <div class="carousel-item"><video src="teaser/013_converted.mp4" autoplay loop muted playsinline></video></div>
                    <div class="carousel-item"><video src="teaser/014_converted.mp4" autoplay loop muted playsinline></video></div>
                    <div class="carousel-item"><video src="teaser/015_converted.mp4" autoplay loop muted playsinline></video></div>
                    <div class="carousel-item"><video src="teaser/016_converted.mp4" autoplay loop muted playsinline></video></div>
                    <div class="carousel-item"><video src="teaser/017_converted.mp4" autoplay loop muted playsinline></video></div>
                </div>
            </div>
            <div class="carousel-nav">
                <button class="prev-btn" type="button" aria-label="Previous slides">&#10094;</button>
                <button class="next-btn" type="button" aria-label="Next slides">&#10095;</button>
            </div>
            <div class="carousel-dots"></div>
        </div>
        
        <!-- ================== NEW: ARCHITECTURE SECTION ================== -->
        <h2>Our Approach: The iTryOn Framework</h2>
        <p style="text-align: center; max-width: 800px; margin-left: auto; margin-right: auto;">
            iTryOn is built on a Diffusion Transformer (DiT) backbone and adds a multi-level guidance mechanism for complex human-garment interactions. At the spatial level, a garment-agnostic 3D hand prior provides fine-grained cues for hand-garment contact. At the semantic level, time-stamped action captions, synchronized via A-RoPE, offer precise semantic control. Combined with an action-aware constraint loss, this dual guidance enables the generation of physically plausible and controllable interactive try-on videos.
        </p>
        <img src="architecture.png" alt="The architecture of the iTryOn framework, showing the DiT backbone, Interaction Guider, and A-RoPE mechanism for temporal cross-attention." class="full-width-image">


        <!-- ================== VVT-INTERACT COMPARISON (Multi-item Carousel) ================== -->
        <h2>Comparison on VVT-Interact Dataset</h2>
        <div id="interact-carousel" class="carousel-container">
            <div class="slides-wrapper">
                <div class="carousel-slides">
                    <div class="carousel-item"><video src="VVT-interact/001_converted.mp4" autoplay loop muted playsinline></video></div>
                    <div class="carousel-item"><video src="VVT-interact/002_converted.mp4" autoplay loop muted playsinline></video></div>
                </div>
            </div>
            <div class="carousel-nav">
                <button class="prev-btn" type="button" aria-label="Previous slides">&#10094;</button>
                <button class="next-btn" type="button" aria-label="Next slides">&#10095;</button>
            </div>
            <div class="carousel-dots"></div>
        </div>

        <!-- ================== VIVID COMPARISON (Multi-item Carousel) ================== -->
        <h2>Comparison on ViViD Dataset</h2>
        <div id="vivid-carousel" class="carousel-container">
            <div class="slides-wrapper">
                <div class="carousel-slides">
                    <div class="carousel-item"><video src="ViViD/001_converted.mp4" autoplay loop muted playsinline></video></div>
                    <div class="carousel-item"><video src="ViViD/002_converted.mp4" autoplay loop muted playsinline></video></div>
                </div>
            </div>
            <div class="carousel-nav">
                <button class="prev-btn" type="button" aria-label="Previous slides">&#10094;</button>
                <button class="next-btn" type="button" aria-label="Next slides">&#10095;</button>
            </div>
            <div class="carousel-dots"></div>
        </div>

        <!-- ================== ABLATION STUDY (Carousel) ================== -->
        <h2>Ablation Study</h2>
        <p>We visualize the contribution of each component. Adding data alone is insufficient: spatial guidance enables physical hand-garment contact, and semantic guidance provides the correct intent. Both are crucial for high-fidelity interactive try-on.</p>
        <div id="ablation-carousel" class="carousel-container">
            <div class="slides-wrapper">
                <div class="carousel-slides">
                    <div class="carousel-item"><video src="ablation/001_converted.mp4" autoplay loop muted playsinline></video></div>
                    <div class="carousel-item"><video src="ablation/002_converted.mp4" autoplay loop muted playsinline></video></div>
                </div>
            </div>
            <div class="carousel-nav">
                <button class="prev-btn" type="button" aria-label="Previous slides">&#10094;</button>
                <button class="next-btn" type="button" aria-label="Next slides">&#10095;</button>
            </div>
            <div class="carousel-dots"></div>
        </div>
        
    </div>

    <script>
    document.addEventListener('DOMContentLoaded', function() {
        function initMultiCarousel(carouselId, options = {}) {
            const carousel = document.getElementById(carouselId);
            if (!carousel) return;

            const settings = {
                itemsPerPage: options.itemsPerPage || 3,
                itemsPerPageMobile: options.itemsPerPageMobile || 1,
                mobileBreakpoint: 768
            };

            const slidesContainer = carousel.querySelector('.carousel-slides');
            const items = carousel.querySelectorAll('.carousel-item');
            const prevBtn = carousel.querySelector('.prev-btn');
            const nextBtn = carousel.querySelector('.next-btn');
            const dotsContainer = carousel.querySelector('.carousel-dots');
            
            if (items.length === 0) return; // Exit if no items
            
            let currentIndex = 0;
            let currentItemsPerPage = window.innerWidth < settings.mobileBreakpoint ? settings.itemsPerPageMobile : settings.itemsPerPage;
            let totalPages = Math.ceil(items.length / currentItemsPerPage);

            function setupCarousel() {
                currentItemsPerPage = window.innerWidth < settings.mobileBreakpoint ? settings.itemsPerPageMobile : settings.itemsPerPage;
                totalPages = Math.ceil(items.length / currentItemsPerPage);
                
                items.forEach(item => {
                    item.style.flexBasis = `calc(100% / ${currentItemsPerPage})`;
                });

                dotsContainer.innerHTML = '';
                if (totalPages > 1) {
                    for (let i = 0; i < totalPages; i++) {
                        const dot = document.createElement('span');
                        dot.classList.add('dot');
                        dot.dataset.index = i;
                        dotsContainer.appendChild(dot);
                    }
                    
                    dotsContainer.querySelectorAll('.dot').forEach(dot => {
                        dot.addEventListener('click', (e) => {
                            goToPage(parseInt(e.target.dataset.index, 10));
                        });
                    });
                }
                
                const navButtons = carousel.querySelector('.carousel-nav');
                if (totalPages <= 1) {
                    if (navButtons) navButtons.style.display = 'none';
                    if (dotsContainer) dotsContainer.style.display = 'none';
                } else {
                    if (navButtons) navButtons.style.display = 'flex';
                    if (dotsContainer) dotsContainer.style.display = 'block';
                }

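                // Stay on the current page after a resize (mobile browsers fire
                // resize while scrolling); goToPage clamps to the new page count.
                goToPage(currentIndex, false);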
            }

            function goToPage(pageIndex, animate = true) {
                currentIndex = Math.max(0, Math.min(pageIndex, totalPages - 1));
                const offset = -currentIndex * 100;
                
                slidesContainer.style.transition = animate ? 'transform 0.5s ease-in-out' : 'none';
                slidesContainer.style.transform = `translateX(${offset}%)`;

                const dots = dotsContainer.querySelectorAll('.dot');
                if (dots.length > 0) {
                    dots.forEach((dot, index) => {
                        dot.classList.toggle('active', index === currentIndex);
                    });
                }
            }
            
            if(prevBtn && nextBtn) {
                prevBtn.addEventListener('click', () => {
                    goToPage(currentIndex - 1);
                });

                nextBtn.addEventListener('click', () => {
                    goToPage(currentIndex + 1);
                });
            }
            
            let resizeTimer;
            window.addEventListener('resize', () => {
                clearTimeout(resizeTimer);
                resizeTimer = setTimeout(() => {
                    setupCarousel();
                }, 250);
            });

            setupCarousel();
        }

        // --- CONFIGURE YOUR CAROUSELS HERE ---
        initMultiCarousel('teaser-carousel', { itemsPerPage: 2, itemsPerPageMobile: 1 });
        initMultiCarousel('interact-carousel', { itemsPerPage: 1, itemsPerPageMobile: 1 });
        initMultiCarousel('vivid-carousel', { itemsPerPage: 1, itemsPerPageMobile: 1 });
        initMultiCarousel('ablation-carousel', { itemsPerPage: 1, itemsPerPageMobile: 1 });
    });
    </script>

</body>
</html>