<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Model Video Visualizations</title>
    <style>
        * { margin: 0; padding: 0; box-sizing: border-box; }
        
        body { 
            font-family: 'Segoe UI', Tahoma, Geneva, Verdana, sans-serif;
            background: linear-gradient(135deg, #f5f7fa 0%, #c3cfe2 100%);
            min-height: 100vh;
            display: flex;
            justify-content: center;
            padding: 40px 20px;
        }
        
        .content { 
            max-width: 900px; 
            width: 100%; 
            background: white;
            border-radius: 20px;
            box-shadow: 0 20px 60px rgba(0, 0, 0, 0.3);
            padding: 40px;
        }
        
        h1 {
            color: #2d3748;
            font-size: 2.5em;
            margin-bottom: 20px;
            text-align: center;
            background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
            -webkit-background-clip: text;
            -webkit-text-fill-color: transparent;
            background-clip: text;
        }
        
        h2 { 
            color: #4a5568;
            margin-top: 40px;
            margin-bottom: 15px;
            padding-bottom: 10px;
            border-bottom: 3px solid #667eea;
            font-size: 1.8em;
        }
        
        p {
            color: #4a5568;
            line-height: 1.6;
            margin-bottom: 15px;
        }
        
        p strong {
            color: #667eea;
            font-weight: 600;
        }
        
        .note {
            background: #fff3cd;
            border-left: 4px solid #ffc107;
            padding: 15px 20px;
            border-radius: 8px;
            margin-bottom: 30px;
            color: #856404;
        }
        
        .note strong {
            color: #856404;
        }
        
        .video-container { 
            display: flex; 
            flex-direction: column; 
            gap: 25px;
            margin-top: 20px;
        }
        
        .video-container.side-by-side { 
            flex-direction: row; 
            gap: 20px; 
        }
        
        .video-item { 
            width: 100%; 
            max-width: 800px; 
            margin: 0 auto;
            background: #f7fafc;
            border-radius: 12px;
            padding: 15px;
            box-shadow: 0 4px 6px rgba(0, 0, 0, 0.1);
            transition: transform 0.3s ease, box-shadow 0.3s ease;
        }
        
        .video-item:hover {
            transform: translateY(-5px);
            box-shadow: 0 12px 20px rgba(102, 126, 234, 0.2);
        }
        
        .side-by-side .video-item { 
            width: 48%; 
            max-width: none; 
        }
        
        video {
            border-radius: 8px;
            background: #000;
        }
        
        .video-item p {
            margin-top: 10px;
            font-size: 0.9em;
            color: #718096;
            text-align: center;
            font-family: 'Courier New', monospace;
        }
        
        @media (max-width: 768px) {
            .content { padding: 20px; }
            h1 { font-size: 1.8em; }
            h2 { font-size: 1.4em; }
            .video-container.side-by-side { flex-direction: column; }
            .side-by-side .video-item { width: 100%; }
        }
    </style>
</head>
<body>
    <div class="content">
        <h1>Model Output Visualizations</h1>
        <p>This page visualizes the outputs of our model alongside competing baselines. We evaluate performance across three motion categories: intensive, moderate, and subtle motion.</p>
        <p>Our keyframe-aware video generation model <strong>not only captures rapid, high-energy motion driven by strong audio cues</strong> but also handles subtle or low-variation motion, even <strong>when the audio signal becomes weak or momentarily pauses</strong>.</p>
        <p class="note"><strong>Note:</strong> If the videos do not display properly, extract the zip file and open this page from the unzipped folder.</p>
        
        <h2>1. Intensive Motion</h2>
        <p>Examples showcasing intensive motion generation, such as hammering or machine gun shooting.</p>
        <p>The videos are arranged from left to right as follows: KeyVID, KeyVID-Uniform, AVSyncD, DynamiCrafter, and a new ablation variant without the keyframe index embedding.</p>
        <div class="video-container" id="avsyncd-container"></div>
        
        <h2>2. Moderate Motion</h2>
        <p>Examples showcasing moderate motion generation, such as frog croaking or lion roaring.</p>
        <p>The videos are arranged from left to right as follows: KeyVID, KeyVID-Uniform, AVSyncD, DynamiCrafter, and a new ablation variant without the keyframe index embedding.</p>
        <div class="video-container" id="moderate-motion-container"></div>
        
        <h2>3. Subtle Motion</h2>
        <p>Examples showcasing subtle motion generation in music performance scenarios.</p>
        <p>The videos are arranged from left to right as follows: KeyVID, KeyVID-Uniform, AVSyncD, DynamiCrafter, and a new ablation variant without the keyframe index embedding.</p>
        <div class="video-container" id="subtle-motion-container"></div>
        
        <h2>4. Failure Cases</h2>
        <p>Failure cases identified through human evaluation: examples in which fewer than 50% of participants agreed that KeyVID outperformed the competing baselines.</p>
        <p>The videos are arranged from left to right as follows: KeyVID, KeyVID-Uniform, AVSyncD, and DynamiCrafter.</p>
        <p>Although our model generally provides stronger audio–motion alignment, it can occasionally produce <strong>physically implausible predictions</strong> (e.g., body-shaking artifacts when playing the trombone).</p>
        <p>In contrast, the baseline, which relies on uniformly sampled frames, tends to generate <strong>smoother motion</strong> that may appear more natural in these specific cases, but it in fact exhibits <strong>weaker audio–motion synchronization</strong>. This weakness becomes even more pronounced in other examples within the same subtle-motion category, where the baseline often misses important audio-driven motion cues.</p>
        <div class="video-container" id="failure-case-container"></div>
        
        <h2>5. Open-Domain Generation Visualization with Audio Synchronization</h2>
        <p>The first audio clip sounds like a hammer <strong>striking a wooden surface</strong>; the second contains four hammer strikes on a <strong>metal object</strong>.</p>
        <p>The results show that our model not only generates videos with the <strong>correct pattern of hammer strikes</strong> but also shows the hammer <strong>hitting different objects</strong> depending on the material implied by the sound.</p>
        <div class="video-container side-by-side" id="open-domain-container"></div>
    </div>
    
    <script>
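        // Append one video card (player plus file-name caption) per file
        // to the container element with the given id.
        function loadVideos(containerId, folderPath, fileNames) {
            const container = document.getElementById(containerId);
            if (!container) {
                // Skip quietly if the target container is missing from the page.
                console.warn(`loadVideos: no element with id "${containerId}"`);
                return;
            }
            fileNames.forEach(file => {
                const videoDiv = document.createElement("div");
                videoDiv.classList.add("video-item");
                videoDiv.innerHTML = `
                    <video width="100%" controls>
                        <source src="${folderPath}/${file}" type="video/mp4">
                        Your browser does not support the video tag.
                    </video>
                    <p>${file}</p>
                `;
                container.appendChild(videoDiv);
            });
        }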

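        // Video file names are listed manually, relative to the folder passed to loadVideos.
        // avsyncdVideos holds the intensive-motion examples (section 1).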
        const avsyncdVideos = ["dog_barking/_52ntwwQyv4_000070_000080_3.5_9.5_clip-02.mp4", 
                                "hammering/_tzXSoaZ644_000021_000031_0.0_3.0_clip-01.mp4",
                                "dog_barking/3qesirWAGt4_000020_000030_0.0_8.0_clip-00.mp4",
                                "hammering/lm8M8aEoa3c_000081_000091_3.5_8.0_clip-01.mp4",
                                "machine_gun_shooting/yUZMpGwS-OI_000230_000240_2.5_6.5_clip-02.mp4",
                                "machine_gun_shooting/sxJjC9HC1Xs_000310_000320_0.5_10.0_clip-02.mp4"];
        const moderateMotionVideos = ["frog_croaking/C7VWBi27oGc_000006_000016_1.0_5.0_clip-02.mp4",
                                    "lions_roaring/Gwlez841U_I_000007_000017_0.0_3.0_clip-00.mp4",
                                    "cap_gun_shooting/03fGTwkSBWs_000289_000299_0.0_5.0_clip-00.mp4",
                                    "lions_roaring/57Q_RHO0qA8_000000_000010_1.5_4.5_clip-01.mp4",
                                    // "frog_croaking/9rWoPW0VU00_000165_000175_0.0_3.0_clip-02.mp4",
                                    "frog_croaking/Il9qAhbbeBw_000013_000023_3.0_7.0_clip-00.mp4",
                                    "chicken_crowing/2baznhAyEsg_000014_000024_5.5_8.5_clip-01.mp4"];
        const subtleMotionVideos = ["playing_trombone/EnFYztu2dBc_000002_000012_0.0_10.0_clip-02.mp4",
                                     "playing_cello/czaj1HwZFYk_000180_000190_0.0_10.0_clip-01.mp4",
                                     "playing_trumpet/3cThgRIaqgU_000016_000026_0.5_9.0_clip-01.mp4",
                                     "playing_trumpet/r1JpF0ovMFA_000034_000044_2.5_6.0_clip-01.mp4",
                                     "playing_violin__fiddle/-j9x-d4ZqtY_000030_000040_0.0_9.5_clip-01.mp4",
                                     "playing_violin__fiddle/O8CrIlFXN1I_000030_000040_3.0_10.0_clip-01.mp4"];
        const failureCaseVideos = ["playing_violin__fiddle/74BDV0z_bvw_000030_000040_0.5_9.5_clip-00.mp4",
                                    "playing_trumpet/0bC2T-xZkCs_000187_000197_0.0_3.0_clip-00.mp4",
                                    "chicken_crowing/2OLXoKxJ1qg_000030_000040_0.0_4.5_clip-01.mp4"];

        
        const openDomainVideos = ["open_wooden.mp4", "open_metal.mp4"];
        
        loadVideos("avsyncd-container", "videos/avsync15", avsyncdVideos);
        loadVideos("moderate-motion-container", "videos/avsync15", moderateMotionVideos);
        loadVideos("subtle-motion-container", "videos/avsync15", subtleMotionVideos);
        loadVideos("failure-case-container", "videos/avsync15", failureCaseVideos);
        loadVideos("open-domain-container", "videos/open_domain", openDomainVideos);
    </script>
</body>
</html>