<!DOCTYPE html>
<html>

<head>
    <meta charset="utf-8">
    <meta name="description" content="CamI2V: Camera-Controlled Image-to-Video Diffusion Model">
    <meta property="og:title" content="CamI2V: Camera-Controlled Image-to-Video Diffusion Model" />
    <meta property="og:description" content="CamI2V: Camera-Controlled Image-to-Video Diffusion Model" />

    <!-- Keywords for the paper to be indexed by -->
    <meta name="keywords" content="CamI2V, Camera Control, Image-to-Video Generation">
    <meta name="viewport" content="width=device-width, initial-scale=1">

    <title>CamI2V: Camera-Controlled Image-to-Video Diffusion Model</title>
    <link href="https://fonts.googleapis.com/css?family=Google+Sans|Noto+Sans|Castoro" rel="stylesheet">
    <link href="https://fonts.googleapis.com/css2?family=Playfair+Display:ital,wght@1,400&display=swap"
        rel="stylesheet">

    <link rel="stylesheet" href="static/css/bulma.min.css">
    <link rel="stylesheet" href="static/css/bulma-carousel.min.css">
    <link rel="stylesheet" href="static/css/bulma-slider.min.css">
    <link rel="stylesheet" href="static/css/fontawesome.all.min.css">
    <link rel="stylesheet" href="https://cdn.jsdelivr.net/gh/jpswalsh/academicons@1/css/academicons.min.css">
    <link rel="stylesheet" href="static/css/index.css">
    <link rel="stylesheet" href="https://cdn.jsdelivr.net/gh/dreampulse/computer-modern-web-font@master/fonts.css">

    <script src="https://ajax.googleapis.com/ajax/libs/jquery/3.5.1/jquery.min.js"></script>
    <script src="https://documentcloud.adobe.com/view-sdk/main.js"></script>
    <script defer src="static/js/fontawesome.all.min.js"></script>
    <script src="static/js/bulma-carousel.min.js"></script>
    <script src="static/js/bulma-slider.min.js"></script>
    <script src="static/js/index.js"></script>

    <script type="text/javascript" async
        src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.1/MathJax.js?config=TeX-AMS-MML_SVG"></script>
    <script type="text/x-mathjax-config">
        MathJax.Hub.Config({
            tex2jax: {
                inlineMath: [['$','$'], ['\\(','\\)']]
            }
        });
    </script>

    <style>
        .video-container {
            display: flex;
            justify-content: center;
            gap: 5px;
        }

        .italic {
            font-family: 'Playfair Display';
            font-style: italic;
        }
    </style>
</head>

<body>
    <!-- title and author -->
    <section class="hero">
        <div class="hero-body">
            <div class="container is-max-desktop">
                <div class="columns is-centered">
                    <div class="column has-text-centered">
                        <h1 class="title is-1 publication-title">
                            CamI2V: Camera-Controlled Image-to-Video Diffusion Model
                        </h1>

                        <div class="is-size-5 publication-authors">
                            <span class="author-block">*Under Review</span>
                        </div>

                    </div>
                </div>
            </div>
        </div>
    </section>


    <section class="section hero">

        <div class="container has-text-centered">
            <h2 class="title is-3">More Visualizations</h2>
            <h3 class="subtitle">*Generated by the 512x320 model (50k training steps); compatible with input images of arbitrary aspect ratio.</h3>

            <!-- Row 1 -->
            <div class="video-container">

                <div>
                    <video autoplay controls muted loop>
                        <source src="static/videos/512/pan_left.mp4" type="video/mp4" />
                    </video>
                    <h2 class="subtitle has-text-centered italic">
                        Pan Left
                    </h2>
                </div>

                <div>
                    <video autoplay controls muted loop>
                        <source src="static/videos/512/pan_right.mp4" type="video/mp4" />
                    </video>
                    <h2 class="subtitle has-text-centered italic">
                        Pan Right
                    </h2>
                </div>

                <div>
                    <video autoplay controls muted loop>
                        <source src="static/videos/512/pan_up.mp4" type="video/mp4" />
                    </video>
                    <h2 class="subtitle has-text-centered italic">
                        Pan Up
                    </h2>
                </div>

                <div>
                    <video autoplay controls muted loop>
                        <source src="static/videos/512/pan_down.mp4" type="video/mp4" />
                    </video>
                    <h2 class="subtitle has-text-centered italic">
                        Pan Down
                    </h2>
                </div>

            </div>

            <br>

            <!-- Row 2 -->
            <div class="video-container">

                <div>
                    <video autoplay controls muted loop>
                        <source src="static/videos/512/look_left.mp4" type="video/mp4" />
                    </video>
                    <h2 class="subtitle has-text-centered italic">
                        Look Left
                    </h2>
                </div>

                <div>
                    <video autoplay controls muted loop>
                        <source src="static/videos/512/look_right.mp4" type="video/mp4" />
                    </video>
                    <h2 class="subtitle has-text-centered italic">
                        Look Right
                    </h2>
                </div>

                <div>
                    <video autoplay controls muted loop>
                        <source src="static/videos/512/orbit_left.mp4" type="video/mp4" />
                    </video>
                    <h2 class="subtitle has-text-centered italic">
                        Orbit Left
                    </h2>
                </div>

                <div>
                    <video autoplay controls muted loop>
                        <source src="static/videos/512/orbit_right.mp4" type="video/mp4" />
                    </video>
                    <h2 class="subtitle has-text-centered italic">
                        Orbit Right
                    </h2>
                </div>

            </div>

            <br>

            <!-- Row 3 -->
            <div class="video-container">

                <div>
                    <video autoplay controls muted loop>
                        <source src="static/videos/512/zoom_in_and_rotate.mp4" type="video/mp4" />
                    </video>
                    <h2 class="subtitle has-text-centered italic">
                        Zoom In &amp; Rotate
                    </h2>
                </div>

                <div>
                    <video autoplay controls muted loop>
                        <source src="static/videos/512/pan_left_and_zoom_in_zoom_out.mp4" type="video/mp4" />
                    </video>
                    <h2 class="subtitle has-text-centered italic">
                        Pan Left &amp; Zoom
                    </h2>
                </div>

                <div>
                    <video autoplay controls muted loop>
                        <source src="static/videos/512/forward_and_backward.mp4" type="video/mp4" />
                    </video>
                    <h2 class="subtitle has-text-centered italic">
                        Forward &#8594; Backward
                    </h2>
                </div>

                <div>
                    <video autoplay controls muted loop>
                        <source src="static/videos/512/walking.mp4" type="video/mp4" />
                    </video>
                    <h2 class="subtitle has-text-centered italic">
                        Walking
                    </h2>
                </div>


            </div>

        </div>

    </section>


    <section class="section hero">

        <div class="container has-text-centered">
            <h2 class="title is-3">Visualization (512x320)</h2>
            <h3 class="subtitle">*Original outputs from the 512x320 model, with padding not removed.</h3>

            <div class="video-container">

                    <video autoplay controls muted loop>
                        <source src="static/videos/512/zoom_in_6_in_1.mp4" type="video/mp4" />
                    </video>

            </div>

        </div>

    </section>


    <section class="section hero">

        <div class="container has-text-centered">
            <h2 class="title is-3">Visualization (256x256)</h2>

            <div class="video-container">

                <div>
                    <video autoplay controls muted loop>
                        <source src="static/videos/256/orbit_left.mp4" type="video/mp4" />
                    </video>
                    <h2 class="subtitle has-text-centered italic">
                        Orbit Left
                    </h2>
                </div>

                <div>
                    <video autoplay controls muted loop>
                        <source src="static/videos/256/orbit_right.mp4" type="video/mp4" />
                    </video>
                    <h2 class="subtitle has-text-centered italic">
                        Orbit Right
                    </h2>
                </div>

            </div>

            <br>

            <div class="video-container">

                <div>
                    <video autoplay controls muted loop>
                        <source src="static/videos/256/zoom_in.mp4" type="video/mp4" />
                    </video>
                    <h2 class="subtitle has-text-centered italic">
                        Zoom In
                    </h2>
                </div>

                <div>
                    <video autoplay controls muted loop>
                        <source src="static/videos/256/zoom_out.mp4" type="video/mp4" />
                    </video>
                    <h2 class="subtitle has-text-centered italic">
                        Zoom Out
                    </h2>
                </div>

            </div>

        </div>

    </section>


    <section class="section hero">

        <div class="container has-text-centered">
            <h2 class="title is-3">More Ablation</h2>

            <!-- ablation 1 -->
            <div class="video-container">

                <div>
                    <video autoplay controls muted loop>
                        <source src="static/videos/ablation/1/256_cami2v.mp4" type="video/mp4" />
                    </video>
                    <h2 class="subtitle has-text-centered italic">
                        CamI2V (Ours)
                    </h2>
                </div>

                <div>
                    <video autoplay controls muted loop>
                        <source src="static/videos/ablation/1/256_cami2v_3dfull.mp4" type="video/mp4" />
                    </video>
                    <h2 class="subtitle has-text-centered italic">
                        CamI2V - 3D full attention
                    </h2>
                </div>

                <div>
                    <video autoplay controls muted loop>
                        <source src="static/videos/ablation/1/256_camco.mp4" type="video/mp4" />
                    </video>
                    <h2 class="subtitle has-text-centered italic">
                        CamI2V - epipolar attention <br> only on reference frame <br> (similar to CamCo)
                    </h2>
                </div>

                <div>
                    <video autoplay controls muted loop>
                        <source src="static/videos/ablation/1/256_cameractrl.mp4" type="video/mp4" />
                    </video>
                    <h2 class="subtitle has-text-centered italic">
                        CameraCtrl
                    </h2>
                </div>

                <div>
                    <video autoplay controls muted loop>
                        <source src="static/videos/ablation/1/256_motionctrl.mp4" type="video/mp4" />
                    </video>
                    <h2 class="subtitle has-text-centered italic">
                        MotionCtrl
                    </h2>
                </div>

            </div>

            <br>

            <div class="content has-text-justified">
                    Thanks to direct cross-frame interaction (epipolar attention or 3D full attention), CamI2V
                    and its 3D full attention variant succeed in panning right under a large camera movement, while CameraCtrl
                    and MotionCtrl fail. However, the 3D full attention result shows some blur and color shift on its left side.
                    This is because 3D full attention has access to all the noisy features (the noisy condition) across frames,
                    which leads it to absorb incorrect color information. Applying epipolar attention only on the reference frame
                    (similar to CamCo) also fails, for two reasons: it has limited access to the noisy condition (newly appearing
                    pixels have no intersections on the reference frame), and it copies too much from the reference image,
                    which leads to a static scene.
            </div>

        </div>

    </section>

</body>

</html>