<!DOCTYPE html>
<html>

<head>
    <script>
    window.dataLayer = window.dataLayer || [];
    </script>

    <meta charset="utf-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0, shrink-to-fit=no">
    <title>VD3D: Taming Large Video Diffusion Transformers for 3D Camera Control</title>
    
    <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/twitter-bootstrap/4.5.0/css/bootstrap.min.css">
    <link href='https://fonts.googleapis.com/css?family=Source+Sans+Pro:300,400,500,600' rel='stylesheet' type='text/css'>
    <link rel="stylesheet" href="data/assets/css/styles.css">

    <link rel="apple-touch-icon" sizes="180x180" href="apple-touch-icon.png">
    <link rel="manifest" href="site.webmanifest">
    <!-- <meta name="robots" content="noindex"> -->

    <meta property="og:site_name" content="VD3D" />
    <meta property="og:type" content="video.other" />
    <meta property="og:title" content="VD3D: Taming Large Video Diffusion Transformers for 3D Camera Control" />
    <meta property="og:description" content="VD3D: Taming Large Video Diffusion Transformers for 3D Camera Control." />
    <meta property="og:url" content="" />

</head>

<body>
    <div class="highlight-clean" style="padding-bottom: 0px; padding-top: 20px;">
        <div class="container" style="max-width: 1024px; margin-bottom: 20px">
            <h1 class="text-center" style="font-size:33px;"><b>VD3D</b>: Taming Large Video Diffusion Transformers for 3D Camera Control</h1>
        </div>
        <div id="container">
        <div class="buttons" style="margin-top: 8px; margin-bottom: 8px;">
            <a class="btn btn-light" role="button" href="main.html">
                <svg style="visibility:hidden;width:0px;height:24px;margin-left:-12px;margin-right:12px" width="0px" height="24px" viewBox="0 0 375 531">
                    <polygon stroke="#000000" points="0.5,0.866 459.5,265.87 0.5,530.874 "></polygon>
                </svg>
                Main Page
            </a>
        </div>
        </div>
    </div>
    <hr class="divider" />
    <div class="container" style="max-width: 768px;">
        <div class="row">
            <div class="col-sm-12">
                <h2>Out-of-Distribution Camera Trajectories</h2>
                <h6>We apply different translations, rotations, and combinations of the two to the same initial scenes. These examples show that our model handles a wide variety of user-defined camera trajectories and changes of direction. We use the same seed for all videos and do not cherry-pick any results.</h6>
            </div>
        </div>
        <hr class="divider" />
        <div class="row">
            <div class="col-sm-12">
                <h6>Rotation Around Clockwise</h6>
            </div>
        </div>
        <div class="compositional captioned_videos">
            <video class="video lazy" autoplay loop playsinline muted>
                <source data-src="data/videos/rebuttal/ood/rotate_around_cw.mp4" type="video/mp4"></source>
            </video>
        </div>
        <hr class="divider" />
        <div class="row">
            <div class="col-sm-12">
                <h6>Rotation Around Anticlockwise</h6>
            </div>
        </div>
        <div class="compositional captioned_videos">
            <video class="video lazy" autoplay loop playsinline muted>
                <source data-src="data/videos/rebuttal/ood/rotate_around_acw.mp4" type="video/mp4"></source>
            </video>
        </div>
        <hr class="divider" />
        <div class="row">
            <div class="col-sm-12">
                <h6>Rotation Clockwise (No Translation)</h6>
            </div>
        </div>
        <div class="compositional captioned_videos">
            <video class="video lazy" autoplay loop playsinline muted>
                <source data-src="data/videos/rebuttal/ood/rotate_cw.mp4" type="video/mp4"></source>
            </video>
        </div>
        <hr class="divider" />
        <div class="row">
            <div class="col-sm-12">
                <h6>Rotation Anticlockwise (No Translation)</h6>
            </div>
        </div>
        <div class="compositional captioned_videos">
            <video class="video lazy" autoplay loop playsinline muted>
                <source data-src="data/videos/rebuttal/ood/rotate_acw.mp4" type="video/mp4"></source>
            </video>
        </div>
        <hr class="divider" />
        <div class="row">
            <div class="col-sm-12">
                <h6>Zoom Out, then Up</h6>
            </div>
        </div>
        <div class="compositional captioned_videos">
            <video class="video lazy" autoplay loop playsinline muted>
                <source data-src="data/videos/rebuttal/ood/zoom_out_up.mp4" type="video/mp4"></source>
            </video>
        </div>
        <hr class="divider" />
        <div class="row">
            <div class="col-sm-12">
                <h6>Translation Right, then Rotation Anticlockwise</h6>
            </div>
        </div>
        <div class="compositional captioned_videos">
            <video class="video lazy" autoplay loop playsinline muted>
                <source data-src="data/videos/rebuttal/ood/right_acw.mp4" type="video/mp4"></source>
            </video>
        </div>
        <hr class="divider" />
        <div class="row">
            <div class="col-sm-12">
                <h6>Translation Left, then Rotation Clockwise</h6>
            </div>
        </div>
        <div class="compositional captioned_videos">
            <video class="video lazy" autoplay loop playsinline muted>
                <source data-src="data/videos/rebuttal/ood/left_cw.mp4" type="video/mp4"></source>
            </video>
        </div>
        <hr class="divider" />
        <div class="row">
            <div class="col-sm-12">
                <h6>Translation Left</h6>
            </div>
        </div>
        <div class="compositional captioned_videos">
            <video class="video lazy" autoplay loop playsinline muted>
                <source data-src="data/videos/rebuttal/ood/left.mp4" type="video/mp4"></source>
            </video>
        </div>
        <hr class="divider" />
        <div class="row">
            <div class="col-sm-12">
                <h6>Translation Right</h6>
            </div>
        </div>
        <div class="compositional captioned_videos">
            <video class="video lazy" autoplay loop playsinline muted>
                <source data-src="data/videos/rebuttal/ood/right.mp4" type="video/mp4"></source>
            </video>
        </div>
        <hr class="divider" />
        <div class="row">
            <div class="col-sm-12">
                <h6>Translation Up</h6>
            </div>
        </div>
        <div class="compositional captioned_videos">
            <video class="video lazy" autoplay loop playsinline muted>
                <source data-src="data/videos/rebuttal/ood/up.mp4" type="video/mp4"></source>
            </video>
        </div>
        <hr class="divider" />
        <div class="row">
            <div class="col-sm-12">
                <h6>Translation Down</h6>
            </div>
        </div>
        <div class="compositional captioned_videos">
            <video class="video lazy" autoplay loop playsinline muted>
                <source data-src="data/videos/rebuttal/ood/down.mp4" type="video/mp4"></source>
            </video>
        </div>
    </div>
    <hr class="divider" />
    <div class="container" style="max-width: 768px;">
        <div class="row">
            <div class="col-sm-12">
                <h2>Vanilla DiT Results</h2>
                <h6>We take a pre-trained vanilla DiT model that operates in the latent space of CogVideoX and fine-tune it for camera control with our mechanism. These results indicate that our approach is not tied to a single architecture and generalizes to other transformer architectures and pipelines.</h6>
            </div>
        </div>
        <hr class="divider" />
        <div class="row">
            <div class="col-sm-12">
                <h6>Rotation Around Anticlockwise</h6>
            </div>
        </div>
        <div class="compositional captioned_videos">
            <video class="video lazy" autoplay loop playsinline muted>
                <source data-src="data/videos/rebuttal/dit/rotate_around_acw.mp4" type="video/mp4"></source>
            </video>
        </div>
        <hr class="divider" />
        <div class="row">
            <div class="col-sm-12">
                <h6>Zoom Out, then Up</h6>
            </div>
        </div>
        <div class="compositional captioned_videos">
            <video class="video lazy" autoplay loop playsinline muted>
                <source data-src="data/videos/rebuttal/dit/zoom_out_up.mp4" type="video/mp4"></source>
            </video>
        </div>
    </div>
    <hr class="divider" />
    <script src="data/assets/js/yall.js"></script>
    <script>
        yall(
            {
                observeChanges: true
            }
        );
    </script>
</body>

</html>
