<!DOCTYPE html>
<html>
<head>
    <meta charset="utf-8">
    <meta name="description"
          content="MVGE: scale-invariant and temporally consistent 3D geometry estimation from monocular video.">
    <meta name="keywords"
          content="MVGE">
    <meta name="viewport" content="width=device-width, initial-scale=1">
    <title>MVGE</title>

    <link href="https://fonts.googleapis.com/css?family=Google+Sans|Noto+Sans|Castoro"
          rel="stylesheet">

    <link rel="stylesheet" href="./static/css/bulma.min.css">
    <link rel="stylesheet" href="./static/css/bulma-carousel.min.css">
    <link rel="stylesheet" href="./static/css/bulma-slider.min.css">
    <link rel="stylesheet" href="./static/css/fontawesome.all.min.css">
    <link rel="stylesheet" href="./static/css/twentytwenty.css">

    <link rel="stylesheet"
          href="https://cdn.jsdelivr.net/gh/jpswalsh/academicons@1/css/academicons.min.css">
    <link rel="stylesheet" href="./static/css/index.css">
    <link rel="icon" href="./img/logo.png">


    <script src="./static/js/jquery-3.2.1.min.js"></script>
    <script src="./static/js/jquery.event.move.js"></script>
    <script src="./static/js/jquery.twentytwenty.js"></script>
    <script src="./static/js/bulma-carousel.min.js"></script>
    <script src="./static/js/bulma-slider.min.js"></script>
    <script src="./static/js/fontawesome.all.min.js"></script>
</head>


<!-- Google tag (gtag.js) -->
<script async src="https://www.googletagmanager.com/gtag/js?id=G-LKCDBW8851"></script>
<script>
    window.dataLayer = window.dataLayer || [];

    function gtag() {
        dataLayer.push(arguments);
    }

    gtag('js', new Date());

    gtag('config', 'G-LKCDBW8851');
</script>


<body>


<section class="hero teaser ">
    <div class="hero-body">
        <div class="container is-max-desktop has-text-centered">
            <div class="columns is-centered">
                <div class="column has-text-centered ">
                    <img src="img/logo.png" alt="MVGE logo" style="height:170px; margin-bottom: -10px;">
                    <h1 class="title is-1 publication-title has-text-centered" style="font-size: 2.5rem; line-height: 1.3;">
                        Scale-invariant and Temporal-consistent Monocular Video Geometry Estimation
                    </h1>
                </div>
            </div>
        </div>
    </div>
</section>


<section class="hero teaser is-light is-small">
    <div class="hero-body">
        <div class="container" style="text-align: left; ">
            <div id="results-carousel-horizontal" class="carousel results-carousel">

                <video muted autoplay="autoplay" loop="loop" width="100%" preload controls>
                    <source src="./video/v/horses-kids.mp4" type="video/mp4">
                </video>

                
                <video muted autoplay="autoplay" loop="loop" width="100%" preload controls>
                    <source src="./video/v/city-ride.mp4" type="video/mp4">
                </video>

                <video muted autoplay="autoplay" loop="loop" width="100%" preload controls>
                    <source src="./video/v/orchid.mp4" type="video/mp4">
                </video>

                <video muted autoplay="autoplay" loop="loop" width="100%" preload controls>
                    <source src="./video/v/baseball.mp4" type="video/mp4">
                </video>
                
                <video muted autoplay="autoplay" loop="loop" width="100%" preload controls>
                    <source src="./video/v/dog-control.mp4" type="video/mp4">
                </video>

                <video muted autoplay="autoplay" loop="loop" width="100%" preload controls>
                    <source src="./video/v/lions.mp4" type="video/mp4">
                </video>

                <video muted autoplay="autoplay" loop="loop" width="100%" preload controls>
                    <source src="./video/v/luggage.mp4" type="video/mp4">
                </video>


            </div>

            <div class="content has-text-justified" style="margin-top: 15px;">
                <b>MVGE</b> generates temporally consistent and scale-invariant 3D geometry from monocular videos with superior accuracy across extended sequences.
            </div>

        </div>
    </div>
</section>

<script>
    $(window).on('load', function () {
        bulmaCarousel.attach('#results-carousel-horizontal', {
            slidesToScroll: 1,
            slidesToShow: 1,
            loop: true,
            autoplay: false,
        });

    });
</script>


<br><br><br>


<section class="hero teaser">
    <div class="hero-body">
        <div class="columns is-centered has-text-centered">
            <div class="columns is-centered has-text-centered">
                <div class="column is-three-fifths" style=" margin-top: 30px; margin-bottom: 20px;">
                    <h2 class="title is-3">Abstract</h2>
                    <div class="content has-text-justified">
                        We present MVGE, a novel approach for estimating 3D geometry from extended monocular video sequences, where existing methods struggle to maintain both geometric accuracy and temporal consistency across hundreds of frames. Our approach generates affine-invariant 3D point maps with shared parameters across entire sequences, enabling consistent scale-invariant representations. We introduce three key innovations: viewpoint-invariant geometry aligning multi-perspective points in a unified reference frame; appearance-invariant learning enforcing consistency across exponential timescales; and frequency-modulated positioning enabling extrapolation to sequences vastly exceeding training length. Experiments across diverse datasets demonstrate significant improvements, reducing relative point map error by 24.2% and temporal alignment error by 34.9% on ScanNet compared to state-of-the-art methods. Our approach handles challenging scenarios with complex camera trajectories and lighting variations while efficiently processing extended sequences in a single pass. Code will be publicly released, and we encourage readers to explore the interactive demonstrations in our supplementary materials.
                    </div>
                </div>
            </div>
        </div>
    </div>
</section>
<section class="hero teaser">
    <div class="hero-body">
        <div class="columns is-centered has-text-centered">
            <div class="columns is-centered has-text-centered">
                <div class="column is-three-fifths" style=" margin-top: 30px; margin-bottom: 20px;">
                    <h2 class="title is-3">Framework</h2>
                    <div class="content has-text-justified">                       
                        <strong>Overview of MVGE.</strong>
                        <em>Top-Left:</em> MVGE consists of a ViT backbone that processes the input video frames, followed by a temporal decoder with cross-attention and RoPE with dynamic NTK scaling, which produces scale-invariant point maps.
                        <em>Top-Right:</em> Cross-frame geometric consistency is enforced at the global and local geometric levels (G<sub>1</sub>, G<sub>2</sub>) to maintain structural coherence across frames.
                        <em>Bottom-Left:</em> RoPE with dynamic NTK scaling extends the sequence context through frequency scaling, which adaptively weights dimensions based on the scale factor, and train-time sequence stretching, which samples positions from a virtual extended sequence.
                        <em>Bottom-Right:</em> Hierarchical temporal consistency constraints are applied at multiple temporal strides (δ = 1, 2, 4, 8) to enforce smooth, consistent point map predictions across time.
                        <div class="hero-body" style="margin-top: 40px; margin-bottom: -30px;">
                            <img id="method" width="90%" src="./img/method.png" alt="Overview of the MVGE framework"/>
                        </div>
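                        <p style="margin-top: 40px;">As a rough illustration of the two bottom panels, the Python sketch below computes RoPE inverse frequencies with the commonly used NTK-aware base rescaling and enumerates the frame pairs at strides δ = 1, 2, 4, 8. The function names, the default base of 10000, and the exact scaling rule are assumptions made for illustration; they are not taken from the MVGE implementation.</p>
<pre style="text-align: left;"><code class="language-python">def ntk_rope_inv_freq(dim, seq_len, train_len, base=10000.0):
    """Inverse RoPE frequencies with dynamic NTK-aware base scaling (assumed rule)."""
    # The scale factor exceeds 1 only when the sequence is longer than in training.
    scale = max(1.0, seq_len / train_len)
    # Enlarging the base stretches low frequencies most and leaves the highest intact.
    scaled_base = base * scale ** (dim / (dim - 2))
    return [scaled_base ** (-2.0 * i / dim) for i in range(dim // 2)]


def stride_pairs(num_frames, strides=(1, 2, 4, 8)):
    """Frame index pairs (t, t + d) compared at each temporal stride d."""
    return [(t, t + d) for d in strides for t in range(num_frames - d)]


# Hypothetical setting: trained on 128 frames, run on 512 frames.
freqs = ntk_rope_inv_freq(dim=64, seq_len=512, train_len=128)
pairs = stride_pairs(num_frames=10)
</code></pre>
                        <p>Under this rule the highest frequency is unchanged while the lowest is divided by the scale factor, which is one way to read "adaptively weights dimensions based on the scale factor".</p>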
                    </div>
                </div>
            </div>
        </div>
    </div>
</section>

<section class="hero teaser is-light is-small">
    <div class="hero-body">

        <div class="container" style="text-align: center; ">
            <h2 class="title is-3">Comparison with VGGT on Open-World Videos</h2>
            <div id="comparison-VGGT" class="carousel results-carousel">
                
                <div class="twoitem">
                    <video muted autoplay="autoplay" loop="loop" width="100%">
                        <source src="video/compare/vggt/sea-turtle_rgb.mp4" type="video/mp4">
                    </video>
                    <div class="twentytwenty-container twentytwenty-container-bottom">
                        <div class="cmpcontent">
                            <video muted autoplay="autoplay" loop="loop" width="100%">
                                <source src="video/compare/vggt/sea-turtle_vggt.mp4" type="video/mp4">
                            </video>
                        </div>
                        <div class="cmpcontent">
                            <video muted autoplay="autoplay" loop="loop" width="100%">
                                <source src="video/compare/vggt/sea-turtle_ours.mp4" type="video/mp4">
                            </video>
                        </div>
                    </div>
                </div>

                <div class="twoitem">
                    <video muted autoplay="autoplay" loop="loop" width="100%">
                        <source src="video/compare/vggt/flamingo_rgb.mp4" type="video/mp4">
                    </video>
                    <div class="twentytwenty-container twentytwenty-container-bottom">
                        <div class="cmpcontent">
                            <video muted autoplay="autoplay" loop="loop" width="100%">
                                <source src="video/compare/vggt/flamingo_vggt.mp4" type="video/mp4">
                            </video>
                        </div>
                        <div class="cmpcontent">
                            <video muted autoplay="autoplay" loop="loop" width="100%">
                                <source src="video/compare/vggt/flamingo_ours.mp4" type="video/mp4">
                            </video>
                        </div>
                    </div>
                </div>

                <div class="twoitem">
                    <video muted autoplay="autoplay" loop="loop" width="100%">
                        <source src="video/compare/vggt/motorbike-indoors_rgb.mp4" type="video/mp4">
                    </video>
                    <div class="twentytwenty-container twentytwenty-container-bottom">
                        <div class="cmpcontent">
                            <video muted autoplay="autoplay" loop="loop" width="100%">
                                <source src="video/compare/vggt/motorbike-indoors_vggt.mp4" type="video/mp4">
                            </video>
                        </div>
                        <div class="cmpcontent">
                            <video muted autoplay="autoplay" loop="loop" width="100%">
                                <source src="video/compare/vggt/motorbike-indoors_ours.mp4" type="video/mp4">
                            </video>
                        </div>
                    </div>
                </div>

                <div class="twoitem">
                    <video muted autoplay="autoplay" loop="loop" width="100%">
                        <source src="video/compare/vggt/robotic-arm_rgb.mp4" type="video/mp4">
                    </video>
                    <div class="twentytwenty-container twentytwenty-container-bottom">
                        <div class="cmpcontent">
                            <video muted autoplay="autoplay" loop="loop" width="100%">
                                <source src="video/compare/vggt/robotic-arm_vggt.mp4" type="video/mp4">
                            </video>
                        </div>
                        <div class="cmpcontent">
                            <video muted autoplay="autoplay" loop="loop" width="100%">
                                <source src="video/compare/vggt/robotic-arm_ours.mp4" type="video/mp4">
                            </video>
                        </div>
                    </div>
                </div>
                
                <div class="twoitem">
                    <video muted autoplay="autoplay" loop="loop" width="100%">
                        <source src="video/compare/vggt/music-band_rgb.mp4" type="video/mp4">
                    </video>
                    <div class="twentytwenty-container twentytwenty-container-bottom">
                        <div class="cmpcontent">
                            <video muted autoplay="autoplay" loop="loop" width="100%">
                                <source src="video/compare/vggt/music-band_vggt.mp4" type="video/mp4">
                            </video>
                        </div>
                        <div class="cmpcontent">
                            <video muted autoplay="autoplay" loop="loop" width="100%">
                                <source src="video/compare/vggt/music-band_ours.mp4" type="video/mp4">
                            </video>
                        </div>
                    </div>
                </div>

                <div class="twoitem">
                    <video muted autoplay="autoplay" loop="loop" width="100%">
                        <source src="video/compare/vggt/twist-dance_rgb.mp4" type="video/mp4">
                    </video>
                    <div class="twentytwenty-container twentytwenty-container-bottom">
                        <div class="cmpcontent">
                            <video muted autoplay="autoplay" loop="loop" width="100%">
                                <source src="video/compare/vggt/twist-dance_vggt.mp4" type="video/mp4">
                            </video>
                        </div>
                        <div class="cmpcontent">
                            <video muted autoplay="autoplay" loop="loop" width="100%">
                                <source src="video/compare/vggt/twist-dance_ours.mp4" type="video/mp4">
                            </video>
                        </div>
                    </div>
                </div>

            </div>
        </div>
    </div>
</section>


<script>
    $(window).on('load', function () {
        bulmaCarousel.attach('#comparison-VGGT', {
            slidesToScroll: 2,
            slidesToShow: 2,
            loop: true,
            autoplay: false,
        });


    });
</script>

<section class="hero teaser is-light is-small">
    <div class="hero-body">

        <div class="container" style="text-align: center; ">
            <h2 class="title is-3">Comparison with MoGe on Open-World Videos</h2>
            <div id="comparison-base" class="carousel results-carousel">

                <div class="twoitem">
                    <video muted autoplay="autoplay" loop="loop" width="100%">
                        <source src="video/compare/moge/mallard-fly_rgb.mp4" type="video/mp4">
                    </video>
                    <div class="twentytwenty-container twentytwenty-container-bottom">
                        <div class="cmpcontent">
                            <video muted autoplay="autoplay" loop="loop" width="100%">
                                <source src="video/compare/moge/mallard-fly_moge.mp4" type="video/mp4">
                            </video>
                        </div>
                        <div class="cmpcontent">
                            <video muted autoplay="autoplay" loop="loop" width="100%">
                                <source src="video/compare/moge/mallard-fly_ours.mp4" type="video/mp4">
                            </video>
                        </div>
                    </div>
                </div>

                <div class="twoitem">
                    <video muted autoplay="autoplay" loop="loop" width="100%">
                        <source src="video/compare/moge/cat-girl_rgb.mp4" type="video/mp4">
                    </video>
                    <div class="twentytwenty-container twentytwenty-container-bottom">
                        <div class="cmpcontent">
                            <video muted autoplay="autoplay" loop="loop" width="100%">
                                <source src="video/compare/moge/cat-girl_moge.mp4" type="video/mp4">
                            </video>
                        </div>
                        <div class="cmpcontent">
                            <video muted autoplay="autoplay" loop="loop" width="100%">
                                <source src="video/compare/moge/cat-girl_ours.mp4" type="video/mp4">
                            </video>
                        </div>
                    </div>
                </div>

                <div class="twoitem">
                    <video muted autoplay="autoplay" loop="loop" width="100%">
                        <source src="video/compare/moge/dog_rgb.mp4" type="video/mp4">
                    </video>
                    <div class="twentytwenty-container twentytwenty-container-bottom">
                        <div class="cmpcontent">
                            <video muted autoplay="autoplay" loop="loop" width="100%">
                                <source src="video/compare/moge/dog_moge.mp4" type="video/mp4">
                            </video>
                        </div>
                        <div class="cmpcontent">
                            <video muted autoplay="autoplay" loop="loop" width="100%">
                                <source src="video/compare/moge/dog_ours.mp4" type="video/mp4">
                            </video>
                        </div>
                    </div>
                </div>

                <div class="twoitem">
                    <video muted autoplay="autoplay" loop="loop" width="100%">
                        <source src="video/compare/moge/rodeo_rgb.mp4" type="video/mp4">
                    </video>
                    <div class="twentytwenty-container twentytwenty-container-bottom">
                        <div class="cmpcontent">
                            <video muted autoplay="autoplay" loop="loop" width="100%">
                                <source src="video/compare/moge/rodeo_moge.mp4" type="video/mp4">
                            </video>
                        </div>
                        <div class="cmpcontent">
                            <video muted autoplay="autoplay" loop="loop" width="100%">
                                <source src="video/compare/moge/rodeo_ours.mp4" type="video/mp4">
                            </video>
                        </div>
                    </div>
                </div>

                <div class="twoitem">
                    <video muted autoplay="autoplay" loop="loop" width="100%">
                        <source src="video/compare/moge/table-tennis_rgb.mp4" type="video/mp4">
                    </video>
                    <div class="twentytwenty-container twentytwenty-container-bottom">
                        <div class="cmpcontent">
                            <video muted autoplay="autoplay" loop="loop" width="100%">
                                <source src="video/compare/moge/table-tennis_moge.mp4" type="video/mp4">
                            </video>
                        </div>
                        <div class="cmpcontent">
                            <video muted autoplay="autoplay" loop="loop" width="100%">
                                <source src="video/compare/moge/table-tennis_ours.mp4" type="video/mp4">
                            </video>
                        </div>
                    </div>
                </div>

                <div class="twoitem">
                    <video muted autoplay="autoplay" loop="loop" width="100%">
                        <source src="video/compare/moge/longboard_rgb.mp4" type="video/mp4">
                    </video>
                    <div class="twentytwenty-container twentytwenty-container-bottom">
                        <div class="cmpcontent">
                            <video muted autoplay="autoplay" loop="loop" width="100%">
                                <source src="video/compare/moge/longboard_moge.mp4" type="video/mp4">
                            </video>
                        </div>
                        <div class="cmpcontent">
                            <video muted autoplay="autoplay" loop="loop" width="100%">
                                <source src="video/compare/moge/longboard_ours.mp4" type="video/mp4">
                            </video>
                        </div>
                    </div>
                </div>

            </div>
        </div>
    </div>
</section>

<section class="hero teaser is-light is-small">
    <div class="hero-body">
        <div class="container" style="text-align: center; ">
            <h2 class="title is-3">Portrait Video Processing</h2>
            <div id="results-carousel-horizontal" class="carousel results-carousel">
                
                <video muted autoplay="autoplay" loop="loop" width="100%" preload controls>
                    <source src="./video/othersize/19136320-hd_1080_1920_50fps.mp4" type="video/mp4">
                </video>         
                
                <video muted autoplay="autoplay" loop="loop" width="100%" preload controls>
                    <source src="./video/othersize/5385812-hd_1080_1920_25fps.mp4" type="video/mp4">
                </video>   

                <video muted autoplay="autoplay" loop="loop" width="100%" preload controls>
                    <source src="./video/othersize/5534310-hd_1080_1920_30fps.mp4" type="video/mp4">
                </video>   

            </div>

        </div>
    </div>
</section>


<section class="hero teaser is-light is-small">
    <div class="hero-body">
        <div class="container" style="text-align: center; ">
            <h2 class="title is-3">4D Scene Reconstruction</h2>
            <div id="cmp-point-carousel-horizontal" class="carousel results-carousel">
                
                <video muted autoplay="autoplay" loop="loop" width="90%" preload controls>
                    <source src="./video/point/bike-packing.mp4" type="video/mp4">
                </video>
                <video muted autoplay="autoplay" loop="loop" width="90%" preload controls>
                    <source src="./video/point/breakdance.mp4" type="video/mp4">
                </video>
                <video muted autoplay="autoplay" loop="loop" width="90%" preload controls>
                    <source src="./video/point/lindy-hop.mp4" type="video/mp4">
                </video>
                <video muted autoplay="autoplay" loop="loop" width="90%" preload controls>
                    <source src="./video/point/mountain_1.mp4" type="video/mp4">
                </video>
                <video muted autoplay="autoplay" loop="loop" width="90%" preload controls>
                    <source src="./video/point/soccerball.mp4" type="video/mp4">
                </video>
                <video muted autoplay="autoplay" loop="loop" width="90%" preload controls>
                    <source src="./video/point/hike.mp4" type="video/mp4">
                </video>
            </div>


        </div>
    </div>
</section>

<script>
    $(window).on('load', function () {
        bulmaCarousel.attach('#cmp-point-carousel-horizontal', {
            slidesToScroll: 2,
            slidesToShow: 2,
            loop: true,
            autoplay: false,
        });

    });
</script>

<section class="hero teaser is-light is-small">
    <div class="hero-body">
        <div class="container" style="text-align: center; ">
            <h2 class="title is-3">Long-Range Temporal Inference</h2>
            <div id="results-carousel-horizontal" class="carousel results-carousel">
                
                <video muted autoplay="autoplay" loop="loop" width="100%" preload controls>
                    <source src="./video/longvideo/12572912_960_540_30fps_frames.mp4" type="video/mp4">
                </video>
                <video muted autoplay="autoplay" loop="loop" width="100%" preload controls>
                    <source src="./video/longvideo/17778538-sd_960_540_24fps_frames.mp4" type="video/mp4">
                </video>
                <video muted autoplay="autoplay" loop="loop" width="100%" preload controls>
                    <source src="./video/longvideo/12502400_960_540_60fps_frames.mp4" type="video/mp4">
                </video>

                

            </div>

        </div>
    </div>
</section>

<script>
    $(window).on('load', function () {
        bulmaCarousel.attach('#comparison-base', {
            slidesToScroll: 2,
            slidesToShow: 2,
            loop: true,
            autoplay: false,
        });

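        // Label each comparison slider by its carousel: VGGT vs. ours, base (MoGe) vs. ours.
        $("#comparison-VGGT .twentytwenty-container-bottom").twentytwenty({
            before_label: 'VGGT',
            after_label: 'Ours',
            default_offset_pct: 0.5,
            no_overlay: false,
        });

        $("#comparison-base .twentytwenty-container-bottom").twentytwenty({
            before_label: 'Base',
            after_label: 'Ours',
            default_offset_pct: 0.5,
            no_overlay: false,
            // move_slider_on_hover: true,
        });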

    });
</script>


<footer class="footer">
    <div class="hero-body">
        <div class="columns is-two-fifths is-centered">
            <div class="column is-8">
                <div class="content">
                    <p>
                        This website is based on the <a
                            href="https://github.com/nerfies/nerfies.github.io">Nerfies</a> template and is licensed under a <a
                            rel="license"
                            href="http://creativecommons.org/licenses/by-sa/4.0/">Creative
                        Commons Attribution-ShareAlike 4.0 International License</a>.
                    </p>
                </div>
            </div>
        </div>
    </div>
</footer>

</body>
</html>
