<!DOCTYPE html>
<html>

<head>
  <meta charset="utf-8">
  <meta name="description" content="UFO-4D: Unposed Feedforward 4D reconstruction from Two Images">
  <meta name="viewport" content="width=device-width, initial-scale=1">
  <title>UFO-4D: Unposed Feedforward 4D reconstruction from Two Images</title>

  <link href="https://fonts.googleapis.com/css?family=Google+Sans|Noto+Sans|Castoro" rel="stylesheet">

  <link rel="stylesheet" href="./static/css/bulma.min.css">
  <link rel="stylesheet" href="./static/css/fontawesome.all.min.css">
  <link rel="stylesheet" href="https://cdn.jsdelivr.net/gh/jpswalsh/academicons@1/css/academicons.min.css">
  <link rel="stylesheet" href="./static/css/index.css">

  <script src="https://ajax.googleapis.com/ajax/libs/jquery/3.5.1/jquery.min.js"></script>
  <script defer src="./static/js/fontawesome.all.min.js"></script>
  <script src="./static/js/index.js"></script>
  
  <link rel="icon" href="data:image/svg+xml,<svg xmlns=%22http://www.w3.org/2000/svg%22 viewBox=%220 0 100 100%22><text y=%22.9em%22 font-size=%2290%22>🎥</text></svg>">
</head>

<body>

  <section class="hero">
    <div class="hero-body">
      <div class="container is-max-desktop">
        <div class="columns is-centered">
          <div class="column has-text-centered">
            <h1 class="title is-1 publication-title">UFO-4D: Unposed Feedforward 4D reconstruction from Two Images</h1>
            <div class="is-size-4 publication-venue">
              ICLR 2026
            </div>
          </div>
        </div>
      </div>
    </div>
  </section>

  <hr />



  <section class="section">
    <div class="container is-max-desktop">
      
      <h3 class="title is-3">4D Interpolation</h3>
      <div class="method-comparison-container">
        
        <div class="content has-text-justified">
          <p>
            Here we demonstrate 4D interpolation results across different degrees of camera and object motion. Given the set of dynamic 3D Gaussians predicted by our method, we rasterize the image, depth, and motion at an interpolated time and viewpoint.
            Depth and motion are defined in the canonical camera's coordinate frame. In our motion visualization, only moving objects appear in non-white colors.
          </p>
        </div>
        
        <br>

        <h4 class="title is-4">Small/medium motion</h4>

        Our model successfully interpolates images, depth, and motion, rendering high-quality outputs from the predicted dynamic 3D Gaussians.
        <br> 
        <br>
        <div class="interpolation-grid">
          <div class="method-header">Input pairs</div>
          <div></div>
          <div class="method-header">4D Interpolated image</div>
          <div class="method-header">Interpolated depth</div>
          <div class="method-header">Interpolated motion</div>
          <div class="interpolation-macro" 
               data-degree="small_medium_motion" 
               data-scene="DLG1Wbri9ew-clip43-frame_4_22">
          </div>
          <div class="interpolation-macro" 
               data-degree="small_medium_motion" 
               data-scene="8yFWA2Goxl8-clip5-frame_24_39">
          </div>
          <div class="interpolation-macro" 
               data-degree="small_medium_motion" 
               data-scene="-ExndllifRE-clip3-frame_28_45">
          </div>
          <div class="interpolation-macro" 
               data-degree="small_medium_motion" 
               data-scene="rgbd_bonn_crowd3_1548339961.77666_1548339962.11179">
          </div>
          <div class="interpolation-macro" 
               data-degree="small_medium_motion" 
               data-scene="rgbd_bonn_person_tracking2_1548265922.17945_1548265922.51462">
          </div>
        </div>

        <br>


        <h4 class="title is-4">Large motion</h4>

        Even in scenarios with large object and camera motion, our model robustly produces all estimates from <b>two unposed images</b>.

        <br>
        <br>

        <div class="interpolation-grid">
          <div class="method-header">Input pairs</div>
          <div></div>
          <div class="method-header">4D Interpolated image</div>
          <div class="method-header">Interpolated depth</div>
          <div class="method-header">Interpolated motion</div>

          <div class="interpolation-macro" 
               data-degree="large_motion" 
               data-scene="-GjokU2CXow-clip30-frame_16_38">
          </div>
          <div class="interpolation-macro" 
               data-degree="large_motion" 
               data-scene="MhxqBDWonRw-clip137-frame_21_43">
          </div>
          <div class="interpolation-macro" 
               data-degree="large_motion" 
               data-scene="GPYLDy33y7Q-clip6-frame_100_114">
          </div>
          <div class="interpolation-macro" 
               data-degree="large_motion" 
               data-scene="GRTYzq7noG0-clip1-frame_23_42">
          </div>
        </div>

        <br>
        
        <h4 class="title is-4">Extreme motion</h4>

        In highly challenging cases (such as minimal image overlap or extremely large motion), our model struggles to accurately estimate the motion of moving objects.
        However, camera motion estimation and the geometry of static regions remain robust.
        Red regions in the depth map denote disocclusions, where no image evidence is available.

        <br>
        <br>

        <div class="interpolation-grid">
          <div class="method-header">Input pairs</div>
          <div></div>
          <div class="method-header">4D Interpolated image</div>
          <div class="method-header">Interpolated depth</div>
          <div class="method-header">Interpolated motion</div>

          <div class="interpolation-macro" 
               data-degree="extreme_motion" 
               data-scene="3TCSW3fOFJY-clip23-frame_113_150">
          </div>
          <div class="interpolation-macro" 
               data-degree="extreme_motion" 
               data-scene="GRTYzq7noG0-clip103-frame_55_161">
          </div>
        </div>
      </div>
    </div>
  </section>


  <hr />


  <section class="section">
    <div class="container is-max-desktop">
      
      <h3 class="title is-3">Qualitative comparison</h3>
      <div class="method-comparison-container">
        
        <div class="content has-text-justified">
          We compare our method against its direct competitors, DynaDUSt3R, ZeroMSF, and St4RTrack, and visualize the predicted depth and projected optical flow on the Stereo4D, Bonn, and KITTI datasets.
          Our method estimates more accurate depth and motion than the competing approaches, even under challenging conditions such as significant camera rotation and large object motion.
          It effectively disentangles dynamic object motion from camera ego-motion, preserving sharp boundaries and geometric consistency.
        </div>

        <br><br>

        <h4 class="title is-4">Stereo4D dataset</h4>
        <div class="method-comparison-grid">
          <div class="method-header">Input Pair</div>
          <div class="method-header">DynaDUSt3R</div>
          <div class="method-header">ZeroMSF</div>
          <div class="method-header">St4RTrack</div>
          <div class="method-header" style="font-weight: 800; color: #0d6fb0;">UFO-4D (Ours)</div>
          <div class="method-header">Ground truth</div>

          <div class="results-macro" 
               data-dataset="stereo4d" 
               data-seq="MhxqBDWonRw-clip137-frame_21_43">
          </div>
          <div class="results-macro" 
               data-dataset="stereo4d" 
               data-seq="6nL_ifACgfE-clip0-frame_50_72">
          </div>
          <div class="results-macro" 
               data-dataset="stereo4d" 
               data-seq="CLsVoV-9OrI-clip6-frame_5_29">
          </div>
        </div>

        <br><br>

        <h4 class="title is-4">Bonn dataset</h4>
        <div class="method-comparison-grid">
          <div class="method-header">Input Pair</div>
          <div class="method-header">DynaDUSt3R</div>
          <div class="method-header">ZeroMSF</div>
          <div class="method-header">St4RTrack</div>
          <div class="method-header" style="font-weight: 800; color: #0d6fb0;">UFO-4D (Ours)</div>
          <div class="method-header">Ground truth</div>

          <div class="results-macro" 
               data-dataset="bonn" 
               data-seq="rgbd_bonn_balloon2_1548266532.18884_1548266532.52390">
          </div>
          <div class="results-macro" 
               data-dataset="bonn" 
               data-seq="rgbd_bonn_crowd2_1548339893.24032_1548339893.57482">
          </div>
          <div class="results-macro" 
               data-dataset="bonn" 
               data-seq="rgbd_bonn_person_tracking2_1548265922.17945_1548265922.51462">
          </div>
        </div>

        <br><br>

        <h4 class="title is-4">KITTI dataset</h4>
        <div class="method-comparison-grid">
          <div class="method-header">Input Pair</div>
          <div class="method-header">DynaDUSt3R</div>
          <div class="method-header">ZeroMSF</div>
          <div class="method-header">St4RTrack</div>
          <div class="method-header" style="font-weight: 800; color: #0d6fb0;">UFO-4D (Ours)</div>
          <div class="method-header">Ground truth</div>

          <div class="results-macro" 
               data-dataset="kitti" 
               data-seq="000188_10">
          </div>
          <div class="results-macro" 
               data-dataset="kitti" 
               data-seq="000190_10">
          </div>
          <div class="results-macro" 
               data-dataset="kitti" 
               data-seq="000022_10">
          </div>
        </div>

      </div>
    </div>
  </section>



  <footer class="footer">
    <div class="container">
      <div class="columns is-centered">
        <div class="column is-8">
          <div class="content">
            <p>
              This website is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
            </p>
          </div>
        </div>
      </div>
    </div>
  </footer>

</body>
</html>