<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <meta name="viewport" content="width=device-width, initial-scale=1">
  <title>Diff4Splat</title>
  <link rel="stylesheet" href="./static/css/bulma.min.css">
  <link rel="stylesheet" href="./static/css/bulma-carousel.min.css">
  <link rel="stylesheet" href="./static/css/bulma-slider.min.css">
  <link rel="stylesheet" href="./static/css/fontawesome.all.min.css">
  <link rel="stylesheet" href="./static/css/index.css">
  <link rel="icon" href="./static/images/logo.png">
  <script src="https://ajax.googleapis.com/ajax/libs/jquery/3.5.1/jquery.min.js"></script>
  <script defer src="./static/js/fontawesome.all.min.js"></script>
  <script src="./static/js/bulma-carousel.min.js"></script>
  <script src="./static/js/bulma-slider.min.js"></script>
  <script src="./static/js/index.js"></script>
  <style>
    .morphing-text {
        background-image: linear-gradient(to right, 
         #ec12f3 0%, #5f0d80 25%, #2892a0 75%, #50d5e9 100%
        );
        -webkit-background-clip: text;
        background-clip: text;
        color: transparent;
    }
    .center-img {
      display: block;
      margin-left: auto;
      margin-right: auto;
    }
  </style>
</head>

<body>

  <!-- Section 1: Banner -->
  <section class="hero">
    <div class="hero-body">
      <div class="container is-max-desktop">
        <div class="columns is-centered">
          <div class="column has-text-centered">

            <!-- Title -->
            <h1 class="title is-2 publication-title">
              <span class="morphing-text model-name">🌀 Diff4Splat</span>:
              <span>Controllable 4D Scene Generation with Latent Dynamic Reconstruction Models</span>
            </h1>

            <div class="is-size-5 publication-authors">
              <span class="author-block">ICLR 2026 Submission #1271</span>
            </div>
          </div>
        </div>
      </div>
    </div>
  </section>

  <!-- Section 2: Teaser -->
  <section class="hero teaser">
    <div class="container is-max-desktop">
      <div class="hero-body">
        <div style="text-align: center;">
          <img src="./static/images/teaser.png" width="100%" class="center-img"/>
        </div>
        <h2 class="subtitle has-text-centered">
          <span class="morphing-text model-name"><b>Diff4Splat</b></span> <b>is a unified framework that
         directly predicts<br>a deformable 3D Gaussian field without test-time optimization.</b>
        </h2>

      </div>
    </div>
  </section>

  <!-- Section 3: Abstract -->
  <section style="margin-top: -5pt;" class="section">
    <div class="container is-max-desktop">

      <!-- Abstract -->
      <div class="columns is-centered has-text-centered">
        <div class="column is-four-fifths">
          <h2 class="title is-3">🧩&nbsp;&nbsp; Abstract &nbsp;&nbsp;🧩</h2>
          <div class="content has-text-justified">
          <p>
          We introduce <span class="morphing-text model-name"><b>Diff4Splat</b></span>, a <b>feed-forward method</b> that synthesizes controllable and <b>explicit 4D</b> scenes from <b>a single image</b>.
          Our approach unifies the generative priors of video diffusion models with geometry and motion constraints learned from large-scale 4D datasets.
          Given a single input image, a camera trajectory, and an optional text prompt, <span class="morphing-text model-name"><b>Diff4Splat</b></span> directly predicts a deformable 3D Gaussian field that encodes appearance, geometry, and motion, all in a single forward pass, without test-time optimization or post-hoc refinement.
          <br>
          At the core of our framework lies a video latent transformer, which augments video diffusion models to jointly capture spatio-temporal dependencies and predict time-varying 3D Gaussian primitives.
          Training is guided by objectives on appearance fidelity, geometric accuracy, and motion consistency, enabling <span class="morphing-text model-name"><b>Diff4Splat</b></span> to synthesize high-quality 4D scenes in 30 seconds.
          <br>
          We demonstrate the effectiveness of <span class="morphing-text model-name"><b>Diff4Splat</b></span> across video generation, novel view synthesis, and geometry extraction, where it matches or surpasses optimization-based methods for dynamic scene synthesis while being significantly more efficient.
          The code and pre-trained model will be released.
          </p>
          </div>
        </div>
      </div>

    </div>
  </section>

  <!-- Section 4: Method -->
  <section class="section">
    <div class="container is-max-desktop">
      <div class="columns is-centered">
        <div class="column is-full-width">
          <h2 class="title is-3 has-text-centered">🔮&nbsp;&nbsp; Method &nbsp;&nbsp;🔮</h2>
          <div class="content has-text-justified">
            <img src="./static/images/network.png" width="100%" class="center-img"/>
            <p>
              The network architecture of <span class="morphing-text model-name"><b>Diff4Splat</b></span>. We present a high-fidelity, explicit 4D scene generation method from a single image, built on the following key innovations: video diffusion latents processed by our novel Transformer to enable dynamic 3DGS deformation; unified supervision with photometric, geometric, and motion losses; and progressive training for robust geometry and texture.
            </p>
          </div>
        </div>
      </div>
    </div>
  </section>

  <section class="section" id="results">
    <div class="container is-max-desktop">
      <div class="columns is-centered">
        <div class="column is-full-width">

          <h2 class="title is-3 has-text-centered">🎨&nbsp;&nbsp; Results of <span class="morphing-text model-name"><b>Diff4Splat</b></span> &nbsp;&nbsp;🎨</h2>
          <div style="text-align:center">
            <img src="static/images/diff4splat_quant1.png" style="width: 100%; height: auto;" class="inserted-image">
            <img src="static/images/diff4splat_quant2.png" style="width: 100%; height: auto;" class="inserted-image">
            <table style="width: 100%">
              <tbody>


                <tr class="prompt-row">
                  <td>Input Image</td>
                  <td><span class="morphing-text model-name">Ours<br> (feed-forward)</span></td>
                  <td>MoSca<br>(test-time optimization)</td>
               </tr>


                <tr class="result-row">
                  <td><img src="./static/images/nvs/imgs/1.png" style="width: 90%; height: auto;"></td>
                  <td><video src="./static/images/nvs/imgs/1_ours.mp4" autoplay loop muted style="width: 90%; height: auto;"></video></td>
                  <td><video src="./static/images/nvs/imgs/01_mosca.mp4" autoplay loop muted style="width: 90%; height: auto;"></video></td>
                </tr>

                <tr class="result-row">
                  <td><img src="./static/images/nvs/imgs/3.png" style="width: 90%; height: auto;"></td>
                  <td><video src="./static/images/nvs/imgs/3_ours.mp4" autoplay loop muted style="width: 90%; height: auto;"></video></td>
                  <td><video src="./static/images/nvs/imgs/03_mosca.mp4" autoplay loop muted style="width: 90%; height: auto;"></video></td>
                </tr>

                <tr class="result-row">
                  <td><img src="./static/images/nvs/imgs/4.png" style="width: 90%; height: auto;"></td>
                  <td><video src="./static/images/nvs/imgs/4_ours.mp4" autoplay loop muted style="width: 90%; height: auto;"></video></td>
                  <td><video src="./static/images/nvs/imgs/04_mosca.mp4" autoplay loop muted style="width: 90%; height: auto;"></video></td>
                </tr>

                <tr class="result-row">
                  <td><img src="./static/images/nvs/imgs/6.png" style="width: 90%; height: auto;"></td>
                  <td><video src="./static/images/nvs/imgs/6_ours.mp4" autoplay loop muted style="width: 90%; height: auto;"></video></td>
                  <td><video src="./static/images/nvs/imgs/06_mosca.mp4" autoplay loop muted style="width: 90%; height: auto;"></video></td>
                </tr>
                <tr class="result-row">
                  <td><img src="./static/images/nvs/imgs/8.png" style="width: 90%; height: auto;"></td>
                  <td><video src="./static/images/nvs/imgs/8_ours.mp4" autoplay loop muted style="width: 90%; height: auto;"></video></td>
                  <td><video src="./static/images/nvs/imgs/08_mosca.mp4" autoplay loop muted style="width: 90%; height: auto;"></video></td>
                </tr>

                <tr class="result-row">
                  <td><img src="./static/images/nvs/imgs/10.png" style="width: 90%; height: auto;"></td>
                  <td><video src="./static/images/nvs/imgs/10_ours.mp4" autoplay loop muted style="width: 90%; height: auto;"></video></td>
                  <td><video src="./static/images/nvs/imgs/10_mosca.mp4" autoplay loop muted style="width: 90%; height: auto;"></video></td>
                </tr>

              </tbody>
            </table>
          </div>
    
        </div>
      </div>
    </div>
  </section>


  <section class="section" id="results">
    <div class="container is-max-desktop">
      <div class="columns is-centered">
        <div class="column is-full-width">

          <h2 class="title is-3 has-text-centered">🔧&nbsp;&nbsp; Ablation Studies of <span class="morphing-text model-name"><b>Diff4Splat</b></span> &nbsp;&nbsp;🔧</h2>
          <div style="text-align:center">
            <img src="static/images/ablation_loss.png" style="width: 100%; height: auto;" class="inserted-image">
            <img src="static/images/ablation.png" style="width: 100%; height: auto;" class="inserted-image">
            <p>
              Ablating the deformable Gaussian field shows that removing this module results in ghosting artifacts, highlighted by <span style="color: red; font-weight: bold;">the red bounding boxes</span>.
            </p>
          </div>
    
        </div>
      </div>
    </div>
  </section>




</body>

</html>
