<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <title>MotionDDM: Motion Generation and Understanding via Discrete Diffusion Model</title>
  <link rel="stylesheet" href="css/style.css">
</head>
<body>

<!-- Title & Authors -->
<section id="title-authors" class="card">
  <h1>Motion<span class="abbr-letter" style="color:#4285f4">D</span><span class="abbr-letter" style="color:#ea4335">D</span><span class="abbr-letter" style="color:#fbbc05">M</span>:
    Motion Generation and Understanding via
    <span class="abbr-letter" style="color:#4285f4">D</span>iscrete 
    <span class="abbr-letter" style="color:#ea4335">D</span>iffusion 
    <span class="abbr-letter" style="color:#fbbc05">M</span>odel</h1>
  <!-- <p class="authors">Author1, Author2, Author3</p> -->
</section>

<!-- Teaser -->
<!-- <section id="teaser" class="card">
  <img src="assets/teaser.png" class="teaser-img">
</section> -->

<!-- Abstract -->
<section id="abstract" class="card">
  <h2>Abstract</h2>
  <p>We present <strong>MotionDDM</strong>, a <strong>diffusion-LLM</strong> framework for bidirectional text-motion understanding and generation. Unlike GPT-style autoregressive approaches, which tokenize motion and decode it one token at a time, MotionDDM performs multi-step parallel denoising, <strong>unifying</strong> Text-to-Motion (T2M), Motion-to-Text (M2T), and text-free Motion-to-Motion (M2M) within a <strong>single model</strong>. Because the number of denoising steps can be chosen at inference time, this decoding paradigm naturally offers a quality-latency trade-off. On HumanML3D, our method achieves competitive T2M and M2T results against strong baselines. Beyond T2M and M2T, we further demonstrate motion completion, prediction, and interpolation in both text-conditioned and text-free settings. We also adopt Residual VQ (RVQ) as the motion tokenizer to improve quantization fidelity, and incorporate <strong>GRPO</strong> (Group Relative Policy Optimization) into the framework to enhance alignment and controllability. To the best of our knowledge, this is the first work to bring diffusion-LLMs to bidirectional text-motion modeling.</p>
</section>

<!-- Method -->
<section id="method" class="card">
  <h2>MotionDDM</h2>
  <img src="assets/motionddm_architecture.png" class="method-img">
  <p>MotionDDM is a unified framework for bidirectional text-motion understanding and generation, inspired by the recent success of <em>discrete diffusion language models (dLLMs)</em>. Instead of generating tokens autoregressively from left to right, dLLMs are trained to recover randomly masked tokens and generate by iterative denoising, which naturally supports <strong>parallel inference</strong>. This lets the model refine a partially masked sequence over multiple steps, dynamically revise low-confidence predictions, and use bidirectional attention for stronger contextual reasoning.</p>
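  <p>The iterative decoding described above can be sketched in a few lines of Python. This is a minimal, generic illustration of masked discrete-diffusion sampling with low-confidence remasking, in the style of MaskGIT- and LLaDA-type samplers, and not MotionDDM's actual implementation. The toy denoiser (which returns random logits instead of running a trained network), the <code>MASK_ID</code> value, and the linear unmasking schedule are hypothetical choices made only for illustration.</p>

```python
import math
import random

MASK_ID = -1  # hypothetical mask id; a real tokenizer reserves a vocabulary entry


def make_toy_denoiser(vocab_size, seed=0):
    """Hypothetical stand-in for the trained network.

    Returns a function mapping a token list of length L to L rows of
    vocab_size random logits. Unlike a real model, it ignores its input.
    """
    rng = random.Random(seed)

    def denoise(tokens):
        return [[rng.gauss(0.0, 1.0) for _ in range(vocab_size)] for _ in tokens]

    return denoise


def softmax(row):
    """Numerically stable softmax over one row of logits."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def masked_diffusion_decode(denoise, length, num_steps):
    """Fill a fully masked sequence in num_steps parallel denoising steps."""
    tokens = [MASK_ID] * length
    for step in range(num_steps):
        masked = [i for i, t in enumerate(tokens) if t == MASK_ID]
        logits = denoise(tokens)  # one forward pass per step
        confidence = {}
        # Predict every masked position in parallel from the same forward pass.
        for i in masked:
            probs = softmax(logits[i])
            best = probs.index(max(probs))
            tokens[i] = best
            confidence[i] = probs[best]
        # Linear schedule (an assumption): positions left masked after this step.
        n_remask = length * (num_steps - step - 1) // num_steps
        # Re-mask the least confident new predictions so they are revised next
        # step; tokens committed in earlier steps are never re-masked.
        for i in sorted(masked, key=confidence.__getitem__)[:n_remask]:
            tokens[i] = MASK_ID
    return tokens


if __name__ == "__main__":
    # Arbitrary illustrative sizes, not MotionDDM's configuration.
    denoise = make_toy_denoiser(vocab_size=512, seed=0)
    motion_tokens = masked_diffusion_decode(denoise, length=40, num_steps=8)
    print(len(motion_tokens), MASK_ID in motion_tokens)  # prints: 40 False
```

  <p>Each loop iteration costs one forward pass, so <code>num_steps</code> directly sets the decoding cost: <code>num_steps=1</code> predicts every token at once, while with a real network more steps let later predictions condition on tokens committed earlier. Under these assumptions, this is one way a step budget yields the quality-latency trade-off described in the abstract.</p>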
</section>

<!-- Tasks Section -->
<section id="results" class="card">
  <h2>Results</h2>
  <p class="muted">Click tabs to switch tasks. Use the arrows to navigate video examples within each task</p>
  <div id="taskTabs" class="tabs"></div>
  <div id="taskPanels" class="tab-content"></div>
</section>

<!-- Video Modal -->
<div id="videoModal" class="modal" aria-hidden="true">
  <div class="modal-content">
    <button id="modalClose" class="modal-close">✖</button>
    <video id="modalVideo" controls autoplay></video>
    <div id="modalCaption" class="caption"></div>
  </div>
</div>

<!-- Comparison Section -->
<section id="comparison" class="card">
  <h2>Comparison with Other Methods</h2>
  <p class="muted">Click tabs to switch tasks. Use the arrows to navigate video examples within each task</p>
  <div class="tabs" id="comparison-tabs"></div>
  <div id="comparisonPanels" class="tab-content"></div>
</section>

<script src="js/main.js"></script>
</body>
</html>
