<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <meta name="viewport" content="width=device-width, initial-scale=1">
  <title>Motion-Agent: A Conversational Framework for Human Motion Generation with LLMs</title>

  <!-- Global site tag (gtag.js) - Google Analytics -->
  <script async src="https://www.googletagmanager.com/gtag/js?id=G-PYVRSFMDRL"></script>
  <script>
    window.dataLayer = window.dataLayer || [];

    function gtag() {
      dataLayer.push(arguments);
    }

    gtag('js', new Date());

    gtag('config', 'G-PYVRSFMDRL');
  </script>

  <link href="https://fonts.googleapis.com/css?family=Google+Sans|Noto+Sans|Castoro"
        rel="stylesheet">

  <link rel="stylesheet" href="./static/css/bulma.min.css">
  <link rel="stylesheet" href="./static/css/bulma-carousel.min.css">
  <link rel="stylesheet" href="./static/css/bulma-slider.min.css">
  <link rel="stylesheet" href="./static/css/fontawesome.all.min.css">
  <link rel="stylesheet"
        href="https://cdn.jsdelivr.net/gh/jpswalsh/academicons@1/css/academicons.min.css">
  <link rel="stylesheet" href="./static/css/index.css">

  <script src="https://ajax.googleapis.com/ajax/libs/jquery/3.5.1/jquery.min.js"></script>
  <script defer src="./static/js/fontawesome.all.min.js"></script>
  <script src="./static/js/bulma-carousel.min.js"></script>
  <script src="./static/js/bulma-slider.min.js"></script>
  <script src="./static/js/index.js"></script>
</head>
<body>


<section class="hero">
  <div class="hero-body">
    <div class="container is-max-desktop">
      <div class="columns is-centered">
        <div class="column has-text-centered">
          <h1 class="title is-1 publication-title">Motion-Agent: A Conversational Framework for Human Motion Generation with LLMs</h1>
        </div>
      </div>
    </div>
  </div>
</section>

<section class="hero teaser">
  <div class="container is-max-desktop">
    
        <!-- <center><h2 class="title is-3">Abstract</h2></center> -->
    <!-- <div class="hero-body has-text-centered">
        <img src = "./images/teaser.png" height="80%"></img><br>

    </div> -->

    <p>
      While previous approaches to 3D human motion generation have achieved notable success, they often rely on extensive training and are limited to specific tasks. 
      To address these challenges, we introduce <b>Motion-Agent</b>, an efficient conversational framework designed for general human motion generation, editing, and understanding. 
      Motion-Agent employs an open-source pre-trained language model to develop a generative agent, <b>MotionLLM</b>, that bridges the gap between motion and text. 
      This is accomplished by encoding and quantizing motions into discrete tokens that align with the language model's vocabulary. 
      With only 1–3% of the model's parameters fine-tuned using adapters, MotionLLM delivers performance on par with diffusion models and other transformer-based methods trained from scratch.
      By integrating MotionLLM with GPT-4 without additional training, Motion-Agent is able to generate highly complex motion sequences through multi-turn conversations, a capability that previous models have struggled to achieve.
      Motion-Agent supports a wide range of motion-language tasks, offering versatile capabilities for generating and customizing human motion through interactive conversational exchanges.
    </p>
    <br>
    <!-- <div class="content has-text-centered">
      <video id="replay-video"
            autoplay 
             controls
             playsinline
             width="80%">
        <source src="./video/demo.mp4"
                type="video/mp4">
      </video>
    </div> -->
    <div id="video-container">
      <video id="video" muted controls playsinline>
        <source src="./video/fig1/fig1.mp4" type="video/mp4">
      </video>
      <!-- <span style="font-size:12px">* This video contains audio.</span> -->
    </div>
    
  </div>
</section>



<section class="section is-light is-small">
  <div class="container is-max-desktop">
    <center><h2 class="title is-3">Overview of Motion-Agent</h2></center>
    <div class="hero-body">
      <a href="./images/model.png"><img src="./images/model.png" height="100%" alt="Overview of Motion-Agent"></a><br>
    </div>
  </div>
    <!--/ Abstract. -->

</section>

<section class="section">
  <div class="container is-max-desktop">

    <center><h2 class="title is-3">Qualitative Results of Motion-Agent</h2></center><br>
    <br>
    <h3 class="title is-4">Generating complex and long motions</h3>
    <br>
    <div class="columns is-centered">
      <div class="column">
        <div id="video-container">
          <center><p><b> "Generate a motion of a person performing a floor exercise in artistic gymnastics, and make it long."</b></p></center>
          <video id="video" muted controls playsinline>
              <source src="./video/fig3/fig3_1.mp4"
                      type="video/mp4">
            </video>
        </div>
      </div>
      <div class="column">
        <div class="content">
          <center><p><b> "Generate another motion that a person is kicked down and then stands up to fight back by slapping and kicking."</b></p></center>
          <video poster="" id="warmup" muted controls playsinline>
              <source src="./video/fig3/fig3_2.mp4"
                      type="video/mp4">
            </video>
        </div>
      </div>
    </div>

    <br>
    <h3 class="title is-4">Advanced reasoning ability</h3>
    <br>
    <div class="columns is-centered">
      <div class="column">
        <div id="video-container">
          <video id="video" muted controls playsinline>
              <source src="./video/fig3/fig3.mp4"
                      type="video/mp4">
            </video>
        </div>
      </div>
    </div>

    <br>
    <h3 class="title is-4">Comparison with other methods</h3>
    <br>
    <center><h3 class="title is-5">Generate a motion where a golfer hits the ball, runs to the hole to check,
      and then celebrates by jumping and waving hands.</h3></center>
    <div class="columns is-centered">
      <div class="column">
        <div id="video-container">
          <center><p><b> Motion-Agent</b></p></center>
          <video id="video" muted controls playsinline>
              <source src="./video/fig4/fig4_MotionAgent.mp4"
                      type="video/mp4">
            </video>
        </div>
      </div>
      <div class="column">
        <div class="content">
          <center><p><b> MotionGPT</b></p></center>
          <video poster="" id="warmup" muted controls playsinline>
              <source src="./video/fig4/fig4_MotionGPT.mp4"
                      type="video/mp4">
            </video>
        </div>
      </div>
      <div class="column">
        <div class="content">
          <center><p><b> MoMask</b></p></center>
          <video poster="" id="warmup" muted controls playsinline>
              <source src="./video/fig4/fig4_MoMask.mp4"
                      type="video/mp4">
            </video>
        </div>
      </div>
    </div>

    <br>
    <h3 class="title is-4">Motion-Agent ablation study</h3>
    <br>
    <center><h3 class="title is-5">A person lies face up to rest and then stands up after a while.</h3></center>
    <div class="columns is-centered">
      <div class="column">
        <div id="video-container">
          <center><p><b> Motion-Agent (with <b>MotionLLM</b>)</b></p></center>
          <video id="video" muted controls playsinline>
              <source src="./video/fig6/fig6_MotionLLM.mp4"
                      type="video/mp4">
            </video>
        </div>
      </div>
      <div class="column">
        <div class="content">
          <center><p><b> Motion-Agent (with <b>MotionGPT</b>)</b></p></center>
          <video poster="" id="warmup" muted controls playsinline>
              <source src="./video/fig6/fig6_MotionGPT.mp4"
                      type="video/mp4">
            </video>
        </div>
      </div>
    </div>

    <br>
    <h3 class="title is-4">Smoothing Transitions</h3>
    <br>
    <div class="columns is-centered">
      <div class="column">
        <div id="video-container">
          <center><p><b> Direct Concatenation</b></p></center>
          <video id="video" muted controls playsinline>
              <source src="./video/fig5/fig5_1.mp4"
                      type="video/mp4">
            </video>
        </div>
      </div>
      <div class="column">
        <div class="content">
          <center><p><b> Transitioned by Motion-Agent</b></p></center>
          <video poster="" id="warmup" muted controls playsinline>
              <source src="./video/fig5/fig5_2.mp4"
                      type="video/mp4">
            </video>
        </div>
      </div>
    </div>

    <br>
    <h3 class="title is-4">More Conversation Examples</h3>
    <br>
    <div class="columns is-centered">
      <div class="column">
        <div id="video-container">
          <video id="video" muted controls playsinline>
              <source src="./video/fig7/fig7.mp4"
                      type="video/mp4">
            </video>
        </div>
      </div>
    </div>
    <div class="columns is-centered">
      <div class="column">
        <div id="video-container">
          <video id="video" muted controls playsinline>
              <source src="./video/fig8/fig8.mp4"
                      type="video/mp4">
            </video>
        </div>
      </div>
    </div>
    <div class="columns is-centered">
      <div class="column">
        <div id="video-container">
          <video id="video" muted controls playsinline>
              <source src="./video/fig9/fig9.mp4"
                      type="video/mp4">
            </video>
        </div>
      </div>
    </div>

  </div>
</section>


<section class="section">
  <div class="container is-max-desktop">

    <center><h2 class="title is-3">Qualitative Results of MotionLLM</h2></center><br>
    <!-- <p>
      Text-to-motion generation results of MotionLLM.
    </p> -->
    <br>
    <h3 class="title is-4">Text-to-motion</h3>
    <br>
    <div class="columns is-centered">
      <div class="column">
        <div id="video-container">
          <center><p><b> "A person performs a backflip."</b></p></center>
          <video id="video" playsinline autoplay muted loop height="40%">
              <source src="./video/motionllm/backflip.mp4#t=0.01"
                      type="video/mp4">
            </video>
        </div>
      </div>
      <div class="column">
        <div class="content">
          <center><p><b> "A person walks forward, turns around, and walks back the way he came."</b></p></center>
          <video poster="" id="warmup" playsinline autoplay muted loop height="40%">
              <source src="./video/motionllm/walkback.mp4#t=0.01"
                      type="video/mp4">
            </video>
        </div>
      </div>
    <!-- </div> -->

    <!-- <div class="columns is-centered"> -->
      <div class="column">
        <div class="content">
          <center><p><b> "A person is doing rope skipping exercise in the park."</b></p></center>
          <video poster="" id="standing" playsinline autoplay muted loop height="40%">
              <source src="./video/motionllm/ropeskip.mp4#t=0.01"
                      type="video/mp4">
            </video>
        </div>
      </div>
      <div class="column">
        <div class="content">
          <center><p><b> "A man is walking as if to be a zombie."</b></p></center>
          <video poster="" id="push" playsinline autoplay muted loop height="40%">
              <source src="./video/motionllm/zombie.mp4#t=0.01"
                      type="video/mp4">
            </video>
        </div>
      </div>
    </div>

    <h3 class="title is-4">Comparison with SOTA</h3>
    <div class="columns is-centered">
      <div class="column">
        <div class="content">
          <center><p><b> "MotionLLM: A man stands motionless and then take one steps backwards to the left."</b></p></center>
          <video poster="" id="bow" playsinline autoplay muted loop height="40%">
              <source src="./video/motionllm/comparedemo1.mp4"
                      type="video/mp4">
            </video>
        </div>
      </div>
      <div class="column">
        <div class="content">
          <center><p><b> "MoMask: A man stands motionless and then take one steps backwards to the left."</b></p></center>
          <video poster="" id="squat" playsinline autoplay muted loop height="40%">
              <source src="./video/motionllm/compared1momask.mp4"
                      type="video/mp4">
            </video>
        </div>
      </div>
    <!-- </div> -->

    <!-- <div class="columns is-centered"> -->
      <div class="column">
        <div class="content">
          <center><p><b> "MotionLLM: A person jumps and spins in the air 360 degrees counterclockwise."</b></p></center>
          <video poster="" id="standing" playsinline autoplay muted loop height="40%">
              <source src="./video/motionllm/mllm_test_spin.mp4"
                      type="video/mp4">
            </video>
        </div>
      </div>
      <div class="column">
        <div class="content">
          <center><p><b> "MoMask: A person jumps and spins in the air 360 degrees counterclockwise."</b></p></center>
          <video poster="" id="warmup" playsinline autoplay muted loop height="40%">
              <source src="./video/motionllm/momask_test_spin.mp4"
                      type="video/mp4">
            </video>
        </div>
      </div>
    </div>
  

    <h3 class="title is-4">Motion-to-text</h3>
    <div class="columns is-centered">
      <div class="column">
        <div class="content">
          <center><p><b> "MotionLLM: A person walks forward while holding arms out as if to be a zombie"</b></p></center>
          <video poster="" id="bow" playsinline autoplay muted loop height="40%">
              <source src="./video/motionllm/demo_1.mp4"
                      type="video/mp4">
            </video>
        </div>
      </div>
      <div class="column">
        <div class="content">
          <center><p><b> "MotionLLM: The person is walking on a balance beam"</b></p></center>
          <video poster="" id="squat" playsinline autoplay muted loop height="40%">
              <source src="./video/motionllm/demo_2.mp4"
                      type="video/mp4">
            </video>
        </div>
      </div>
    <!-- </div> -->

    <!-- <div class="columns is-centered"> -->
      <div class="column">
        <div class="content">
          <center><p><b> "MotionLLM: A person walks backwards in zig-zag motion"</b></p></center>
          <video poster="" id="standing" playsinline autoplay muted loop height="40%">
              <source src="./video/motionllm/demo_3.mp4"
                      type="video/mp4">
          </video>
        </div>
      </div>
      <div class="column">
        <div class="content">
          <center><p><b> "MotionLLM: A person uses their left hand to open a bottle, drinks from it, then places the bottle back down"</b></p></center>
          <video poster="" id="warmup" playsinline autoplay muted loop height="40%">
              <source src="./video/motionllm/demo_4.mp4"
                      type="video/mp4">
            </video>
        </div>
      </div>
    </div>
    <span style="font-size:12px">* The captions above are generated by MotionLLM.</span>
  </div>
</section>


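<script>
      // Play the teaser video once the page has scrolled past its container,
      // and pause it again when the user scrolls back above it.
      // getElementById returns the first element carrying each id.
      var videoContainer = document.getElementById('video-container');
      var video = document.getElementById('video');

      var videoOffset = videoContainer.offsetTop;

      window.addEventListener('scroll', function() {
        var scrollPosition = window.scrollY || window.pageYOffset;

        if (scrollPosition >= videoOffset) {
          // play() returns a promise in modern browsers; swallow a rejected
          // playback request so it does not surface as an uncaught error.
          var playPromise = video.play();
          if (playPromise !== undefined) {
            playPromise.catch(function() {});
          }
        } else {
          video.pause();
        }
      });
    </script>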

</body>
</html>
