<!DOCTYPE html>
<html lang="en" xml:lang="en" xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <!-- Global site tag (gtag.js) - Google Analytics -->
    <script>
      window.dataLayer = window.dataLayer || [];

      function gtag() {
        dataLayer.push(arguments);
      }
      gtag("js", new Date());

      gtag("config", "G-KZEKLLQP31");
    </script>

    <meta charset="utf-8" />
    <meta content="width=device-width, initial-scale=1" name="viewport" />
    <title>Universal Humanoid Motion Representations for Physics-Based Control</title>
    <meta content="PULSE" property="og:title" />
    <meta content="" name="description" property="og:description" />
    <meta name="keywords" content="Humanoid Control" />

    <link rel="stylesheet" href="assets/css/project_stylesheet.css" />
    <link href="../data/misc/favicon.ico" rel="shortcut icon" />
    <link href="../data/misc/favicon_apple.ico" rel="apple-touch-icon" />
    <link
      href="https://fonts.googleapis.com/css?family=Google+Sans|Noto+Sans|Castoro"
      rel="stylesheet"
    />
    <link rel="stylesheet" href="assets/css/fontawesome.all.min.css" />
    <link rel="stylesheet" href="assets/academicons/css/academicons.min.css" />

    <script defer src="assets/js/fontawesome.all.min.js"></script>
    <script src="assets/js/iframeResizer.contentWindow.min.js"></script>
  </head>

  <body>
    <div class="n-header"></div>
    <div class="n-title">
      <h1>Universal Humanoid Motion Representations for Physics-Based Control</h1>
    </div>

    <ul class="authors">
     
    </ul>

    <ul class="authors affiliations ">
         
    </ul>

    <ul class="authors venue ">
            <li>
                
            </li>
    </ul>

    <ul class="authors links ">
    </ul>

    <br>


    <div class="n-article">
    <hr />

         <h2 id="abstract ">
            Abstract
        </h2>
        <p>
         We present a universal motion representation that encompasses a comprehensive range of motor skills for physics-based humanoid control.
          Due to the high dimensionality of humanoid control and the inherent difficulties of reinforcement learning, prior methods have focused on learning skill embeddings for a narrow range of movement styles
          (e.g., locomotion, game characters) from specialized motion datasets. This limited scope hampers their applicability to complex tasks. Our work closes this gap,
          significantly increasing the coverage of the motion representation space. To achieve this, we first learn a motion imitator that can imitate all of the human motion in a large, unstructured motion dataset.
           We then create our motion representation by distilling skills directly from the imitator, using an encoder-decoder structure with a variational information bottleneck.
           Additionally, we jointly learn a prior conditioned on proprioception (the humanoid's own pose and velocities) to improve model expressiveness and sampling efficiency for downstream tasks.
           Sampling from the prior, we can generate long, stable, and diverse human motions. Using this latent space for hierarchical RL, we show that our policies solve tasks with natural and realistic human behavior.
           We demonstrate the effectiveness of our motion representation by solving generative tasks and motion tracking using VR controllers.

        </p>
<hr />
    <p>
        <ol>

            
            
            <li><a href="#mocap_motion_imitation">MoCap Motion Imitation</a></li>

            <li><a href="#generative_rollout">Random Motion Generation</a>
            <ul>
                <li><a href="#generative_rollout_switch">Switching Between Imitation and Generation</a></li>
                <li><a href="#generative_rollout_sample">Random Sampling</a></li>
                <li><a href="#generative_rollout_control">Comparison with SOTA on Motion Generation</a></li>
                <li><a href="#training_vis">Training Visualization</a></li>
            </ul>
            </li>
            <li><a href="#tracking_tasks">Motion Tracking Downstream Task</a></li>
            <li><a href="#generative_tasks">Generative Downstream Tasks</a>
            <ul>
                <li><a href="#terrain">Terrain</a></li>
                <li><a href="#strike">Strike</a></li>
                <li><a href="#reach">Reach</a></li>
                <li><a href="#speed">Speed</a></li>
            </ul>
            </li>
            <li><a href="#compare_vq">Comparison with VQ-latent space</a></li>
            
        </ol>

    </p>
    <hr />


    <h2 id = "mocap_motion_imitation">MoCap Motion Imitation</h2>
    <p> In this section, we visualize the motion imitation results from PHC+ and PULSE (distilled from PHC+) as a sanity check. PHC+ can imitate all of its training data and can recover from fail-states such as having fallen to the ground.
        PULSE largely inherits these abilities through online distillation. </p>

      <table
        style="
          width: 100%;
          border: 0px;
          border-spacing: 5px 0px;
          border-collapse: separate;
          margin-right: auto;
          margin-left: auto;
          padding-bottom: 20px;
        "
      >
        <tbody>
          <tr class="block_videos">
            <td>
              <video width="100%" height="auto" muted autoplay loop controls>
                <source
                  src="videos/im/phc+_sm.mp4#t=0.001"
                  type="video/mp4 "
                />
                Your browser does not support the video tag.
              </video>
            </td>
            <td>
              <video width="100%" height="auto" muted autoplay loop controls>
                <source
                  src="videos/im/phc+_test_sm.mp4#t=0.001"
                  type="video/mp4 "
                />
                Your browser does not support the video tag.
              </video>
            </td>
             <td>
              <video width="100%" height="auto" muted autoplay loop controls>
                <source
                  src="videos/im/pulse_im_sm.mp4#t=0.001"
                  type="video/mp4 "
                />
                Your browser does not support the video tag.
              </video>
            </td>
             <td>
              <video width="100%" height="auto" muted autoplay loop controls>
                <source
                  src="videos/im/pulse_im_test_sm.mp4#t=0.001"
                  type="video/mp4 "
                />
                Your browser does not support the video tag.
              </video>
            </td>
          </tr>

          <tr class="block_videos_caption">
            <td>
              <center>PHC+: train data imitation</center>
            </td>
            <td>
              <center>PHC+: test data + fail-state recovery</center>
            </td>
            <td>
              <center>PULSE: train data imitation</center>
            </td>
            <td>
              <center>PULSE: test data + fail-state recovery</center>
            </td>
          </tr>
        </tbody>
      </table>

     <h2 id = "generative_rollout">Random Motion Generation</h2>

     <h3 id = "generative_rollout_switch">Switching Between Imitation and Generation</h3>
    <p>  </p>

      <table
        style="
          width: 100%;
          border: 0px;
          border-spacing: 5px 0px;
          border-collapse: separate;
          margin-right: auto;
          margin-left: auto;
          padding-bottom: 20px;
        "
      >

        <tbody>
            <tr class="block_videos">
                <td>
                <video width="100%" height="auto" muted autoplay loop controls>
                    <source
                    src="videos/generative_tasks/random_rollout_1_sm.mp4#t=0.001"
                    type="video/mp4 "
                    />
                    Your browser does not support the video tag.
                </video>
                </td>

                
                <td width="50%">
                        Here we show that we can dynamically switch between random motion generation and imitation, thanks to the fail-state recovery ability of PULSE. In the video on the left, we begin with imitation, switch to random motion sampling, and then switch back to imitation.
                </td>
            </tr>

        </tbody>
      </table>

      <h3 id = "generative_rollout_sample">Random Sampling</h3>
      <p>
        In this section, we visualize 8 humanoids together, each driven by noise sampled from the prior.

        We also show that we can vary the sampled motion style by changing the standard deviation (std) used for sampling.
        With a small std (the learned prior usually predicts a small variance), the sampled motion is smooth and stable. In this case, the humanoid can sometimes stand still for a long time before starting to move again.
        With a larger std (e.g., 0.22), the motion becomes more erratic and energetic, and the humanoid falls down more often. Fortunately, the humanoid can get back up by sampling the recovery skill. This behavior originates from training with PHC+,
        which has the ability to recover from fallen states. The get-up behavior comes entirely from random sampling from the prior.
      </p>
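      <p>
        The effect of the sampling std can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the PULSE implementation: <code>toy_prior</code> is a hypothetical stand-in for the learned prior network, the latent size of 32 and the small std of 0.05 are arbitrary, and only the value 0.22 comes from the text above. The sketch draws latents either with the prior's own (small) std or with a fixed, larger std.
      </p>
<pre>
```python
import numpy as np

LATENT_DIM = 32  # hypothetical latent size, chosen only for this sketch


def toy_prior(proprioception):
    """Stand-in for a learned prior: maps proprioception to (mean, std).

    A real prior would be a neural network; this fixed mapping is an assumption.
    """
    mean = np.full(LATENT_DIM, 0.1 * np.tanh(proprioception.mean()))
    std = np.full(LATENT_DIM, 0.05)  # small std, mimicking a confident prior
    return mean, std


def sample_latent(proprioception, rng, fixed_std=None):
    """Draw z ~ N(mean, std^2); optionally override the prior's std."""
    mean, std = toy_prior(proprioception)
    if fixed_std is not None:
        std = np.full_like(mean, fixed_std)
    return mean + std * rng.standard_normal(mean.shape)


rng = np.random.default_rng(0)
state = np.zeros(10)  # placeholder proprioceptive state

# Latents drawn with the prior's own std versus a fixed, larger std.
calm = np.stack([sample_latent(state, rng) for _ in range(2000)])
energetic = np.stack(
    [sample_latent(state, rng, fixed_std=0.22) for _ in range(2000)]
)

# The empirical spread of the latents follows the std used for sampling.
print(calm.std(), energetic.std())
```
</pre>
      <p>
        In this toy setup, the latents drawn with the fixed std spread much further from the prior mean, which is one way to picture why a larger std would lead to more varied and less stable motion once such latents are decoded into actions.
      </p>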
      
      <table
        style="
          width: 100%;
          border: 0px;
          border-spacing: 5px 0px;
          border-collapse: separate;
          margin-right: auto;
          margin-left: auto;
          padding-bottom: 20px;
        "
      >
        <tbody>
          <tr class="block_videos">
            <td>
              <video width="100%" height="auto" muted autoplay loop controls>
                <source
                  src="videos/generative_tasks/random_prior_group_sm.mp4#t=0.001"
                  type="video/mp4 "
                />
                Your browser does not support the video tag.
              </video>
            </td>
             <td>
              <video width="100%" height="auto" muted autoplay loop controls>
                <source
                  src="videos/generative_tasks/random_1_5_group_sm.mp4#t=0.001"
                  type="video/mp4 "
                />
                Your browser does not support the video tag.
              </video>
            </td>
          </tr>

          <tr class="block_videos_caption">
            <td>
              <center>Random generation, using a small std.</center>
            </td>
            <td>
              <center>Random generation, using a big std.</center>
            </td>
          </tr>
        </tbody>
      </table>


      <p>
         We can also enable collisions between humanoids to generate human-to-human interactions.
      </p>

      <table
        style="
          width: 100%;
          border: 0px;
          border-spacing: 5px 0px;
          border-collapse: separate;
          margin-right: auto;
          margin-left: auto;
          padding-bottom: 20px;
        "
      >
        <tbody>
          <tr class="block_videos">
            <td>
              <video width="100%" height="auto" muted autoplay loop controls>
                <source
                  src="videos/generative_tasks/collision_group_smallstd_sm.mp4#t=0.001"
                  type="video/mp4 "
                />
                Your browser does not support the video tag.
              </video>
            </td>
            <td>
              <video width="100%" height="auto" muted autoplay loop controls>
                <source
                  src="videos/generative_tasks/collision_group_bigstd_sm.mp4#t=0.001"
                  type="video/mp4 "
                />
                Your browser does not support the video tag.
              </video>
            </td>
          </tr>

          <tr class="block_videos_caption">
            <td>
              <center>Random generation, using a small std.</center>
            </td>
            <td>
              <center>Random generation, using a big std.</center>
            </td>
          </tr>
        </tbody>
      </table>

      <h3 id = "generative_rollout_control">Comparison with SOTA on Motion Generation</h3>

      <p>
        In this section, we compare with state-of-the-art (SOTA) generative and latent space models, both kinematics-based (HuMoR) and physics-based (ASE and CALM).
        Compared to HuMoR, our method generates stable, long-term, and physically plausible motion, whereas in our experiments more than 50% of the 200 motion sequences generated by HuMoR contained implausible motion. Compared to other physics-based latent spaces,
        our representation has more coverage and generates more natural and realistic motion, even though the training data is the same.
      </p>

       <table
        style="
          width: 100%;
          border: 0px;
          border-spacing: 5px 0px;
          border-collapse: separate;
          margin-right: auto;
          margin-left: auto;
          padding-bottom: 20px;
        "
      >
        <tbody>
          <tr class="block_videos">
            <td>
              <video width="100%" height="auto" muted autoplay loop controls>
                <source
                  src="videos/generative_tasks/rollout_compare_humor_sm.mp4#t=0.001"
                  type="video/mp4 "
                />
                Your browser does not support the video tag.
              </video>
            </td>
            <td>
              <video width="100%" height="auto" muted autoplay loop controls>
                <source
                  src="videos/generative_tasks/rollout_compare_sm.mp4#t=0.001"
                  type="video/mp4 "
                />
                Your browser does not support the video tag.
              </video>
            </td>
          </tr>

          <tr class="block_videos_caption">
            <td>
              <center>Comparison with the kinematics-based motion latent space HuMoR. We use the same sequence as HuMoR for the initial state. At 0:45 and 1:55, HuMoR generates implausible motion.</center>
            </td>
            <td>
              <center>Comparison with the physics-based motion latent spaces ASE and CALM, as well as with training our model from scratch using RL (without distillation).</center>
            </td>
             
          </tr>
        </tbody>
      </table>


       <h3 id = "training_vis">Training Visualization</h3>
       <table
        style="
          width: 100%;
          border: 0px;
          border-spacing: 5px 0px;
          border-collapse: separate;
          margin-right: auto;
          margin-left: auto;
          padding-bottom: 20px;
        "
      >

        <tbody>
            <tr class="block_videos">
                <td>
                <video width="100%" height="auto" muted autoplay loop controls>
                    <source
                    src="videos/generative_tasks/train_sm.mp4#t=0.001"
                    type="video/mp4 "
                    />
                    Your browser does not support the video tag.
                </video>
                </td>

                
                <td width="50%">
                        Here we visualize sampling behavior from our latent space during training (tasks: reach and speed). For all downstream tasks, we use a fixed standard deviation of 0.22 during training. We can see that,
                        using our latent space as the action space for hierarchical RL, the agent samples realistic human behavior during training.
                </td>
            </tr>

        </tbody>
      </table>
 
        <h2 id = "tracking_tasks">Motion Tracking Downstream Task</h2>

<p>
        In this section, we showcase the VR controller tracking task, where we track the 6DOF poses of the two hand controllers and the headset. This is a challenging task, as it requires the policy
        to perform free-form motion tracking to match the controllers. We show that our latent space covers enough of the motor skills in AMASS to solve this task and can be applied to real-world captures.
        The input is visualized as three red dots.
        </p>
       <table
        style="
          width: 100%;
          border: 0px;
          border-spacing: 5px 0px;
          border-collapse: separate;
          margin-right: auto;
          margin-left: auto;
          padding-bottom: 20px;
        "
      >
        <tbody>
          <tr class="block_videos">
            <td>
              <video width="100%" height="auto" muted autoplay loop controls>
                <source
                  src="videos/tracking_tasks/vr_real_sm.mp4#t=0.001"
                  type="video/mp4 "
                />
                Your browser does not support the video tag.
              </video>
            </td>
            <td>
              <video width="100%" height="auto" muted autoplay loop controls>
                <source
                  src="videos/tracking_tasks/vr_compare_sm.mp4#t=0.001"
                  type="video/mp4 "
                />
                Your browser does not support the video tag.
              </video>
            </td>
              
          </tr>

          <tr class="block_videos_caption">
            <td>
              <center>Real-world data capture</center>
            </td>
            <td>
              <center>Comparison with other latent space models</center>
            </td>
          </tr>
        </tbody>
      </table>

      <p>
        Here we visualize tracking performance on the synthetic data, generated from AMASS, used to train the tracker. The input is visualized as three red dots.
      </p>

      <table
        style="
          width: 100%;
          border: 0px;
          border-spacing: 5px 0px;
          border-collapse: separate;
          margin-right: auto;
          margin-left: auto;
          padding-bottom: 20px;
        "
      >
        <tbody>
          <tr class="block_videos">
            <td>
              <video width="100%" height="auto" muted autoplay loop controls>
                <source
                  src="videos/tracking_tasks/vr_train_sm.mp4#t=0.001"
                  type="video/mp4 "
                />
                Your browser does not support the video tag.
              </video>
            </td>
            <td>
              <video width="100%" height="auto" muted autoplay loop controls>
                <source
                  src="videos/tracking_tasks/vr_test_sm.mp4#t=0.001"
                  type="video/mp4 "
                />
                Your browser does not support the video tag.
              </video>
            </td>
          </tr>

          <tr class="block_videos_caption">
            <td>
              <center>AMASS train data</center>
            </td>
            <td>
              <center>AMASS test data</center>
            </td>
             
          </tr>
        </tbody>
      </table>


       
    <h2 id = "generative_tasks">Generative Downstream Tasks</h2>

    <p>
        In this section, we show results from applying our method to downstream generative tasks.
    </p>

       <h3 id = "terrain">Terrain</h3>

       <p>
        On the challenging terrain traversal task, our method demonstrates agile human behavior using only a simple trajectory-following reward (without any additional adversarial reward, as used in PACER).
        Applying ASE at 30Hz can partially solve this task, though the motion can be jerky. CALM could not solve this task due to the lack of a style reward. Training from scratch produces unnatural motion.
    </p>

        <table
        style="
          width: 100%;
          border: 0px;
          border-spacing: 5px 0px;
          border-collapse: separate;
          margin-right: auto;
          margin-left: auto;
          padding-bottom: 20px;
        "
      >
        <tbody>
          <tr class="block_videos">
            <td>
              <video width="100%" height="auto" muted autoplay loop controls>
                <source
                  src="videos/generative_tasks/terrain_sm.mp4#t=0.001"
                  type="video/mp4 "
                />
                Your browser does not support the video tag.
              </video>
            </td>
            <td>
              <video width="100%" height="auto" muted autoplay loop controls>
                <source
                  src="videos/generative_tasks/terrain_compare_sm.mp4#t=0.001"
                  type="video/mp4 "
                />
                Your browser does not support the video tag.
              </video>
            </td>
          </tr>

          <tr class="block_videos_caption">
            <td>
              <center>Task: Trajectory following and terrain traversal</center>
            </td>
            <td>
              <center>Task: Trajectory following and terrain traversal</center>
            </td>
          </tr>
        </tbody>
      </table>

       <h3 id = "strike">Strike</h3>

        <table
        style="
          width: 100%;
          border: 0px;
          border-spacing: 5px 0px;
          border-collapse: separate;
          margin-right: auto;
          margin-left: auto;
          padding-bottom: 20px;
        "
      >
        <tbody>
          <tr class="block_videos">
            <td>
              <video width="100%" height="auto" muted autoplay loop controls>
                <source
                  src="videos/generative_tasks/strike_sm.mp4#t=0.001"
                  type="video/mp4 "
                />
                Your browser does not support the video tag.
              </video>
            </td>
            <td>
              <video width="100%" height="auto" muted autoplay loop controls>
                <source
                  src="videos/generative_tasks/strike_compare_sm.mp4#t=0.001"
                  type="video/mp4 "
                />
                Your browser does not support the video tag.
              </video>
            </td>
          </tr>

          <tr class="block_videos_caption">
            <td>
              <center>Task: Strike block</center>
            </td>
            <td>
              <center>Task: Strike block</center>
            </td>
          </tr>
        </tbody>
      </table>

      <h3 id = "reach">Reach</h3>
<p> The red dot in the air indicates the reach target. </p>
        <table
        style="
          width: 100%;
          border: 0px;
          border-spacing: 5px 0px;
          border-collapse: separate;
          margin-right: auto;
          margin-left: auto;
          padding-bottom: 20px;
        "
      >
        <tbody>
          <tr class="block_videos">
            <td>
              <video width="100%" height="auto" muted autoplay loop controls>
                <source
                  src="videos/generative_tasks/reach_sm.mp4#t=0.001"
                  type="video/mp4 "
                />
                Your browser does not support the video tag.
              </video>
            </td>
            <td>
              <video width="100%" height="auto" muted autoplay loop controls>
                <source
                  src="videos/generative_tasks/reach_compare_sm.mp4#t=0.001"
                  type="video/mp4 "
                />
                Your browser does not support the video tag.
              </video>
            </td>
          </tr>

          <tr class="block_videos_caption">
            <td>
              <center>Task: Reach</center>
            </td>
            <td>
              <center>Task: Reach</center>
            </td>
          </tr>
        </tbody>
      </table>

      <h3 id = "speed">Speed</h3>
      <p> The red block on the ground indicates the target speed. </p>
        <table
        style="
          width: 100%;
          border: 0px;
          border-spacing: 5px 0px;
          border-collapse: separate;
          margin-right: auto;
          margin-left: auto;
          padding-bottom: 20px;
        "
      >
        <tbody>
          <tr class="block_videos">
            <td>
              <video width="100%" height="auto" muted autoplay loop controls>
                <source
                  src="videos/generative_tasks/speed_sm.mp4#t=0.001"
                  type="video/mp4 "
                />
                Your browser does not support the video tag.
              </video>
            </td>
            <td>
              <video width="100%" height="auto" muted autoplay loop controls>
                <source
                  src="videos/generative_tasks/speed_compare_sm.mp4#t=0.001"
                  type="video/mp4 "
                />
                Your browser does not support the video tag.
              </video>
            </td>
          </tr>

          <tr class="block_videos_caption">
            <td>
              <center>Task: X-direction speed</center>
            </td>
            <td>
              <center>Task: X-direction speed</center>
            </td>
          </tr>
        </tbody>
      </table>


      <h2 id = "compare_vq">Comparison with VQ-latent space</h2>
       <table
        style="
          width: 100%;
          border: 0px;
          border-spacing: 5px 0px;
          border-collapse: separate;
          margin-right: auto;
          margin-left: auto;
          padding-bottom: 20px;
        "
      >

        <tbody>
            <tr class="block_videos">
                <td>
                <video width="100%" height="auto" muted autoplay loop controls>
                    <source
                    src="videos/im/standing_still_vq_sm.mp4#t=0.001"
                    type="video/mp4 "
                    />
                    Your browser does not support the video tag.
                </video>
                </td>

                
                <td width="50%">
                        Here we show our attempt at using a vector-quantized (VQ) latent space. Although the policy achieves a high imitation success rate after distillation, it exhibits micro-jitters when trying to stand still. This is a result
                        of the controller switching rapidly between different discrete latent codes. One could increase the latent space size and the number of codes to possibly ameliorate this behavior, but doing so could defeat the purpose of using a
                        quantized latent space, as the discrete space becomes more and more expressive and closer to a continuous one.
                </td>
            </tr>

        </tbody>
      </table>
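      <p>
        The code-switching effect described above can be pictured with a toy example. This is only a sketch under stated assumptions and does not reproduce our policy: the two-entry codebook, the 2-D latent, and the noise scale are all hypothetical. It shows how nearest-code quantization can turn tiny changes in a pre-quantization latent that sits near a boundary between codes into large jumps in the quantized latent.
      </p>
<pre>
```python
import numpy as np


def quantize(z, codebook):
    """Return the index and vector of the nearest code (Euclidean distance)."""
    distances = np.linalg.norm(codebook - z, axis=1)
    index = int(np.argmin(distances))
    return index, codebook[index]


# Hypothetical 2-entry codebook in a 2-D latent space.
codebook = np.array([[-1.0, 0.0], [1.0, 0.0]])

rng = np.random.default_rng(0)
# A pre-quantization latent sitting on the boundary between the two codes,
# perturbed by small noise at every control step.
latents = 0.01 * rng.standard_normal((200, 2))
indices = np.array([quantize(z, codebook)[0] for z in latents])

# Count how often the selected code changes between consecutive steps.
switches = int(np.sum(indices[1:] != indices[:-1]))
# Largest step-to-step change before and after quantization.
continuous_step = np.linalg.norm(np.diff(latents, axis=0), axis=1).max()
quantized_step = np.linalg.norm(np.diff(codebook[indices], axis=0), axis=1).max()
print(switches, continuous_step, quantized_step)
```
</pre>
      <p>
        In this toy setting, the continuous latent barely moves from step to step, while the quantized latent jumps by the full distance between the two codes whenever the nearest code changes.
      </p>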

  </body>
</html>
