<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>HumanNorm</title>
<link href="./js/style.css" rel="stylesheet">
<script type="text/javascript" src="./js/jquery.js"></script>
<script type="text/javascript" src="./js/jquery.mlens-1.0.min.js"></script>


<style>
  p.serif{
    font-family:"Times New Roman", Times, serif;
  }
  p.sansserif{
    font-family: Arial, Helvetica, sans-serif;
  }
  .text-center {
    text-align: center;
}
</style>
  
</head>

<body>
<div class="content">
  <h1><strong>HumanNorm: Learning Normal Diffusion Model for High-quality and Realistic 3D Human Generation</strong></h1>
  <p id="authors" class="serif">
    <br>
    <a>Anonymous authors</a>
    <br>
    <br>
    <a>Under review at ICLR 2024</a>
  </p>

  <div class="row">
    <div class="col-full">
      <video  width="100%" loop autoplay muted>
        <source src="results/teaser/teaser.mp4" type="video/mp4">
      </video>
    </div>
  </div>

  <h3 style="text-align:center"><em>Given a prompt, our method is capable of generating a high-quality and realistic 3D human...</em></h3>

</div>

<div class="content">
  <p style="text-align:center; font-size: 2em;" class="serif">Abstract</p>
  <p>Recent text-to-3D methods have made significant progress in 3D human generation. However, these methods struggle with high-quality generation, resulting in smooth geometry and cartoon-like appearances. In this paper, we find that fine-tuning a text-to-image diffusion model on normal maps adapts it into a text-to-normal diffusion model while preserving part of the generation priors learned from large-scale datasets. Therefore, we propose HumanNorm, a novel approach for high-quality and realistic 3D human generation that integrates normal maps into diffusion models. We employ two integration strategies and propose a normal-adapted diffusion model as well as a normal-aligned diffusion model. The normal-adapted diffusion model can generate high-fidelity normal maps corresponding to prompts with view-dependent text. The normal-aligned diffusion model learns to generate color images aligned with the normal maps, thereby transforming physical geometry details into realistic appearance. Leveraging the proposed normal diffusion models, we devise a progressive geometry generation strategy and a coarse-to-fine texture generation strategy to enhance the efficiency and robustness of 3D human generation. Comprehensive experiments substantiate our method's ability to generate 3D humans with intricate geometry and realistic appearances, significantly outperforming existing text-to-3D methods in both geometry and texture quality.</p>
</div>


<div class="content">
  <p style="text-align:left; font-size: 2em; font-weight: bold" class="serif">Methodology</p>
  <p style="font-size: 1.2em" class="serif">
    Our method generates high-quality, realistic 3D humans from text prompts. The framework consists of two stages: geometry generation and texture generation. For geometry generation, we propose a normal-adapted and a depth-adapted diffusion model. Through the SDS loss, these two models guide the rendered normal and depth maps toward the learned distribution of high-fidelity normal and depth maps, yielding high-quality geometry. For texture generation, we introduce the normal-aligned diffusion model and employ a coarse-to-fine strategy. The normal-aligned diffusion model uses normal maps as guidance so that the generated texture aligns with the geometry. At the coarse level we use only the SDS loss, while at the fine level we add the multi-step SDS loss and a perceptual loss to achieve realistic texture.
  </p>
  <img class="summary-img" src="figs/pipeline.png" alt="Overview of the HumanNorm pipeline: geometry generation followed by texture generation" style="width:100%;"> <br>
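  <p style="font-size: 1.2em" class="serif">
    For readers unfamiliar with the SDS loss used in the geometry stage, the following is a sketch of the standard score distillation sampling gradient as introduced in DreamFusion, written here for rendered normal maps. The notation is an assumption made for illustration and is not taken from this page: <code>n</code> is the normal map rendered from the geometry parameters <code>\theta</code>, <code>n_t</code> is its noised version at timestep <code>t</code>, <code>y</code> is the text prompt, <code>\epsilon</code> is the sampled Gaussian noise, <code>\epsilon_\phi</code> is the noise predicted by the normal-adapted diffusion model, and <code>w(t)</code> is a timestep-dependent weight. The exact formulation used by HumanNorm (for example, one that operates in latent space) may differ.
  </p>
  <pre><code>\nabla_\theta \mathcal{L}_{\mathrm{SDS}}
  = \mathbb{E}_{t,\epsilon}\left[ w(t)\,\bigl(\epsilon_\phi(n_t;\, y,\, t) - \epsilon\bigr)\,
    \frac{\partial n}{\partial \theta} \right]</code></pre>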




</div>

<!-- full body -->
<div class="content">
  <p style="text-align:left; font-size: 2em; font-weight: bold" class="serif">Results--Full-body</p>
  <p style="font-size: 1.2em" class="serif">
    We showcase 3D humans generated by our method, which include full-body, upper-body, and head-only models. Additionally, we offer text-based editing capabilities for these 3D humans.<br>
  </p>

  <div class="row">
    <div class="col">
      <video  width="100%" loop autoplay muted>
        <source src="results/fullbody_videos/000.mp4" type="video/mp4">
      </video>
      <p class="text-center">an American football player</p>
    </div>

    <div class="col">
      <video  width="100%" loop autoplay muted>
        <source src="results/fullbody_videos/001.mp4" type="video/mp4">
      </video>
      <p class="text-center">a DSLR photo of Joe Biden</p>
    </div>
  </div>

  <div class="row">
    <div class="col">
      <video  width="100%" loop autoplay muted>
        <source src="results/fullbody_videos/002.mp4" type="video/mp4">
      </video>
      <p class="text-center">a Medieval European King</p>
    </div>

    <div class="col">
      <video  width="100%" loop autoplay muted>
        <source src="results/fullbody_videos/003.mp4" type="video/mp4">
      </video>
      <p class="text-center">a man wearing a striped shirt and grey linen pants</p>
    </div>
  </div>

  <div class="row">
    <div class="col">
      <video  width="100%" loop autoplay muted>
        <source src="results/fullbody_videos/004.mp4" type="video/mp4">
      </video>
      <p class="text-center">a man wearing a blue jean jacket and jean trousers</p>
    </div>

    <div class="col">
      <video  width="100%" loop autoplay muted>
        <source src="results/fullbody_videos/005.mp4" type="video/mp4">
      </video>
      <p class="text-center">a professional boxer</p>
    </div>
  </div>

  <div class="row">
    <div class="col">
      <video  width="100%" loop autoplay muted>
        <source src="results/fullbody_videos/006.mp4" type="video/mp4">
      </video>
      <p class="text-center">a woman wearing a short jean skirt and a cropped top</p>
    </div>

    <div class="col">
      <video  width="100%" loop autoplay muted>
        <source src="results/fullbody_videos/007.mp4" type="video/mp4">
      </video>
      <p class="text-center">an elderly woman in a cardigan and skirt</p>
    </div>
  </div>

  <div class="row">
    <div class="col">
      <video  width="100%" loop autoplay muted>
        <source src="results/fullbody_videos/008.mp4" type="video/mp4">
      </video>
      <p class="text-center">a man wearing a white tanktop and shorts</p>
    </div>

    <div class="col">
      <video  width="100%" loop autoplay muted>
        <source src="results/fullbody_videos/009.mp4" type="video/mp4">
      </video>
      <p class="text-center">a Roman soldier wearing a silver armor</p>
    </div>
  </div>

  <div class="row">
    <div class="col">
      <video  width="100%" loop autoplay muted>
        <source src="results/fullbody_videos/010.mp4" type="video/mp4">
      </video>
      <p class="text-center">a DSLR photo of Mark Zuckerberg in blue jeans</p>
    </div>

    <div class="col">
      <video  width="100%" loop autoplay muted>
        <source src="results/fullbody_videos/011.mp4" type="video/mp4">
      </video>
      <p class="text-center">a DSLR photo of Tim Cook</p>
    </div>
  </div>

  <div class="row">
    <div class="col">
      <video  width="100%" loop autoplay muted>
        <source src="results/fullbody_videos/012.mp4" type="video/mp4">
      </video>
      <p class="text-center">a DSLR photo of Stephen Curry</p>
    </div>

    <div class="col">
      <video  width="100%" loop autoplay muted>
        <source src="results/fullbody_videos/013.mp4" type="video/mp4">
      </video>
      <p class="text-center">a DSLR photo of LeBron James</p>
    </div>
  </div>

  <div class="row">
    <div class="col">
      <video  width="100%" loop autoplay muted>
        <source src="results/fullbody_videos/014.mp4" type="video/mp4">
      </video>
      <p class="text-center">a karate master wearing a belt</p>
    </div>

    <div class="col">
      <video  width="100%" loop autoplay muted>
        <source src="results/fullbody_videos/015.mp4" type="video/mp4">
      </video>
      <p class="text-center">a DSLR photo of Leonardo DiCaprio in a maroon long sleeve top</p>
    </div>
  </div>

  <div class="row">
    <div class="col">
      <video  width="100%" loop autoplay muted>
        <source src="results/fullbody_videos/016.mp4" type="video/mp4">
      </video>
      <p class="text-center">a DSLR photo of Messi</p>
    </div>

    <div class="col">
      <video  width="100%" loop autoplay muted>
        <source src="results/fullbody_videos/017.mp4" type="video/mp4">
      </video>
      <p class="text-center">a DSLR photo of Marilyn Monroe</p>
    </div>
  </div>

  <div class="row">
    <div class="col">
      <video  width="100%" loop autoplay muted>
        <source src="results/fullbody_videos/018.mp4" type="video/mp4">
      </video>
      <p class="text-center">a DSLR photo of Elon Musk in gray long sleeve top</p>
    </div>

    <div class="col">
      <video  width="100%" loop autoplay muted>
        <source src="results/fullbody_videos/019.mp4" type="video/mp4">
      </video>
      <p class="text-center">a DSLR photo of Barack Obama in a suit</p>
    </div>
  </div>

  <div class="row">
    <div class="col">
      <video  width="100%" loop autoplay muted>
        <source src="results/fullbody_videos/020.mp4" type="video/mp4">
      </video>
      <p class="text-center">a DSLR photo of Jason Statham in brown long sleeve top</p>
    </div>

    <div class="col">
      <video  width="100%" loop autoplay muted>
        <source src="results/fullbody_videos/021.mp4" type="video/mp4">
      </video>
      <p class="text-center">a DSLR photo of Donald Trump</p>
    </div>
  </div>


  </div>

  <!-- upper body -->
<div class="content">
  <p style="text-align:left; font-size: 2em; font-weight: bold" class="serif">Results--Upper-body</p>

  <div class="row">
    <div class="col">
      <video  width="100%" loop autoplay muted>
        <source src="results/upperbody_videos/000.mp4" type="video/mp4">
      </video>
      <p class="text-center">a woman in a sari</p>
    </div>

    <div class="col">
      <video  width="100%" loop autoplay muted>
        <source src="results/upperbody_videos/007.mp4" type="video/mp4">
      </video>
      <p class="text-center">a woman in a business suit</p>
    </div>
  </div>

  <div class="row">
    <div class="col">
      <video  width="100%" loop autoplay muted>
        <source src="results/upperbody_videos/002.mp4" type="video/mp4">
      </video>
      <p class="text-center">a man in a tuxedo, with a crisp white shirt and bow tie</p>
    </div>

    <div class="col">
      <video  width="100%" loop autoplay muted>
        <source src="results/upperbody_videos/003.mp4" type="video/mp4">
      </video>
      <p class="text-center">a woman in an evening gown, with sparkling diamond earrings</p>
    </div>
  </div>

  <div class="row">
    <div class="col">
      <video  width="100%" loop autoplay muted>
        <source src="results/upperbody_videos/004.mp4" type="video/mp4">
      </video>
      <p class="text-center">an elderly man in a fishing vest</p>
    </div>

    <div class="col">
      <video  width="100%" loop autoplay muted>
        <source src="results/upperbody_videos/022.mp4" type="video/mp4">
      </video>
      <p class="text-center">a photo of Morgan Freeman in a deep blue shirt</p>
    </div>
  </div>

  <div class="row">
    <div class="col">
      <video  width="100%" loop autoplay muted>
        <source src="results/upperbody_videos/008.mp4" type="video/mp4">
      </video>
      <p class="text-center">a woman in a red dress, with a delicate gold necklace around her neck</p>
    </div>

    <div class="col">
      <video  width="100%" loop autoplay muted>
        <source src="results/upperbody_videos/009.mp4" type="video/mp4">
      </video>
      <p class="text-center">a teenager in a leather jacket</p>
    </div>
  </div>

  <div class="row">
    <div class="col">
      <video  width="100%" loop autoplay muted>
        <source src="results/upperbody_videos/010.mp4" type="video/mp4">
      </video>
      <p class="text-center">an elderly man in a cardigan sweater</p>
    </div>

    <div class="col">
      <video  width="100%" loop autoplay muted>
        <source src="results/upperbody_videos/011.mp4" type="video/mp4">
      </video>
      <p class="text-center">a child in a superhero costume, with a cape flowing behind them</p>
    </div>
  </div>

  <div class="row">
    <div class="col">
      <video  width="100%" loop autoplay muted>
        <source src="results/upperbody_videos/012.mp4" type="video/mp4">
      </video>
      <p class="text-center">a woman in a chef's coat</p>
    </div>

    <div class="col">
      <video  width="100%" loop autoplay muted>
        <source src="results/upperbody_videos/013.mp4" type="video/mp4">
      </video>
      <p class="text-center">a girl in a ballet leotard</p>
    </div>
  </div>

  <div class="row">
    <div class="col">
      <video  width="100%" loop autoplay muted>
        <source src="results/upperbody_videos/014.mp4" type="video/mp4">
      </video>
      <p class="text-center">a man in workout gear</p>
    </div>

    <div class="col">
      <video  width="100%" loop autoplay muted>
        <source src="results/upperbody_videos/015.mp4" type="video/mp4">
      </video>
      <p class="text-center">a DSLR photo of Joe Biden</p>
    </div>
  </div>

  <div class="row">
    <div class="col">
      <video  width="100%" loop autoplay muted>
        <source src="results/upperbody_videos/016.mp4" type="video/mp4">
      </video>
      <p class="text-center">an asian woman in bikini</p>
    </div>

    <div class="col">
      <video  width="100%" loop autoplay muted>
        <source src="results/upperbody_videos/018.mp4" type="video/mp4">
      </video>
      <p class="text-center">a DSLR photo of Stephen Curry</p>
    </div>
  </div>


  <div class="row">
    <div class="col">
      <video  width="100%" loop autoplay muted>
        <source src="results/upperbody_videos/024.mp4" type="video/mp4">
      </video>
      <p class="text-center">a DSLR photo of Barack Obama</p>
    </div>

    <div class="col">
      <video  width="100%" loop autoplay muted>
        <source src="results/upperbody_videos/026.mp4" type="video/mp4">
      </video>
      <p class="text-center">a photo of Mark Zuckerberg in a wine red shirt</p>
    </div>
  </div>


  </div>

  <!-- head only -->
<div class="content">
  <p style="text-align:left; font-size: 2em; font-weight: bold" class="serif">Results--Head-only</p>

  <div class="row">
    <div class="col">
      <video  width="100%" loop autoplay muted>
        <source src="results/headonly_videos/000.mp4" type="video/mp4">
      </video>
      <p class="text-center">a young man with a muscular jawline, stubble beard, and wearing a baseball cap</p>
    </div>

    <div class="col">
      <video  width="100%" loop autoplay muted>
        <source src="results/headonly_videos/014.mp4" type="video/mp4">
      </video>
      <p class="text-center">a DSLR photo of Leonardo DiCaprio</p>
    </div>
  </div>

  <div class="row">
    <div class="col">
      <video  width="100%" loop autoplay muted>
        <source src="results/headonly_videos/002.mp4" type="video/mp4">
      </video>
      <p class="text-center">a photo of Scarlett Johansson</p>
    </div>

    <div class="col">
      <video  width="100%" loop autoplay muted>
        <source src="results/headonly_videos/003.mp4" type="video/mp4">
      </video>
      <p class="text-center">Rick Grimes in The Walking Dead</p>
    </div>
  </div>

  <div class="row">
    <div class="col">
      <video  width="100%" loop autoplay muted>
        <source src="results/headonly_videos/004.mp4" type="video/mp4">
      </video>
      <p class="text-center">Daryl Dixon in The Walking Dead</p>
    </div>

    <div class="col">
      <video  width="100%" loop autoplay muted>
        <source src="results/headonly_videos/016.mp4" type="video/mp4">
      </video>
      <p class="text-center">a DSLR photo of Morgan Freeman</p>
    </div>
  </div>

  <div class="row">
    <div class="col">
      <video  width="100%" loop autoplay muted>
        <source src="results/headonly_videos/006.mp4" type="video/mp4">
      </video>
      <p class="text-center">a man with dreadlocks</p>
    </div>

    <div class="col">
      <video  width="100%" loop autoplay muted>
        <source src="results/headonly_videos/007.mp4" type="video/mp4">
      </video>
      <p class="text-center">a man with a pompadour hairstyle, his hair slicked back stylishly</p>
    </div>
  </div>

  <div class="row">
    <div class="col">
      <video  width="100%" loop autoplay muted>
        <source src="results/headonly_videos/008.mp4" type="video/mp4">
      </video>
      <p class="text-center">a man in his fifties with salt-and-pepper hair styled in a quiff</p>
    </div>

    <div class="col">
      <video  width="100%" loop autoplay muted>
        <source src="results/headonly_videos/009.mp4" type="video/mp4">
      </video>
      <p class="text-center">an elderly woman with deep wrinkles, sparkling eyes, and white hair tied in a bun</p>
    </div>
  </div>

  <div class="row">
    <div class="col">
      <video  width="100%" loop autoplay muted>
        <source src="results/headonly_videos/010.mp4" type="video/mp4">
      </video>
      <p class="text-center">a DSLR photo of Joe Biden</p>
    </div>

    <div class="col">
      <video  width="100%" loop autoplay muted>
        <source src="results/headonly_videos/011.mp4" type="video/mp4">
      </video>
      <p class="text-center">a DSLR photo of Cristiano Ronaldo</p>
    </div>
  </div>

  <div class="row">
    <div class="col">
      <video  width="100%" loop autoplay muted>
        <source src="results/headonly_videos/012.mp4" type="video/mp4">
      </video>
      <p class="text-center">a DSLR photo of Stephen Curry</p>
    </div>

    <div class="col">
      <video  width="100%" loop autoplay muted>
        <source src="results/headonly_videos/013.mp4" type="video/mp4">
      </video>
      <p class="text-center">a DSLR photo of Bill Gates</p>
    </div>
  </div>



  <div class="row">
    <div class="col">
      <video  width="100%" loop autoplay muted>
        <source src="results/headonly_videos/018.mp4" type="video/mp4">
      </video>
      <p class="text-center">a DSLR photo of Jason Statham</p>
    </div>

    <div class="col">
      <video  width="100%" loop autoplay muted>
        <source src="results/headonly_videos/019.mp4" type="video/mp4">
      </video>
      <p class="text-center">a DSLR photo of Daenerys Targaryen</p>
    </div>
  </div>



  </div>

  <!-- editing -->
<div class="content">
  <p style="text-align:left; font-size: 2em; font-weight: bold" class="serif">Text-based Editing</p>
  <p style="font-size: 1.2em" class="serif">
    Our method offers the capability to edit both the <b>texture and geometry</b> of the generated 3D humans by adjusting the input prompt.<br>

  </p>

  <div class="row">
    <div class="col">
      <video  width="100%" loop autoplay muted>
        <source src="results/editing_videos/000.mp4" type="video/mp4">
      </video>
      <p class="text-center">a photo of Messi in a purple shirt with afro hair</p>
    </div>

    <div class="col">
      <video  width="100%" loop autoplay muted>
        <source src="results/editing_videos/002.mp4" type="video/mp4">
      </video>
      <p class="text-center">a photo of Messi in a brown shirt with cornrows hair</p>
    </div>
  </div>

  <div class="row">
    <div class="col">
      <video  width="100%" loop autoplay muted>
        <source src="results/editing_videos/003.mp4" type="video/mp4">
      </video>
      <p class="text-center">a photo of Messi in a pink shirt with mohawk hair</p>
    </div>

    <div class="col">
      <video  width="100%" loop autoplay muted>
        <source src="results/editing_videos/004.mp4" type="video/mp4">
      </video>
      <p class="text-center">a photo of Messi in a baseball hat</p>
    </div>
  </div>

  <div class="row">
    <div class="col">
      <video  width="100%" loop autoplay muted>
        <source src="results/editing_videos/009.mp4" type="video/mp4">
      </video>
      <p class="text-center">a photo of Messi in a blue tank top</p>
    </div>

    <div class="col">
      <video  width="100%" loop autoplay muted>
        <source src="results/editing_videos/005.mp4" type="video/mp4">
      </video>
      <p class="text-center">a photo of Messi in a gray jacket</p>
    </div>
  </div>

  <div class="row">
    <div class="col">
      <video  width="100%" loop autoplay muted>
        <source src="results/editing_videos/008.mp4" type="video/mp4">
      </video>
      <p class="text-center">a photo of Messi in a yellow sweater</p>
    </div>

    <div class="col">
      <video  width="100%" loop autoplay muted>
        <source src="results/editing_videos/007.mp4" type="video/mp4">
      </video>
      <p class="text-center">a photo of Messi in a suit</p>
    </div>
  </div>


  </div>

<div class="content">
  <p style="text-align:left; font-size: 2em; font-weight: bold" class="serif">Pose Editing</p>
  <p style="font-size: 1.2em" class="serif">
  Our method can also edit the pose of the generated avatars by adjusting the pose of the initialization mesh and modifying the prompts.
  </p> 
  <div class="row">
    <div class="col">
      <video  width="100%" loop autoplay muted>
        <source src="results/editing_videos/pose1.mp4" type="video/mp4">
      </video>
      <p class="text-center">a photo of Elon Musk with his hands on his hips</p>
    </div>

    <div class="col">
      <video  width="100%" loop autoplay muted>
        <source src="results/editing_videos/pose2.mp4" type="video/mp4">
      </video>
      <p class="text-center">a photo of Elon Musk raising a hand</p>
    </div>
  </div>

  </div>



<div class="content">
  <p style="text-align:left; font-size: 2em; font-weight: bold" class="serif">Ethics Statement</p>
  <p>The objective of HumanNorm is to equip users with a powerful tool for creating realistic 3D human models. Our method allows users to generate 3D humans from their own prompts. However, the generated models could be misused to deceive viewers. This risk is not unique to our approach; it is shared by other generative models. It is also important to prioritize diversity in gender, race, and culture. Current and future research in generative modeling should therefore consistently address and reassess these considerations.</p>
  <br>
</div>


<script>
var videos = document.getElementsByClassName("clickplay");
for (var i = 0; i < videos.length; i++) {
  videos[i].addEventListener("click", function() {
    this.play();
  });
  videos[i].addEventListener("ended", function() {
    this.pause();
    this.currentTime = 0;
  });
}

document.querySelectorAll('.info-container').forEach(function(container) {
  container.addEventListener('mouseover', function() {
    var infoText = container.querySelector('.info-text');
    infoText.style.display = 'block';
  });

  container.addEventListener('mouseout', function() {
    var infoText = container.querySelector('.info-text');
    infoText.style.display = 'none';
  });
});
</script>

</body>

</html>
