<!DOCTYPE html>
<html>
  
<head>
  <meta charset="utf-8">
  <meta name="description" content="An Object is Worth 64x64 Pixels: Generating 3D Object via Image Diffusion">
  <meta name="keywords" content="Geometry Images, 3D generation, Generative model">
  <meta name="viewport" content="width=device-width, initial-scale=1">
  <title>Object Images 64x</title>
  <link rel="icon" href="icon.png">
  <script>
    window.dataLayer = window.dataLayer || [];

    function gtag() {
      dataLayer.push(arguments);
    }

    gtag('js', new Date());

    gtag('config', 'G-PYVRSFMDRL');

  </script>

  <link href="https://fonts.googleapis.com/css?family=Google+Sans|Noto+Sans|Castoro" rel="stylesheet">

  <link rel="stylesheet" href="./static/css/bulma.min.css">
  <link rel="stylesheet" href="./static/css/bulma-carousel.min.css">
  <link rel="stylesheet" href="./static/css/bulma-slider.min.css">
  <link rel="stylesheet" href="./static/css/fontawesome.all.min.css">
  <link rel="stylesheet" href="https://cdn.jsdelivr.net/gh/jpswalsh/academicons@1/css/academicons.min.css">
  <link rel="stylesheet" href="./static/css/index.css">

  <script src="https://ajax.googleapis.com/ajax/libs/jquery/3.5.1/jquery.min.js"></script>
  <script defer src="./static/js/fontawesome.all.min.js"></script>
  <script src="./static/js/bulma-carousel.min.js"></script>
  <script src="./static/js/bulma-slider.min.js"></script>
  <script src="./static/js/index.js"></script>
</head>

<body>



  <section class="hero">
    <div class="hero-body">
      <div class="container is-max-desktop">
        <div class="columns is-centered">
          <div class="column has-text-centered">
            <h2 class="title is-2 publication-title">
              An Object is Worth 64x64 Pixels: <br>
              Generating 3D Object via Image Diffusion
            </h2>
            <div class="is-size-5 publication-authors">
              Supplementary Material
            </div>
              <div class="is-size-5 publication-authors">
                Anonymous 3DV Submission, Submission ID: 266 
              </div>
          </div>
        </div>
      </div>
    </div>
  </section>

  <!-- Video teaser -->
  <section class="hero teaser">
    <div class="container is-max-desktop">
      <div class="hero-body">
        <video id="teaser" autoplay muted playsinline controls height="100%">
          <source src="static/videos/vid_teaser.mp4" type="video/mp4">
        </video>
        <h2 class="subtitle has-text-centered">
          We present <b>Object Images</b> (Omages): An homage to the classic <a href="https://hhoppe.com/proj/gim/">Geometry Images</a>.
        </h2>
      </div>
    </div>
  </section>


  <section class="section">
    <div class="container is-max-desktop">
      <!-- Abstract. -->
      <div class="columns is-centered has-text-centered">
        <div class="column is-four-fifths">
          <h2 class="title is-3">Abstract</h2>
          <div class="content has-text-justified">
            <p>
              We introduce a new approach for generating realistic 3D models with UV maps through a representation termed "Object Images." This approach encapsulates surface geometry, appearance, and patch structures within a 64x64 pixel image, effectively converting complex 3D shapes into a more manageable 2D format. By doing so, we address the challenges of both geometric and semantic irregularity inherent in polygonal meshes. This method allows us to use image generation models, such as Diffusion Transformers, directly for 3D shape generation. Evaluated on the ABO dataset, our generated shapes with patch structures achieve point cloud FID comparable to recent 3D generative models, while naturally supporting PBR material generation.
            </p>
          </div> 
        </div>
      </div>

      <br>
      <h2 class="title is-3">Motivation</h2>
      <div class="content has-text-justified">
        <p>
          Recently, 3D generative models have shown impressive results in synthesizing 3D objects. However, many current 3D generative models treat 3D shapes as "statue"-like objects. In contrast to many high-quality human-made 3D assets, which contain rich geometric detail and semantically meaningful patches, statue-like objects are difficult to edit, animate, and interact with. For example, the <a href="https://sketchfab.com/3d-models/headphone-with-stand-4ffedc9bffad4a549f6e0a46b0f92b05">headphone</a> shown below has intricate geometric parts that its "statue" version does not capture. Similarly, in the example on the right, <a href="https://sketchfab.com/3d-models/book-pack-658cf47227a141e8abc607e455b1be7b">the pack of books</a> consists of multiple books standing close to each other, which are very difficult to separate with current single-view reconstruction techniques.
          </p>
          <p>
          The core challenge in generating 3D shapes with proper geometric connectivity and semantic part structures is the <b>irregularity</b> of these properties, since most recent techniques require regular, tensorial input. We find that these irregularities can be handled effectively by packing the geometry, patch structures, and materials into an image format, which we term "Object Images" or "omages" (a kind of Multi-Chart Geometry Image). In this work, we use an image diffusion model to generate low-resolution omages, showing that this paradigm of 3D generation is feasible. For more details, please refer to our paper.
        </p>
      </div>
      <div class="container is-max-desktop">
        <div class="hero-body"> 
          <img src="./static/images/fig_motivation.png" class="interpolation-image"
            alt="Motivation examples: a headphone and a pack of books with rich part structures." />
        </div>
      </div>
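      <div class="content has-text-justified">
        <p>
          This page does not specify how an omage lays out its channels. As a minimal sketch, assume a hypothetical layout in which channels 0 to 2 store surface positions (x, y, z), channel 3 stores occupancy, and empty pixels separate neighboring patches. Under these assumptions, the discrete patch structure can be read back from the image with a flood fill over occupied pixels (4-connected components), and the positions of each component's pixels give that patch's point cloud. The function <code>extract_patches</code> and its <code>threshold</code> parameter are illustrative names, not the authors' code.
        </p>

```python
from collections import deque

# Hypothetical channel layout (an assumption for illustration only):
# channels 0-2 hold the surface position (x, y, z) and channel 3 holds
# occupancy in [0, 1]. Any further channels (e.g. materials) are ignored here.
POS = slice(0, 3)
OCC = 3


def extract_patches(omage, threshold=0.5):
    """Split an omage into patches using 4-connected components of occupied pixels.

    `omage` is a nested list of shape H x W x C. Returns one list per patch,
    each holding the (x, y, z) positions of that patch's occupied pixels.
    """
    h, w = len(omage), len(omage[0])
    occupied = [[omage[r][c][OCC] > threshold for c in range(w)] for r in range(h)]
    seen = [[False] * w for _ in range(h)]
    patches = []
    for r0 in range(h):
        for c0 in range(w):
            if not occupied[r0][c0] or seen[r0][c0]:
                continue
            # Breadth-first flood fill starting from an unvisited occupied pixel.
            queue = deque([(r0, c0)])
            seen[r0][c0] = True
            points = []
            while queue:
                r, c = queue.popleft()
                points.append(tuple(omage[r][c][POS]))
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    if (nr in range(h) and nc in range(w)
                            and occupied[nr][nc] and not seen[nr][nc]):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            patches.append(points)
    return patches


# Usage: a 4x4 omage with a 1x2 strip at the top-left and one pixel at the bottom-right.
example = [[[float(r), float(c), 0.0, 1.0 if (r, c) in {(0, 0), (0, 1), (3, 3)} else 0.0]
            for c in range(4)] for r in range(4)]
print([len(p) for p in extract_patches(example)])  # prints [2, 1]
```

        <p>
          The flood fill visits each pixel a constant number of times, so it stays cheap at 64x64. A thresholded occupancy channel is only one way patch boundaries could be encoded; the paper describes the actual representation.
        </p>
      </div>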

      <h2 class="title is-3">Method</h2>
      <div class="content has-text-justified">
        <p>
          We first preprocess the UV-unwrapped 3D shapes into 1024x1024 omages and then downsample them, with special care, to 64x64 omages. We then flatten each omage into a sequence and learn the distribution of these sequences with a Diffusion Transformer (DiT) using a patch size of 1. We use a DiT because we observe that omage generation is essentially a combination of image generation and set generation. Notably, during the denoising process, discrete structures emerge from the continuous image format, and the generated results vary widely in the number and size of their patches.
        </p>
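        <p>
          This page only says that the downsampling is done "with special care"; the actual procedure is given in the paper. The sketch below illustrates one hypothetical scheme and one plausible reason care is needed: naive average pooling would blend surface positions with empty background pixels (and with adjacent patches) along patch borders, producing spurious geometry. Here a low-resolution pixel is kept only if its whole block is occupied. It assumes the same illustrative layout as above (occupancy in channel 3), and <code>downsample_omage</code> is a hypothetical name. Going from 1024x1024 to 64x64 corresponds to <code>factor=16</code>.
        </p>

```python
# Hypothetical layout, as an assumption: occupancy lives in channel 3 and the
# other channels (positions, materials) can be averaged within a patch.
OCC = 3


def downsample_omage(omage, factor, threshold=0.5):
    """Downsample an H x W x C omage by an integer `factor` (illustrative scheme).

    A low-resolution pixel is marked occupied only if every high-resolution
    pixel in its factor x factor block is occupied; its channels are then the
    mean over the block. Any block touching an empty pixel becomes empty, so
    surface positions are never averaged with background values.
    """
    h, w = len(omage), len(omage[0])
    channels = len(omage[0][0])
    if h % factor or w % factor:
        raise ValueError("image size must be divisible by factor")
    out = []
    for r0 in range(0, h, factor):
        row = []
        for c0 in range(0, w, factor):
            block = [omage[r][c]
                     for r in range(r0, r0 + factor)
                     for c in range(c0, c0 + factor)]
            if all(px[OCC] > threshold for px in block):
                # Fully occupied block: average every channel, keep it occupied.
                pixel = [sum(px[k] for px in block) / len(block) for k in range(channels)]
                pixel[OCC] = 1.0
            else:
                # Block touches background or a gap: emit an empty pixel.
                pixel = [0.0] * channels
            row.append(pixel)
        out.append(row)
    return out


# Usage: 4x4 -> 2x2. Only the top-left 2x2 block is fully occupied.
occupied_cells = {(0, 0), (0, 1), (1, 0), (1, 1), (2, 2)}
hi_res = [[[float(r), float(c), 0.0, 1.0 if (r, c) in occupied_cells else 0.0]
           for c in range(4)] for r in range(4)]
low = downsample_omage(hi_res, 2)
print(low[0][0], low[1][1])  # prints [0.5, 0.5, 0.0, 1.0] [0.0, 0.0, 0.0, 0.0]
```

        <p>
          Requiring a fully occupied block is deliberately conservative: it can erode thin patches, which is one of the trade-offs a real downsampling scheme has to balance.
        </p>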
      </div>
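      <div class="content has-text-justified">
        <p>
          With a DiT patch size of 1, flattening is simply a row-major reshape: each pixel becomes one token carrying its C channel values, so a 64x64 omage becomes a sequence of 64 x 64 = 4096 tokens (a patch size of 2 would give 1024). The helpers below are a minimal, framework-free sketch of this reshape and its inverse; the function names are hypothetical, and a real DiT would additionally apply a learned linear embedding and positional embeddings to each token.
        </p>

```python
def omage_to_tokens(omage):
    """Flatten an H x W x C omage into H*W tokens in row-major order.

    With a patch size of 1, every pixel becomes one token whose features
    are that pixel's C channel values.
    """
    return [list(pixel) for row in omage for pixel in row]


def tokens_to_omage(tokens, h, w):
    """Invert omage_to_tokens: reshape H*W tokens back into an H x W grid."""
    if len(tokens) != h * w:
        raise ValueError("expected h * w tokens")
    return [[list(tokens[r * w + c]) for c in range(w)] for r in range(h)]


def num_tokens(h, w, patch_size):
    """Sequence length from splitting an h x w image into non-overlapping patches."""
    return (h // patch_size) * (w // patch_size)


# Usage: a tiny 2x3 omage with 4 channels round-trips through the token sequence.
omage = [[[r, c, 0, 1] for c in range(3)] for r in range(2)]
tokens = omage_to_tokens(omage)
assert tokens_to_omage(tokens, 2, 3) == omage
print(len(tokens), num_tokens(64, 64, 1), num_tokens(64, 64, 2))  # prints 6 4096 1024
```

        <p>
          Because the reshape is lossless, any generated token sequence maps back to an omage of the same size, from which the patch structure can then be read.
        </p>
      </div>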
      <div class="container is-max-desktop">
        <div class="hero-body">
          <img src="./static/images/fig_pipeline.png" class="interpolation-image"
            alt="Overview of the omage generation pipeline." />
          <br>
          <img src="./static/images/fig_teaser.png" class="interpolation-image"
            alt="Teaser figure: Object Images (omages)." />
        </div>
      </div>


      <h2 class="title is-3">More Results</h2>
      <div class="content has-text-justified">
        <p>
          Here we show more results from our model trained on the ABO dataset. Although there are gaps between the patches, the overall alignment of the patches demonstrates the potential of this 3D generation paradigm. We also show our model's potential to generate objects with PBR materials, such as mirrors.
        </p>
      </div>
      <div class="container is-max-desktop">
        <div class="hero-body">
          <img src="./static/images/fig_gallery.png" class="interpolation-image"
            alt="Gallery of additional samples generated by our model trained on the ABO dataset." />
        </div>
      </div>
    </div>
  </section>

</body>

</html>
