<!DOCTYPE html>
<html lang="en-us">

  <head>
  <link href="http://gmpg.org/xfn/11" rel="profile">
  <meta http-equiv="content-type" content="text/html; charset=utf-8">

  <meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1">

  <title>
    
      Large Scale Image Completion via Co-Modulated GANs &middot; The ICLR Blog Track
    
  </title>

  
  <link rel="canonical" href="https://iclr.iro.umontreal.ca/fef0b9b3-5fbb-430e-b43e-b3c6c47477e2_1642201297/2021/12/01/co-mod-gan/">
  

  <link rel="stylesheet" href="https://iclr.iro.umontreal.ca/fef0b9b3-5fbb-430e-b43e-b3c6c47477e2_1642201297/public/css/poole.css">
  <link rel="stylesheet" href="https://iclr.iro.umontreal.ca/fef0b9b3-5fbb-430e-b43e-b3c6c47477e2_1642201297/public/css/syntax.css">
  <link rel="stylesheet" href="https://iclr.iro.umontreal.ca/fef0b9b3-5fbb-430e-b43e-b3c6c47477e2_1642201297/public/css/lanyon.css">
  <link rel="stylesheet" href="https://iclr.iro.umontreal.ca/fef0b9b3-5fbb-430e-b43e-b3c6c47477e2_1642201297/public/css/custom.css">
  <link rel="stylesheet" href="https://fonts.googleapis.com/css?family=PT+Serif:400,400italic,700%7CPT+Sans:400">

  <link rel="apple-touch-icon-precomposed" sizes="144x144" href="https://iclr.iro.umontreal.ca/fef0b9b3-5fbb-430e-b43e-b3c6c47477e2_1642201297/public/apple-touch-icon-precomposed.png">
  <link rel="shortcut icon" href="https://iclr.iro.umontreal.ca/fef0b9b3-5fbb-430e-b43e-b3c6c47477e2_1642201297/public/favicon.ico">

  <link rel="alternate" type="application/rss+xml" title="RSS" href="https://iclr.iro.umontreal.ca/fef0b9b3-5fbb-430e-b43e-b3c6c47477e2_1642201297/atom.xml">

  

  <script src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.1/MathJax.js?config=TeX-AMS-MML_HTMLorMML" type="text/javascript" ></script>
 <!-- <script type="text/x-mathjax-config"> MathJax.Hub.Config({ TeX: { equationNumbers: { autoNumber: "AMS" } } }); </script> -->
  <script type="text/x-mathjax-config">
      MathJax.Hub.Config({
        tex2jax: { inlineMath: [ ['$','$'], ["\\(","\\)"] ],
         processEscapes: false
        }
      });
</script>
</head>


  <body>

    <!-- Target for toggling the sidebar `.sidebar-checkbox` is for regular
     styles, `#sidebar-checkbox` for behavior. -->
<input type="checkbox" class="sidebar-checkbox" id="sidebar-checkbox">
<!-- <input type="checkbox" class="sidebar-checkbox" id="sidebar-checkbox" > -->

<!-- Toggleable sidebar -->
<div class="sidebar" id="sidebar">
  <div class="sidebar-item">
    <p>For short-term, peer-sourced tests of time, generalizations, specializations, reproductions, etc.!</p>
  </div>

  <nav class="sidebar-nav">

    

    
    
      
        
          <a class="sidebar-nav-item" href="https://iclr.iro.umontreal.ca/fef0b9b3-5fbb-430e-b43e-b3c6c47477e2_1642201297/">ICLR 2022 Blog Track</a>
        
      
    
      
        
      
    
      
        
          <a class="sidebar-nav-item" href="https://iclr.iro.umontreal.ca/fef0b9b3-5fbb-430e-b43e-b3c6c47477e2_1642201297/about/">About</a>
        
      
    
      
    
      
        
      
    
      
        
          <a class="sidebar-nav-item" href="https://iclr.iro.umontreal.ca/fef0b9b3-5fbb-430e-b43e-b3c6c47477e2_1642201297/submitting/">Submitting</a>
        
      
    
      
        
          <a class="sidebar-nav-item" href="https://iclr.iro.umontreal.ca/fef0b9b3-5fbb-430e-b43e-b3c6c47477e2_1642201297/tags/">Tags</a>
        
      
    

    <a class="sidebar-nav-item" href="https://github.com/iclr-blog-track/iclr-blog-track.github.io">GitHub project</a>
    <span class="sidebar-nav-item">Currently vICLR Spring 2021</span>
  </nav>

  <div class="sidebar-item">
    <p>
      &copy; 2022. All rights reserved.
    </p>
  </div>
</div>


    <!-- Wrap is the content to shift when toggling the sidebar. We wrap the
         content to avoid any CSS collisions with our real content. -->
    <div class="wrap">
      <div class="masthead">
        <div class="container">
          <h3 class="masthead-title">
            <a href="/" title="Home">The ICLR Blog Track</a>
            <small></small>
          </h3>
        </div>
      </div>

      <div class="container content">
        <div class="post">
  <h1 id="iclr-post-title" class="post-title">Large Scale Image Completion via Co-Modulated GANs</h1>
  <span class="post-date">01 Dec 2021 | 
    <a class="content-tag" href="/tags/#gan"> GAN </a>
  
    <a class="content-tag" href="/tags/#co-modulation"> co-modulation </a>
  
    <a class="content-tag" href="/tags/#image-completion"> image completion </a>
  
    <a class="content-tag" href="/tags/#image-to-image-translation"> image-to-image translation </a>
  </span>

  <span id="iclr-post-authors" class="post-date">Anonymous</span>
  <p><em>This blog is an overview of <a href="https://openreview.net/pdf?id=sSjqmfsk95O">this paper</a> by Shengyu Zhao et al. published at ICLR 2021.</em></p>

<p>Generative adversarial networks (GANs) have recently become popular deep learning models for producing synthetic yet realistic images. They have attained state-of-the-art performance on image generation tasks, including <a href="https://openaccess.thecvf.com/content_ICCV_2017/papers/Zhu_Unpaired_Image-To-Image_Translation_ICCV_2017_paper.pdf">image-to-image translation</a>.</p>

<p>Since vanilla GANs do not perform well in free-form image completion tasks (<a href="https://openaccess.thecvf.com/content_ECCV_2018/papers/Guilin_Liu_Image_Inpainting_for_ECCV_2018_paper.pdf">Liu et al., 2018</a>; <a href="https://openaccess.thecvf.com/content_ICCV_2019/papers/Yu_Free-Form_Image_Inpainting_With_Gated_Convolution_ICCV_2019_paper.pdf">Yu et al., 2019</a>), there has been significant progress using task-specific approaches to improve the quality of generated images, such as by minimizing color discrepancy. These specialized GANs work well for small-scale image completion tasks, but they do not achieve promising performance when handling large-scale missing image regions. Why might existing models be insufficient for larger regions?</p>

<p>The answer lies in the fact that existing algorithms do not have the capability of generating completely new images. If a model cannot generate a completely new image, then it will have a difficult time completing a large region of an existing image.</p>

<p>To handle large-scale missing regions, <a href="https://openreview.net/pdf?id=sSjqmfsk95O">Zhao et al.</a> propose co-modulated generative adversarial networks, which use co-modulation to combine the generative capability of unconditional modulated models with conditional and stochastic style representations. They found that co-modulated GANs produce contents that are diverse and consistent with the rest of the image and that generalize well to large-scale image completion tasks.</p>

<p>In this blog, we summarize the approach of co-modulated GANs for image completion, its generalization to image-to-image translation tasks, and a new proposed quantitative evaluation metric for GANs. We also extend the findings by Zhao et al. by performing new image completion experiments to examine the biases of co-modulated GANs.</p>

<h2 id="overview">Overview</h2>

<ol>
  <li><a href="#summary-of-co-modulated-gans">Summary of Co-modulated GANs</a></li>
  <li><a href="#experiments">Experiments</a></li>
  <li><a href="#biases-and-limitations-of-co-modulated-gans">Biases and Limitations of Co-modulated GANs</a></li>
  <li><a href="#related-work">Related Work</a></li>
  <li><a href="#conclusion">Conclusion</a></li>
</ol>

<h2 id="summary-of-co-modulated-gans">Summary of Co-modulated GANs</h2>

<p>Co-modulated GANs link image-conditional GANs and unconditional modulated models to address large-scale image completion tasks. Co-modulation brings stochastic and conditional style representations together. To improve existing metrics for image completion, the proposed Paired/Unpaired Inception Discriminative Score (P-IDS/U-IDS) is robust to sampling size, captures subtle differences well, and correlates with human preferences. Experiments using co-modulated GANs lead to high quality and diverse results in free-form image completion and image-to-image translation tasks.</p>

<h3 id="modulation">Modulation</h3>

<p>In the unconditional modulated generator, the decoder $\mathcal{D}$ originates from a learned constant, while the latent vector $z$ is passed through a multi-layer fully connected mapping network $\mathcal{M}$, linearly generating a style vector $s$ for each modulation via a learned affine transformation $\mathcal{A}$.</p>

<p align="center">
  <img src="https://iclr.iro.umontreal.ca/fef0b9b3-5fbb-430e-b43e-b3c6c47477e2_1642201297/public/images/2021-12-01-co-mod-gan/fig2a.png" />
  Figure 2a (Zhao et al., 2021)
</p>

<p>The style vector can be expressed as the following:</p>

\[\begin{equation}
s = \mathcal{A}(\mathcal{M}(z))
\end{equation}\]

<p>Here, $s$ is the style vector, $\mathcal{A}$ is the affine transformation, $\mathcal{M}$ is the mapping network, and $z$ is the latent vector.</p>

<p>The conditional modulated generator extends the image-conditional generator by conditioning the modulation on the learned flattened features from the image encoder $\mathcal{E}$.</p>

<p align="center">
  <img src="https://iclr.iro.umontreal.ca/fef0b9b3-5fbb-430e-b43e-b3c6c47477e2_1642201297/public/images/2021-12-01-co-mod-gan/fig2c.png" />
  Figure 2c (Zhao et al., 2021)
</p>

<p>The style vector can be represented as the following:</p>

\[\begin{equation}
s = \mathcal{A}(\mathcal{E}(y))
\end{equation}\]

<p>$\mathcal{E}$ is the conditional image encoder and $y$ is the conditional input image.</p>

<p>One limitation with conditional modulation is the lack of stochastic generative capability, which arises in large scale image completion. Outputs are weakly conditioned, which results in poor generalization and the lack of diverse outputs.</p>

<h3 id="co-modulation">Co-modulation</h3>

<p align="center">
  <img src="https://iclr.iro.umontreal.ca/fef0b9b3-5fbb-430e-b43e-b3c6c47477e2_1642201297/public/images/2021-12-01-co-mod-gan/fig2d.png" />
  Figure 2d (Zhao et al., 2021)
</p>

<p>The co-modulation of conditional and stochastic style representations combines unconditional modulated generators with image-conditional generators.</p>

<p>The co-modulated style vector can be expressed as a joint affine transformation conditioning on both style representations:</p>

\[\begin{equation}
s = \mathcal{A}(\mathcal{E}(y), \mathcal{M}(z))
\end{equation}\]

<p>The linear mapping in the style vector contributes to stochasticity. Furthermore, the co-modulation approach improves visual quality at large-scale missing regions.</p>

<h3 id="pairedunpaired-inception-discriminative-score">Paired/Unpaired Inception Discriminative Score</h3>

<p>Evaluation methods are important for assessing the performance of GANs, but there is a lack of good quantitative metrics for image completion (<a href="https://openaccess.thecvf.com/content_ECCV_2018/papers/Guilin_Liu_Image_Inpainting_for_ECCV_2018_paper.pdf">Liu et al., 2018</a>; <a href="https://openaccess.thecvf.com/content_cvpr_2018/papers/Yu_Generative_Image_Inpainting_CVPR_2018_paper.pdf">Yu et al., 2018</a>; <a href="https://openaccess.thecvf.com/content_ICCV_2019/papers/Yu_Free-Form_Image_Inpainting_With_Gated_Convolution_ICCV_2019_paper.pdf">Yu et al., 2019</a>). Existing image completion algorithms rely on similarity-based metrics, such as PSNR and SSIM, but these approaches prefer blurry results that are less than ideal. As a result, qualitative metrics have been used, such as conducting a user study that asks humans to identify real vs. fake images. However, this approach lacks reproducibility and is time- and labor-intensive.</p>

<p>To address these challenges, Zhao et al. propose a new quantitative evaluation metric called the Paired/Unpaired Inception Discriminative Score (P-IDS/U-IDS). The P-IDS/U-IDS has advantages over the Fréchet inception distance (FID) because it is robust to sampling size, effective at capturing subtle differences, and correlates well to human preferences.</p>

<p>P-IDS/U-IDS calculates the linear separability in the pre-trained feature space of an Inception v3 model $\mathcal{I}(\cdot)$ that maps an input image to output features of 2048 dimensions. In the paired case, we have pairs of real images and their respective generated fake images $(x, x’) \in X$, where $x$ is the real image and $x’$ is the fake image. Features are extracted from an equal number of real and fake images and then fitted by a linear SVM, which represents the linear separability in the feature space. Let $f(\cdot)$ be the decision function of the SVM, and $f(\mathcal{I}(x)) &gt; 0$ if and only if $x$ is considered real. The P-IDS determines the probability that a fake image is considered more realistic than its corresponding real image, and it is given by:</p>

\[\begin{equation}
    \textbf{P-IDS}(X) = \Pr_{(x, x') \in X} \{ f(\mathcal{I}(\textbf{x}')) &gt; f(\mathcal{I}(\textbf{x}))\}
\end{equation}\]

<p>Similarly, in the case where we do not have pairs of real and fake images, we sample an equal number of real images (from distribution $X$) and fake images (from distribution $X’$) and fit the linear SVM $f(\cdot)$. The U-IDS determines the misclassification rate, and it is given by:</p>

\[\begin{equation}
    \textbf{U-IDS}(X, X') = \frac{1}{2} \Pr_{x \in X} \{f(\mathcal{I}(\textbf{x})) &lt; 0 \} + \frac{1}{2} \Pr_{x' \in X} \{f(\mathcal{I}(\textbf{x}') &gt; 0\}
\end{equation}\]

<h2 id="experiments">Experiments</h2>

<h3 id="image-completion">Image Completion</h3>

<p>To use a co-modulated GAN for image completion, the model takes in as input a real image and a mask image that represents missing image regions, and it outputs a generated fake image that fills in the missing regions. The masks can cover any shape of the real image, including the entire image. To try a demo of co-modulated GANs, visit <a href="https://comodgan.azurewebsites.net/en-US/">https://comodgan.azurewebsites.net/en-US/</a>.</p>

<p>Zhao et al. conducted image completion experiments using the <a href="https://github.com/NVlabs/ffhq-dataset">FFHQ dataset</a> and the <a href="http://places2.csail.mit.edu/">Places2 dataset</a>. They compare the performance of using a co-modulated GAN with the performance of other state-of-the-art methods such as <a href="https://github.com/zhaoyuzhi/deepfillv2">DeepFillv2</a>.</p>

<p align="center">
  <img src="https://iclr.iro.umontreal.ca/fef0b9b3-5fbb-430e-b43e-b3c6c47477e2_1642201297/public/images/2021-12-01-co-mod-gan/fig7.png" />
  Figure 7 (Zhao et al., 2021)
</p>

<p>Below are quantitative results for large scale image completion compared to DeepFillv2 and RFR.</p>

<p align="center">
  <img src="https://iclr.iro.umontreal.ca/fef0b9b3-5fbb-430e-b43e-b3c6c47477e2_1642201297/public/images/2021-12-01-co-mod-gan/table1.png" />
  Table 1 (Zhao et al., 2021)
</p>

<h3 id="effect-on-mask-shape-on-generated-images-using-places2">Effect on Mask Shape on Generated Images Using Places2</h3>

<p>Extending the work of Zhao et al, we investigate whether co-modulated GANs exhibit shape bias and evaluate the difference between the generated object and the masked object. Given an input image, we first create a semantic segmentation, using the model trained on the <a href="https://groups.csail.mit.edu/vision/datasets/ADE20K/">MIT ADE20K dataset</a>. Then, we separate the segmentation out into its individual predicted classes and create a mask from one of the top predicted classes. Using the co-modulated GAN pre-trained on the Places2 dataset, we generate new images from the input image and the mask from the segmentation. The figure provides an example of the setup of this experiment.</p>

<p><img src="https://iclr.iro.umontreal.ca/fef0b9b3-5fbb-430e-b43e-b3c6c47477e2_1642201297/public/images/2021-12-01-co-mod-gan/places_method.png" alt="Shape Experiment Method" /></p>

<p>Below is a sample of image generations from an image from the Places2 dataset with a mask on the swimming pool. The results lack diversity and fail to generalize that the shape of the pool is the shape of the pool mask. The results do show some semantic understanding that there would be a body of water in this region and consistently generates a pool of a certain shape that’s not the shape of the mask.</p>

<p><img src="https://iclr.iro.umontreal.ca/fef0b9b3-5fbb-430e-b43e-b3c6c47477e2_1642201297/public/images/2021-12-01-co-mod-gan/beachsample.png" alt="Places Examples" /></p>

<p>When using the co-modulated GAN trained on the Places2 dataset, generated images may fail to generalize object generation to the segmented shape, but the co-modulated GAN does show some semantic understanding of the class of object of the masked region. In addition, the generated images lack diversity.</p>

<h3 id="examining-the-distribution-of-color-in-completed-regions-using-ffhq">Examining the Distribution of Color in Completed Regions Using FFHQ</h3>

<p>We extend the experiments by Zhao et al. by examining whether co-modulated GANs have color biases in the generated images. For our experiment, we sample 40 fake images using a given real image from the FFHQ dataset, and we use a mask that aims to cover a person’s shirt by masking the bottom one-fourth of the image. Then, we assess visually the diversity of the distribution of shirt colors in the generated fake images by calculating the number of fake images with shirts of different colors. We chose this evaluation approach because we need to determine the color in specific image regions that contain the person’s shirt, and existing approaches do not solve this task. The figure below shows an example of a real image and mask that was used for this experiment.</p>

<p><img src="https://iclr.iro.umontreal.ca/fef0b9b3-5fbb-430e-b43e-b3c6c47477e2_1642201297/public/images/2021-12-01-co-mod-gan/ffhq_experiment_method.png" alt="FFHQ Experiment Method" /></p>

<p>The person in the real image of the above figure wears a solid yellow top, which is covered completely by the mask. Since there is no indication of yellow clothing in the masked image, we expect that the co-modulated GAN produces fake images with differently colored clothes for the person. To determine the distribution of images, we grouped the shirt colors of fake images into one of nine possible colors: red, orange, yellow, green, blue, purple, black, gray, and white. The table below shows the distribution of shirt colors in the fake images.</p>

<p><img src="https://iclr.iro.umontreal.ca/fef0b9b3-5fbb-430e-b43e-b3c6c47477e2_1642201297/public/images/2021-12-01-co-mod-gan/ffhq_experiment1.png" alt="FFHQ Example 1 Images" /></p>

<p>We observed that the 40 sampled images for this specific real image contained only red, blue, purple, and gray shirt colors. An interesting next step would be to determine why the co-modulated GAN tends to generate images with these colors. In addition, the generated shirts all contained one solid color. Examples of fake images for each of these colors are included in the table.</p>

<p>Another example of this experiment uses the real image in the figure below, which contains a person wearing solid blue and green clothing. In contrast, we see that most of the generated images contain non-solid colors that are not blue or green and contain very similar shirt designs. Because the shirt colors are not solid, it is difficult to categorize them into specific colors. Instead, we display 20 of the 40 sampled images.</p>

<p><img src="https://iclr.iro.umontreal.ca/fef0b9b3-5fbb-430e-b43e-b3c6c47477e2_1642201297/public/images/2021-12-01-co-mod-gan/ffhq_experiment2.png" alt="FFHQ Example 2 Images" /></p>

<p>We noticed that a significant number of the shirt colors contain white and designs with red, blue, gray, and brown-like colors. The amount of white in this color distribution could be influenced by the white background color, but it is uncertain why the non-white colors and generated designs in the masked region of the image are less diverse.</p>

<h3 id="image-to-image-translation">Image-To-Image Translation</h3>

<p>Since co-modulated GANs are generic image-conditional models, they can be easily extended to image-to-image translation tasks. Zhao et al. perform image-to-image translation experiments on <a href="http://efrosgans.eecs.berkeley.edu/pix2pix/datasets/">the Edges2Handbags dataset and the Edges2Shoes dataset</a>.</p>

<p align="center">
  <img src="https://iclr.iro.umontreal.ca/fef0b9b3-5fbb-430e-b43e-b3c6c47477e2_1642201297/public/images/2021-12-01-co-mod-gan/fig13.png" />
  Figure 13 (Zhao et al., 2021)
</p>

<p>Using a co-modulated GAN results in higher fidelity (FID) over state-of-the-art methods on these edges to photos datasets. The model also produces superior diversity on the Edges2Handbags dataset, but it does not learn to generate diverse images on the Edges2Shoes dataset. The table below compares the performance of the co-modulated GAN with other methods.</p>

<p align="center">
  <img src="https://iclr.iro.umontreal.ca/fef0b9b3-5fbb-430e-b43e-b3c6c47477e2_1642201297/public/images/2021-12-01-co-mod-gan/table2.png" />
  Table 2 (Zhao et al., 2021)
</p>

<p>The images below include results from the image-to-image translation task on the <a href="https://github.com/nightrome/cocostuff">COCO-Stuff</a> validation set using <a href="https://github.com/NVlabs/SPADE">SPADE</a> and co-modulated GANs.</p>

<p align="center">
  <img src="https://iclr.iro.umontreal.ca/fef0b9b3-5fbb-430e-b43e-b3c6c47477e2_1642201297/public/images/2021-12-01-co-mod-gan/fig15.png" />
  Figure 15 (Zhao et al., 2021)
</p>

<h2 id="biases-and-limitations-of-co-modulated-gans">Biases and Limitations of Co-modulated GANs</h2>

<p>Large-scale image completion is a difficult task. While the co-modulated GAN is able to produce diverse and high quality results in image completion and generalize easily to image-to-image translation, there are still several limitations to the model. Sometimes the model underperforms in recognizing semantic information, particularly in the Places2 dataset, which contains millions of different scenes. As a result, strange artifacts may appear. In addition, some objects from generated images are more diverse within the object class while other objects converge to generating a certain type of object. For example, masked regions in natural sceneries tend to generate similar images, whereas masked regions on buildings tend to have a wider variety of generated objects. While the co-modulated GAN incorporates stochasticity and generalization, it’s difficult to find the right balance.</p>

<p align="center">
  <img src="https://iclr.iro.umontreal.ca/fef0b9b3-5fbb-430e-b43e-b3c6c47477e2_1642201297/public/images/2021-12-01-co-mod-gan/fig17.png" />
  Figure 17 (Zhao et al., 2021)
</p>

<p>The figure below shows examples of images generated using the co-modulated GAN and FFHQ dataset for the image completion task. As the mask covered up more of the image, there were noticeable changes in age, race, and gender as well as facial expressions, clothing, and accessories from the generated image to the input image. Specifically, we see that the generated images tend to include red colors that are not present in the original images. This aligns with the findings of our experiment studying the distribution of colors among generated images in the previous section.</p>

<p align="center">
  <img src="https://iclr.iro.umontreal.ca/fef0b9b3-5fbb-430e-b43e-b3c6c47477e2_1642201297/public/images/2021-12-01-co-mod-gan/figure16.png" />
  Figure 16 (Zhao et al., 2021)
</p>

<p>We also noticed that there are sometimes unusual artifacts added to images, and other failure cases may occur. For example, in the top row of images in the figure below, the generated image includes bright blue-green artifacts to the person’s face near the nose and mouth. This blue-green color appears to be the same color that is added to the person’s clothing in the image. In the second row of images, the co-modulated GAN fails to recreate an image of the young child.</p>

<p><img src="https://iclr.iro.umontreal.ca/fef0b9b3-5fbb-430e-b43e-b3c6c47477e2_1642201297/public/images/2021-12-01-co-mod-gan/failure_cases.png" alt="Failure Cases" /></p>

<h2 id="related-work">Related Work</h2>

<h3 id="image-completion-1">Image Completion</h3>

<p>Image completion can be viewed as a constrained image-to-image translation problem. Early works for image completion used low-level features and could not generate semantically consistent contents. With the rise of deep neural networks, we have seen improvements in image completion tasks. Then, the work by <a href="https://arxiv.org/pdf/1604.07379.pdf">Pathak et al.</a> on Context Encoders gives rise to the first conditional GAN for inpainting.</p>

<p>Many follow-up works improved on semantic context and texture, edges and contours, and various architectures. Later works focused on the lack of stochasticity and image outpainting subtasks.</p>

<h3 id="image-to-image-translation-1">Image-To-Image Translation</h3>

<p>Image-conditional GANs can be applied to many image-to-image translation tasks. <a href="https://phillipi.github.io/pix2pix/">pix2pix</a> uses a conditional adversarial network that learns both the mapping from input to output image and the loss function to train this mapping. As a result, this network has a wide applicability to various image-to-image translation problems, such as semantic labels to photo, black and white to color, edges to photo, day to night, and aerial to map, without the need for extra parameter tweaking.</p>

<h3 id="bias-and-generalization">Bias and Generalization</h3>

<p>Generative models rely heavily on inductive bias to extrapolate from training data, but the inductive bias of generative models has not been studied as much as that of classification models. <a href="https://ermongroup.github.io/blog/bias-and-generalization-dgm/">Shengjia Zhao et al.</a> studies the inductive bias of generative models by proposing a framework to understand how well these models generalize based on the training samples. For example, they found that, if a generative model is trained on many images with 3 objects in each image, then the model does not always generate images with 3 objects each. In addition, <a href="https://ali-design.github.io/gan_steerability/">Ali Jahanian et al.</a> explore the steerability of GANs by comparing latent space trajectories with simple image transformations. They found that biases of the training data limit the extent of transformations. Nonetheless, steering in the latent space of these models enables generalization.</p>

<h3 id="other-generative-models">Other Generative Models</h3>
<p>While GANs are able generate high fidelity outputs, compared to autoregressive models, VAEs, and flow-based models, they may face issues of mode collapse and may be unstable to train. Recently, diffusion models have gained recognition. Diffusion models gradually add random Gaussian noise to a data point sampled from the real distribution and later learn to reverse the diffusion process to produce a sample from the data distribution.</p>

<p><a href="https://arxiv.org/pdf/2111.05826.pdf">Palette</a> uses an image-to-image diffusion model to perform inpainting, colorization, uncropping, and JPEG artifact removal tasks. To evaluate performance of image-to-image translation tasks, Saharia et al. use quantitative metrics (Inception Score, Fréchet Inception Distance, Classification Accuracy, and Perceptual Distance), visual inspection to assess sample diversity, and human evaluation (2-alternative forced choice trials) to determine whether humans can discriminate model outputs from reference images.</p>

<p>Recent work show various applications of diffusion-based models. <a href="https://arxiv.org/pdf/2112.10741.pdf">GLIDE</a> uses a diffusion-based text-conditional image synthesis model. This model uses guided guided diffusion, which uses classifier guidance to improve samples from class-conditional diffusion models by perturbing the model with the gradient of the log-probability of the target class. Nichol et al. evaluate two guidance strategies: CLIP guidance and classifier-free guidance. Classifier-free guidance is preferred by human evaluators for its photorealism and caption similarity. Furthermore, this model can be fine-tuned to perform image inpainting tasks.</p>

<p align="center">
  <img src="https://iclr.iro.umontreal.ca/fef0b9b3-5fbb-430e-b43e-b3c6c47477e2_1642201297/public/images/2021-12-01-co-mod-gan/glide-fig2.png" />
  Figure 2 (Nichol et al., 2021)
</p>

<h2 id="conclusion">Conclusion</h2>

<p>Overall, co-modulated GANs are a promising approach for large-scale image completion tasks. In this blog, we presented an overview of co-modulated GANs, the proposed P-IDS/U-IDS quantitative metrics, and results from the paper by Zhao et al. We also performed new experiments investigating the effect of mask shape on generated images using the Places2 dataset, as well as the distribution of color in completed regions using the FFHQ dataset. In addition, we found interesting biases and limitations of using co-modulated GANs on some images in the FFHQ and Places2 datasets, such as the tendency to generate images with red colors.</p>

<p><em style="color:grey">
    The authors certify that they have no affiliations with the authors of the paper or involvement in the subject matter or materials discussed in this paper.
</em></p>

</div>

<div id="bibtex-container" class="related">
  For attribution in academic contexts, please cite this work as
  <pre id="bibtex-academic-attribution">

  </pre>

  BibTeX citation
  <pre id="bibtex-box">

  </pre>
</div>
<script>
  let authorsSpan = document.getElementById("iclr-post-authors");
  let authorsText = authorsSpan.textContent;
  let lnameFnameInstitution = authorsText.split(";");
  let lfiList = lnameFnameInstitution.map(lfi => lfi.split(",").map(item => item.trim()));
  let bibtexLFI = lfiList.map(lfi => lfi[0] + ", " + lfi[1]).join(" and ")
  let academicLFI = lfiList.map(lfi => lfi[0]);
  {
    if(academicLFI.length > 2) academicLFI = academicLFI[0] + ", et al.";
    else if(academicLFI.length == 2) academicLFI = academicLFI[0] + " & " + academicLFI[1];
    else academicLFI = academicLFI[0];
  }

  let titleSpan = document.getElementById("iclr-post-title");
  let titleText = titleSpan.textContent.trim();
  let bibtexTitleShorthand = (lfiList[0][1]+
    "2022"+
    titleText.split(" ").slice(0, 3).join("")
  ).replace(" ", "").replace(/[\p{P}$+<=>^`|~]/gu, '').toLowerCase().trim();

  let bibtexTemplate = `
@inproceedings{${bibtexTitleShorthand}},
  author = {${bibtexLFI}},
  title = {${titleText}},
  booktitle = {ICLR Blog Track},
  year = {2022},
  note = {${window.location.href}},
  url  = {${window.location.href}}
}
  `.trim();
  document.getElementById("bibtex-box").innerText = bibtexTemplate;

  let academicTemplate = `
${academicLFI}, "${titleText}", ICLR Blog Track, 2022.
`.trim();
  document.getElementById("bibtex-academic-attribution").innerText = academicTemplate;

</script>


<div class="related">
  <h2>Related posts</h2>
  <ul class="related-posts">
    
      <li>
        <h3>
          <a href="/2021/09/01/sample-submission/">
            Sample Submission
            <small>01 Sep 2021 | 
    <a class="content-tag" href="/tags/#gan"> GAN </a>
  
    <a class="content-tag" href="/tags/#co-modulation"> co-modulation </a>
  
    <a class="content-tag" href="/tags/#image-completion"> image completion </a>
  
    <a class="content-tag" href="/tags/#image-to-image-translation"> image-to-image translation </a>
  </small>
          </a>
        </h3>
      </li>
    
      <li>
        <h3>
          <a href="/2020/04/02/example-content/">
            Example content (Basic Markdown)
            <small>02 Apr 2020 | 
    <a class="content-tag" href="/tags/#gan"> GAN </a>
  
    <a class="content-tag" href="/tags/#co-modulation"> co-modulation </a>
  
    <a class="content-tag" href="/tags/#image-completion"> image completion </a>
  
    <a class="content-tag" href="/tags/#image-to-image-translation"> image-to-image translation </a>
  </small>
          </a>
        </h3>
      </li>
    
  </ul>
</div>


<script src="https://utteranc.es/client.js"
        repo="iclr-blog-track/iclr-blog-track.github.io"
        issue-term="pathname"
        label="utterance"
        theme="boxy-light"
        crossorigin="anonymous"
        >
</script>


      </div>
    </div>

    <label for="sidebar-checkbox" class="sidebar-toggle"></label>

    <script src='/public/js/script.js'></script>
  </body>
</html>
