{
  "metadata": {
    "forum_id": "Hkl5aoR5tm",
    "review_id": "rylkjAtu2m",
    "rebuttal_id": "rylaP-uP6m",
    "title": "On Self Modulation for Generative Adversarial Networks",
    "reviewer": "AnonReviewer1",
    "rating": 7,
    "conference": "ICLR2019",
    "permalink": "https://openreview.net/forum?id=Hkl5aoR5tm&noteId=rylaP-uP6m",
    "annotator": "anno8"
  },
  "review_sentences": [
    {
      "review_id": "rylkjAtu2m",
      "sentence_index": 0,
      "text": "The paper examines an architectural feature in GAN generators -- self-modulation -- and presents empirical evidence supporting the claim that it helps improve modeling performance.",
      "suffix": "",
      "review_action": "arg_fact",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "rylkjAtu2m",
      "sentence_index": 1,
      "text": "The self-modulation mechanism itself is implemented via FiLM layers applied to all convolutional blocks in the generator and whose scaling and shifting parameters are predicted as a function of the noise vector z.",
      "suffix": "",
      "review_action": "arg_fact",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "rylkjAtu2m",
      "sentence_index": 2,
      "text": "Performance is measured in terms of Fr\u00e9chet Inception Distance (FID) for models trained with and without self-modulation on a fairly comprehensive range of model architectures (DCGAN-based, ResNet-based), discriminator regularization techniques (gradient penalty, spectral normalization), and datasets (CIFAR10, CelebA-HQ, LSUN-Bedroom, ImageNet).",
      "suffix": "",
      "review_action": "arg_fact",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "rylkjAtu2m",
      "sentence_index": 3,
      "text": "The takeaway is that self-modulation is an architectural feature that helps improve modeling performance by a significant margin in most settings.",
      "suffix": "",
      "review_action": "arg_fact",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "rylkjAtu2m",
      "sentence_index": 4,
      "text": "An ablation study is also performed on the location where self-modulation is applied, showing that it is beneficial across all locations but has more impact towards the later layers of the generator.",
      "suffix": "\n\n",
      "review_action": "arg_fact",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "rylkjAtu2m",
      "sentence_index": 5,
      "text": "I am overall positive about the paper: the proposed idea is simple, but is well-explained and backed by rigorous evaluation.",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_substance",
      "polarity": "pol_positive"
    },
    {
      "review_id": "rylkjAtu2m",
      "sentence_index": 6,
      "text": "Here are the questions I would like the authors to discuss further:",
      "suffix": "\n\n",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_heading",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "rylkjAtu2m",
      "sentence_index": 7,
      "text": "- The proposed approach is a fairly specific form of self-modulation.",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_soundness-correctness",
      "polarity": "pol_negative"
    },
    {
      "review_id": "rylkjAtu2m",
      "sentence_index": 8,
      "text": "In general, I think of self-modulation as a way for the network to interact with itself, which can be a local interaction, like for squeeze-and-excitation blocks.",
      "suffix": "",
      "review_action": "arg_fact",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "rylkjAtu2m",
      "sentence_index": 9,
      "text": "In the case of this paper, the self-interaction allows the noise vector z to interact with various intermediate features across the generation process, which for me appears to be different than allowing intermediate features to interact with themselves.",
      "suffix": "",
      "review_action": "arg_fact",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "rylkjAtu2m",
      "sentence_index": 10,
      "text": "This form of noise injection at various levels of the generator is also close in spirit to what BigGAN employs, except that in the case of BigGAN different parts of the noise vector are used to influence different parts of the generator.",
      "suffix": "",
      "review_action": "arg_fact",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "rylkjAtu2m",
      "sentence_index": 11,
      "text": "Can you clarify how you view the relationship between the approaches mentioned above?",
      "suffix": "\n",
      "review_action": "arg_request",
      "fine_review_action": "arg-request_clarification",
      "aspect": "asp_soundness-correctness",
      "polarity": "pol_negative"
    },
    {
      "review_id": "rylkjAtu2m",
      "sentence_index": 12,
      "text": "- It\u2019s interesting to me that the ResNet architecture performs better with self-modulation in all settings, considering that one possible explanation for why self-modulation is helpful is that it allows the \u201cinformation\u201d contained in the noise vector to better propagate to and influence different parts of the generator.",
      "suffix": "",
      "review_action": "arg_fact",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "rylkjAtu2m",
      "sentence_index": 13,
      "text": "ResNets also have this ability to \u201cpropagate\u201d the noise signal more easily, but it appears that having a self-modulation mechanism on top of that is still beneficial.",
      "suffix": "",
      "review_action": "arg_fact",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "rylkjAtu2m",
      "sentence_index": 14,
      "text": "I\u2019m curious to hear the authors\u2019 thoughts in this.",
      "suffix": "\n",
      "review_action": "arg_request",
      "fine_review_action": "arg-request_clarification",
      "aspect": "asp_substance",
      "polarity": "pol_positive"
    },
    {
      "review_id": "rylkjAtu2m",
      "sentence_index": 15,
      "text": "- Reading Figure 2b, one could be tempted to draw a correlation between the complexity of the dataset and the gains achieved by self-modulation over the baseline (e.g., Bedroom shows less difference between the two approaches than ImageNet). Do the authors agree with that?",
      "suffix": "",
      "review_action": "arg_request",
      "fine_review_action": "arg-request_experiment",
      "aspect": "asp_substance",
      "polarity": "pol_positive"
    }
  ],
  "rebuttal_sentences": [
    {
      "review_id": "rylkjAtu2m",
      "rebuttal_id": "rylaP-uP6m",
      "sentence_index": 0,
      "text": "We would like to thank the reviewer for the time and useful feedback.",
      "suffix": "",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_social",
      "alignment": [
        "context_global",
        null
      ],
      "details": {}
    },
    {
      "review_id": "rylkjAtu2m",
      "rebuttal_id": "rylaP-uP6m",
      "sentence_index": 1,
      "text": "Our response is given below.",
      "suffix": "\n\n",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_structuring",
      "alignment": [
        "context_in-rebuttal",
        null
      ],
      "details": {}
    },
    {
      "review_id": "rylkjAtu2m",
      "rebuttal_id": "rylaP-uP6m",
      "sentence_index": 2,
      "text": "- Relationship to z-conditioning strategy in BigGAN.",
      "suffix": "\n\n",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_structuring",
      "alignment": [
        "context_sentences",
        [
          10,
          11
        ]
      ],
      "details": {}
    },
    {
      "review_id": "rylkjAtu2m",
      "rebuttal_id": "rylaP-uP6m",
      "sentence_index": 3,
      "text": "Thanks for pointing out the connection to this concurrent submission.",
      "suffix": "",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_social",
      "alignment": [
        "context_sentences",
        [
          10,
          11
        ]
      ],
      "details": {}
    },
    {
      "review_id": "rylkjAtu2m",
      "rebuttal_id": "rylaP-uP6m",
      "sentence_index": 4,
      "text": "We will discuss the connections in the related work section.",
      "suffix": "",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_social",
      "alignment": [
        "context_sentences",
        [
          10,
          11
        ]
      ],
      "details": {}
    },
    {
      "review_id": "rylkjAtu2m",
      "rebuttal_id": "rylaP-uP6m",
      "sentence_index": 5,
      "text": "The main differences are as follows:",
      "suffix": "\n",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_structuring",
      "alignment": [
        "context_sentences",
        [
          10,
          11
        ]
      ],
      "details": {}
    },
    {
      "review_id": "rylkjAtu2m",
      "rebuttal_id": "rylaP-uP6m",
      "sentence_index": 6,
      "text": "1. BigGAN performs conditional generation, whilst we primarily focus on unconditional generation.",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          10,
          11
        ]
      ],
      "details": {}
    },
    {
      "review_id": "rylkjAtu2m",
      "rebuttal_id": "rylaP-uP6m",
      "sentence_index": 7,
      "text": "BigGAN splits the latent vector z and concatenates it with the label embedding, whereas we transform z using a small MLP per layer, which is arguably more powerful.",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          10,
          11
        ]
      ],
      "details": {}
    },
    {
      "review_id": "rylkjAtu2m",
      "rebuttal_id": "rylaP-uP6m",
      "sentence_index": 8,
      "text": "In the conditional case, we apply both additive and multiplicative interaction between the label and z, instead of concatenation as in BigGAN.",
      "suffix": "\n",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          10,
          11
        ]
      ],
      "details": {}
    },
    {
      "review_id": "rylkjAtu2m",
      "rebuttal_id": "rylaP-uP6m",
      "sentence_index": 9,
      "text": "2. Overall BigGAN focusses on scalability to demonstrate that one can train an impressive model for conditional generation.",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          10,
          11
        ]
      ],
      "details": {}
    },
    {
      "review_id": "rylkjAtu2m",
      "rebuttal_id": "rylaP-uP6m",
      "sentence_index": 10,
      "text": "Instead, we focus on a single idea, and show that it can be applied very broadly.",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          10,
          11
        ]
      ],
      "details": {}
    },
    {
      "review_id": "rylkjAtu2m",
      "rebuttal_id": "rylaP-uP6m",
      "sentence_index": 11,
      "text": "We provide a thorough empirical evaluation across critical design decisions in GANs and demonstrate that it is a robust and practically useful contribution.",
      "suffix": "\n\n",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          10,
          11
        ]
      ],
      "details": {}
    },
    {
      "review_id": "rylkjAtu2m",
      "rebuttal_id": "rylaP-uP6m",
      "sentence_index": 12,
      "text": "- Propagation of signal and ResNets.",
      "suffix": "\n\n",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_structuring",
      "alignment": [
        "context_sentences",
        [
          13,
          14
        ]
      ],
      "details": {}
    },
    {
      "review_id": "rylkjAtu2m",
      "rebuttal_id": "rylaP-uP6m",
      "sentence_index": 13,
      "text": "Indeed, ResNets provide a skip connection which helps signal propagation.",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          13,
          14
        ]
      ],
      "details": {}
    },
    {
      "review_id": "rylkjAtu2m",
      "rebuttal_id": "rylaP-uP6m",
      "sentence_index": 14,
      "text": "Arguably, self-modulation has a similar effect.",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          13,
          14
        ]
      ],
      "details": {}
    },
    {
      "review_id": "rylkjAtu2m",
      "rebuttal_id": "rylaP-uP6m",
      "sentence_index": 15,
      "text": "However, there are critical differences in these mechanisms which may explain the benefits of self-modulation in a resnet architecture:",
      "suffix": "\n",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          13,
          14
        ]
      ],
      "details": {}
    },
    {
      "review_id": "rylkjAtu2m",
      "rebuttal_id": "rylaP-uP6m",
      "sentence_index": 16,
      "text": "1. Self-modulation applies a channel-wise additive and multiplicative operation to each layer.",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          13,
          14
        ]
      ],
      "details": {}
    },
    {
      "review_id": "rylkjAtu2m",
      "rebuttal_id": "rylaP-uP6m",
      "sentence_index": 17,
      "text": "In contrast, residual connections perform only an element-wise addition in the same spatial locality.",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          13,
          14
        ]
      ],
      "details": {}
    },
    {
      "review_id": "rylkjAtu2m",
      "rebuttal_id": "rylaP-uP6m",
      "sentence_index": 18,
      "text": "As a result, channel-wise modulation allows trainable re-weighting of all feature maps, which is not the case for classic residual connections.",
      "suffix": "\n",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          13,
          14
        ]
      ],
      "details": {}
    },
    {
      "review_id": "rylkjAtu2m",
      "rebuttal_id": "rylaP-uP6m",
      "sentence_index": 19,
      "text": "2. The ResNet skip-connection is either an identity function or a learnable 1x1 convolution, both of which are linear.",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          13,
          14
        ]
      ],
      "details": {}
    },
    {
      "review_id": "rylkjAtu2m",
      "rebuttal_id": "rylaP-uP6m",
      "sentence_index": 20,
      "text": "In self-modulation, the connection from z to each layer is a learnable non-linear function (MLP).",
      "suffix": "\n\n",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          13,
          14
        ]
      ],
      "details": {}
    },
    {
      "review_id": "rylkjAtu2m",
      "rebuttal_id": "rylaP-uP6m",
      "sentence_index": 21,
      "text": "- Reading Figure 2b, one could be tempted to draw a correlation between the complexity of the dataset and the gains achieved by self-modulation over the baseline (e.g., Bedroom shows less difference between the two approaches than ImageNet). Do the authors agree with that?",
      "suffix": "\n\n",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_structuring",
      "alignment": [
        "context_sentences",
        [
          15
        ]
      ],
      "details": {}
    },
    {
      "review_id": "rylkjAtu2m",
      "rebuttal_id": "rylaP-uP6m",
      "sentence_index": 22,
      "text": "Yes, we notice more improvements on the harder, more diverse datasets.",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          15
        ]
      ],
      "details": {}
    },
    {
      "review_id": "rylkjAtu2m",
      "rebuttal_id": "rylaP-uP6m",
      "sentence_index": 23,
      "text": "These datasets also have more headroom for improvement.",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          15
        ]
      ],
      "details": {}
    }
  ]
}