{
  "metadata": {
    "forum_id": "r1e74a4twH",
    "review_id": "rJez-cJNcr",
    "rebuttal_id": "H1gi_8D8sB",
    "title": "CZ-GEM: A FRAMEWORK FOR DISENTANGLED REPRESENTATION LEARNING",
    "reviewer": "AnonReviewer2",
    "rating": 3,
    "conference": "ICLR2020",
    "permalink": "https://openreview.net/forum?id=r1e74a4twH&noteId=H1gi_8D8sB",
    "annotator": "anno3"
  },
  "review_sentences": [
    {
      "review_id": "rJez-cJNcr",
      "sentence_index": 0,
      "text": "This paper proposes a method for learning disentangled representations.",
      "suffix": "",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_summary",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "rJez-cJNcr",
      "sentence_index": 1,
      "text": "The approach is used on both supervised (where the factors to be disentangled are known) and unsupervised settings.",
      "suffix": "",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_summary",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "rJez-cJNcr",
      "sentence_index": 2,
      "text": "The authors demonstrate the efficacy of their approach in both settings on several datasets with both quantitative and qualitative results.",
      "suffix": "\n\n",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_summary",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "rJez-cJNcr",
      "sentence_index": 3,
      "text": "This task is an important one.",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_motivation-impact",
      "polarity": "pol_positive"
    },
    {
      "review_id": "rJez-cJNcr",
      "sentence_index": 4,
      "text": "However, I found that the contribution of this paper is fairly small.",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_substance",
      "polarity": "pol_negative"
    },
    {
      "review_id": "rJez-cJNcr",
      "sentence_index": 5,
      "text": "The proposed approach seems reasonable, but it is mostly a work of engineering and provides little insight into either the problem or the proposed model.",
      "suffix": "\n\n",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_substance",
      "polarity": "pol_negative"
    },
    {
      "review_id": "rJez-cJNcr",
      "sentence_index": 6,
      "text": "The setup where labeled data (c) is available also seems a bit unnatural (this also seems to be confirmed by the fact that the authors had to build datasets for the problem).",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_soundness-correctness",
      "polarity": "pol_negative"
    },
    {
      "review_id": "rJez-cJNcr",
      "sentence_index": 7,
      "text": "Perhaps the authors could give examples of situations where this would naturally arise.",
      "suffix": "",
      "review_action": "arg_request",
      "fine_review_action": "arg-request_experiment",
      "aspect": "asp_substance",
      "polarity": "pol_negative"
    },
    {
      "review_id": "rJez-cJNcr",
      "sentence_index": 8,
      "text": "In practice, it seems difficult to obtain these data for all required variables to be disentangled.",
      "suffix": "\n\n",
      "review_action": "arg_fact",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "rJez-cJNcr",
      "sentence_index": 9,
      "text": "The unsupervised results are more interesting but not very much explored (a single set of sampled faces).",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_substance",
      "polarity": "pol_negative"
    },
    {
      "review_id": "rJez-cJNcr",
      "sentence_index": 10,
      "text": "I was also curious as to why the learned Y's are blurry.",
      "suffix": "",
      "review_action": "arg_request",
      "fine_review_action": "arg-request_clarification",
      "aspect": "asp_substance",
      "polarity": "pol_negative"
    },
    {
      "review_id": "rJez-cJNcr",
      "sentence_index": 11,
      "text": "This sort of two-stage generation is also potentially interesting; I was wondering if the authors had ideas for generalizing it.",
      "suffix": "\n\n",
      "review_action": "arg_request",
      "fine_review_action": "arg-request_experiment",
      "aspect": "asp_replicability",
      "polarity": "pol_negative"
    },
    {
      "review_id": "rJez-cJNcr",
      "sentence_index": 12,
      "text": "I also was not convinced by the experiments, which are mostly qualitative. I did not find that this set of experiments provides enough support for the proposed method.",
      "suffix": "\n\n\n",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_substance",
      "polarity": "pol_negative"
    },
    {
      "review_id": "rJez-cJNcr",
      "sentence_index": 13,
      "text": "Detailed comments:",
      "suffix": "\n",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_heading",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "rJez-cJNcr",
      "sentence_index": 14,
      "text": "- It is a bit unclear to me how the authors propose to obtain independent posteriors over z and c. Is it purely empirical or is there a formal reason that guarantees it?",
      "suffix": "\n\n",
      "review_action": "arg_request",
      "fine_review_action": "arg-request_explanation",
      "aspect": "asp_substance",
      "polarity": "pol_negative"
    },
    {
      "review_id": "rJez-cJNcr",
      "sentence_index": 15,
      "text": "- Some of the figures you report are compelling, but it is a bit unclear to the reader whether the results are general (e.g., the examples could have been hand-picked). Are there any quantitative measures you could provide (in addition to Tables 1 and 2, which don't measure the quality of the approach)?",
      "suffix": "\n\n",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_clarity",
      "polarity": "pol_negative"
    },
    {
      "review_id": "rJez-cJNcr",
      "sentence_index": 16,
      "text": "- Comparing to CGAN seems reasonable but given the task at hand, it seems like other methods could have been tried (although I do realize that no one may have done this before for deep generative models).",
      "suffix": "\n\n\n\n",
      "review_action": "arg_fact",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "rJez-cJNcr",
      "sentence_index": 17,
      "text": "Other comments:",
      "suffix": "\n",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_heading",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "rJez-cJNcr",
      "sentence_index": 18,
      "text": "- In Figure 3, it would be good to label the upper trapezoid.",
      "suffix": "\n\n",
      "review_action": "arg_request",
      "fine_review_action": "arg-request_edit",
      "aspect": "asp_substance",
      "polarity": "none"
    },
    {
      "review_id": "rJez-cJNcr",
      "sentence_index": 19,
      "text": "- Some paragraphs are very long and the manuscript may benefit from segmenting them into multiple paragraphs.",
      "suffix": "",
      "review_action": "arg_request",
      "fine_review_action": "arg-request_edit",
      "aspect": "asp_substance",
      "polarity": "none"
    }
  ],
  "rebuttal_sentences": [
    {
      "review_id": "rJez-cJNcr",
      "rebuttal_id": "H1gi_8D8sB",
      "sentence_index": 0,
      "text": "We would like to start by clarifying the difference between the final implementation of our method (what the reviewer referred to as an engineering contribution) and its scientific contribution.",
      "suffix": "\n\n",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_structuring",
      "alignment": [
        "context_sentences",
        [
          4,
          5
        ]
      ],
      "details": {}
    },
    {
      "review_id": "rJez-cJNcr",
      "rebuttal_id": "H1gi_8D8sB",
      "sentence_index": 1,
      "text": "1. Scientific Contribution: Most recent work on disentangling generative modelling tries to obtain an independent/factorised posterior over the latent generative factors without directly addressing the problem of d-separation, which theoretically prohibits factorisation of the posterior in models such as beta-VAE, conditional GAN or stack GAN.",
      "suffix": "\n",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          4,
          5
        ]
      ],
      "details": {}
    },
    {
      "review_id": "rJez-cJNcr",
      "rebuttal_id": "H1gi_8D8sB",
      "sentence_index": 2,
      "text": "To further elaborate, due to d-separation, models from prior work that have the same underlying plate notation either fail to disentangle the representations (since $p(c,z|x) \\neq p(c|x)p(z|x)$) or do so at the cost of lower generative quality, because their training relies on having an additive information-theoretic penalty term.",
      "suffix": "\n",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          4,
          5
        ]
      ],
      "details": {}
    },
    {
      "review_id": "rJez-cJNcr",
      "rebuttal_id": "H1gi_8D8sB",
      "sentence_index": 3,
      "text": "Our method, on the other hand, decouples the problem of learning disentangled latent representations and high-fidelity generation into two separate problems by introducing a hierarchical structure (sub-graph c-y) that is trained separately from the rest of the model.",
      "suffix": "",
      "rebuttal_stance": "dispute",
      "rebuttal_action": "rebuttal_reject-criticism",
      "alignment": [
        "context_sentences",
        [
          4,
          5
        ]
      ],
      "details": {}
    },
    {
      "review_id": "rJez-cJNcr",
      "rebuttal_id": "H1gi_8D8sB",
      "sentence_index": 4,
      "text": "This allows obtaining a posterior $p(c|y)p(z|x,y)p(y|x)$, which in fact guarantees the disentanglement of the factors c from z while preserving the generative strength of the model.",
      "suffix": "\n\n",
      "rebuttal_stance": "dispute",
      "rebuttal_action": "rebuttal_reject-criticism",
      "alignment": [
        "context_sentences",
        [
          4,
          5
        ]
      ],
      "details": {}
    },
    {
      "review_id": "rJez-cJNcr",
      "rebuttal_id": "H1gi_8D8sB",
      "sentence_index": 5,
      "text": "2. Supervised Setting: We would argue that the setting where labelled data (C) is available is more natural than the unsupervised setting as we aim to learn physical simulators (such as graphics engines) that have a well-defined control variate structure.",
      "suffix": "",
      "rebuttal_stance": "dispute",
      "rebuttal_action": "rebuttal_mitigate-criticism",
      "alignment": [
        "context_sentences",
        [
          1,
          6,
          7,
          8
        ]
      ],
      "details": {}
    },
    {
      "review_id": "rJez-cJNcr",
      "rebuttal_id": "H1gi_8D8sB",
      "sentence_index": 6,
      "text": "This setup appears in many previous works, e.g. conditional GAN and its derivatives.",
      "suffix": "",
      "rebuttal_stance": "dispute",
      "rebuttal_action": "rebuttal_mitigate-criticism",
      "alignment": [
        "context_sentences",
        [
          1,
          6,
          7,
          8
        ]
      ],
      "details": {}
    },
    {
      "review_id": "rJez-cJNcr",
      "rebuttal_id": "H1gi_8D8sB",
      "sentence_index": 7,
      "text": "Testing such models on synthetic datasets (i.e. outputs of graphics engines) where one can control the generative variables is a standard practice in the field and allows for better testing.",
      "suffix": "\n\n",
      "rebuttal_stance": "dispute",
      "rebuttal_action": "rebuttal_mitigate-criticism",
      "alignment": [
        "context_sentences",
        [
          1,
          6,
          7,
          8
        ]
      ],
      "details": {}
    },
    {
      "review_id": "rJez-cJNcr",
      "rebuttal_id": "H1gi_8D8sB",
      "sentence_index": 8,
      "text": "3. Unsupervised results: For the unsupervised setting, in addition to our face dataset and CelebA, we also present results on chairs and cars in the Appendix (see Figure 5, Figure 7, Figure 12, Figure 13).",
      "suffix": "\n\n",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          9,
          10,
          11
        ]
      ],
      "details": {}
    },
    {
      "review_id": "rJez-cJNcr",
      "rebuttal_id": "H1gi_8D8sB",
      "sentence_index": 9,
      "text": "4. Experiments: We would argue that our qualitative plots and quantitative metrics are in line with the evaluation used in current SOTA work.",
      "suffix": "",
      "rebuttal_stance": "dispute",
      "rebuttal_action": "rebuttal_reject-criticism",
      "alignment": [
        "context_sentences",
        [
          12
        ]
      ],
      "details": {}
    },
    {
      "review_id": "rJez-cJNcr",
      "rebuttal_id": "H1gi_8D8sB",
      "sentence_index": 10,
      "text": "In fact, we provide a very thorough mix of quantitative and qualitative experiments for both supervised and unsupervised settings.",
      "suffix": "",
      "rebuttal_stance": "dispute",
      "rebuttal_action": "rebuttal_reject-criticism",
      "alignment": [
        "context_sentences",
        [
          12
        ]
      ],
      "details": {}
    },
    {
      "review_id": "rJez-cJNcr",
      "rebuttal_id": "H1gi_8D8sB",
      "sentence_index": 11,
      "text": "We would like to point out that there are no accepted measures in the field for the quality of learned disentangled representations (see Locatello et al. [https://arxiv.org/pdf/1811.12359.pdf](https://arxiv.org/pdf/1811.12359.pdf)) and most previous papers in the field include a similar mix of quantitative and qualitative results in their experiments section.",
      "suffix": "",
      "rebuttal_stance": "dispute",
      "rebuttal_action": "rebuttal_reject-criticism",
      "alignment": [
        "context_sentences",
        [
          12
        ]
      ],
      "details": {}
    },
    {
      "review_id": "rJez-cJNcr",
      "rebuttal_id": "H1gi_8D8sB",
      "sentence_index": 12,
      "text": "Also, we provide all the code so that it can be verified that the reported results are not cherry-picked.",
      "suffix": "",
      "rebuttal_stance": "dispute",
      "rebuttal_action": "rebuttal_reject-criticism",
      "alignment": [
        "context_sentences",
        [
          12
        ]
      ],
      "details": {}
    }
  ]
}