{
  "metadata": {
    "forum_id": "S1ecYANtPr",
    "review_id": "Skg1tSgV5S",
    "rebuttal_id": "Hygpmj1EiH",
    "title": "Representation Learning Through Latent Canonicalizations",
    "reviewer": "AnonReviewer4",
    "rating": 3,
    "conference": "ICLR2020",
    "permalink": "https://openreview.net/forum?id=S1ecYANtPr&noteId=Hygpmj1EiH",
    "annotator": "anno10"
  },
  "review_sentences": [
    {
      "review_id": "Skg1tSgV5S",
      "sentence_index": 0,
      "text": "This paper proposes to relax the assumption of disentangled representation and encourage the model to learn linearly manipulable representations.",
      "suffix": "\n",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_summary",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "Skg1tSgV5S",
      "sentence_index": 1,
      "text": "The paper assumes that the latent canonicalizers are predefined for each task and that it is possible to obtain the ground-truth image of different latent canonicalizations.",
      "suffix": "",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_summary",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "Skg1tSgV5S",
      "sentence_index": 2,
      "text": "I find these assumptions too strong for the task of learning disentangled representation.",
      "suffix": "\n",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_soundness-correctness",
      "polarity": "pol_negative"
    },
    {
      "review_id": "Skg1tSgV5S",
      "sentence_index": 3,
      "text": "Firstly, most prior works, such as beta-VAE and InfoGAN, do not assume that the factors / canonicalizers are known beforehand.",
      "suffix": "",
      "review_action": "arg_fact",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "Skg1tSgV5S",
      "sentence_index": 4,
      "text": "In fact, this is a very difficult part of learning disentangled representation.",
      "suffix": "",
      "review_action": "arg_fact",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "Skg1tSgV5S",
      "sentence_index": 5,
      "text": "Secondly, if it is possible to obtain the ground-truth image of different latent canonicalizations, you can simply train a network to predict the canonicalizations by simple supervised learning.",
      "suffix": "",
      "review_action": "arg_fact",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "Skg1tSgV5S",
      "sentence_index": 6,
      "text": "Hence, these overly simplified and unrealistic assumptions make the task too trivial.",
      "suffix": "\n",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_soundness-correctness",
      "polarity": "pol_negative"
    },
    {
      "review_id": "Skg1tSgV5S",
      "sentence_index": 7,
      "text": "The proposed method is very simple and frames the problem basically as a supervised learning problem.",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_soundness-correctness",
      "polarity": "pol_negative"
    },
    {
      "review_id": "Skg1tSgV5S",
      "sentence_index": 8,
      "text": "Although experiments show that learning such representations is beneficial for the low-shot setting of SVHN, it is not clear whether such improvement generalizes to more realistic datasets such as ImageNet.",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_soundness-correctness",
      "polarity": "pol_negative"
    },
    {
      "review_id": "Skg1tSgV5S",
      "sentence_index": 9,
      "text": "If the goal is to learn representations for the low-shot setting, the method needs to be compared with other representation learning methods such as jigsaw[1], colorization[2] and rotation[3].",
      "suffix": "\n\n",
      "review_action": "arg_request",
      "fine_review_action": "arg-request_experiment",
      "aspect": "asp_substance",
      "polarity": "pol_negative"
    },
    {
      "review_id": "Skg1tSgV5S",
      "sentence_index": 10,
      "text": "[1] Noroozi, Mehdi, and Paolo Favaro. \"Unsupervised learning of visual representations by solving jigsaw puzzles.\" European Conference on Computer Vision. Springer, Cham, 2016.",
      "suffix": "\n",
      "review_action": "arg_other",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "Skg1tSgV5S",
      "sentence_index": 11,
      "text": "[2] Zhang, Richard, Phillip Isola, and Alexei A. Efros. \"Colorful image colorization.\" European conference on computer vision. Springer, Cham, 2016.",
      "suffix": "\n",
      "review_action": "arg_other",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "Skg1tSgV5S",
      "sentence_index": 12,
      "text": "[3] Gidaris, Spyros, Praveer Singh, and Nikos Komodakis. \"Unsupervised representation learning by predicting image rotations.\" arXiv preprint arXiv:1803.07728 (2018).",
      "suffix": "",
      "review_action": "arg_other",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    }
  ],
  "rebuttal_sentences": [
    {
      "review_id": "Skg1tSgV5S",
      "rebuttal_id": "Hygpmj1EiH",
      "sentence_index": 0,
      "text": "Q: \"...I find these assumptions too strong for the task of learning disentangled representation.\"",
      "suffix": "\n\n",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_structuring",
      "alignment": [
        "context_sentences",
        [
          1,
          2
        ]
      ],
      "details": {}
    },
    {
      "review_id": "Skg1tSgV5S",
      "rebuttal_id": "Hygpmj1EiH",
      "sentence_index": 1,
      "text": "A: We wish to emphasize that:",
      "suffix": "\n",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_structuring",
      "alignment": [
        "context_sentences",
        [
          1,
          2
        ]
      ],
      "details": {}
    },
    {
      "review_id": "Skg1tSgV5S",
      "rebuttal_id": "Hygpmj1EiH",
      "sentence_index": 2,
      "text": "(1) we only require access to meta-labels on the source set",
      "suffix": "\n",
      "rebuttal_stance": "dispute",
      "rebuttal_action": "rebuttal_reject-criticism",
      "alignment": [
        "context_sentences",
        [
          1,
          2
        ]
      ],
      "details": {}
    },
    {
      "review_id": "Skg1tSgV5S",
      "rebuttal_id": "Hygpmj1EiH",
      "sentence_index": 3,
      "text": "(2) our goal is not to find disentangled representations; our goal is transferability so that we can learn on real data with minimal supervision.",
      "suffix": "\n\n\n",
      "rebuttal_stance": "dispute",
      "rebuttal_action": "rebuttal_reject-criticism",
      "alignment": [
        "context_sentences",
        [
          1,
          2
        ]
      ],
      "details": {}
    },
    {
      "review_id": "Skg1tSgV5S",
      "rebuttal_id": "Hygpmj1EiH",
      "sentence_index": 4,
      "text": "Q: \"you can simply train a network to predict the canonicalizations by simple supervised learning\"",
      "suffix": "\n\n",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_structuring",
      "alignment": [
        "context_sentences",
        [
          3,
          4,
          5
        ]
      ],
      "details": {}
    },
    {
      "review_id": "Skg1tSgV5S",
      "rebuttal_id": "Hygpmj1EiH",
      "sentence_index": 5,
      "text": "A: Finding the canonicalizers is not our goal.",
      "suffix": "",
      "rebuttal_stance": "dispute",
      "rebuttal_action": "rebuttal_reject-criticism",
      "alignment": [
        "context_sentences",
        [
          3,
          4,
          5
        ]
      ],
      "details": {}
    },
    {
      "review_id": "Skg1tSgV5S",
      "rebuttal_id": "Hygpmj1EiH",
      "sentence_index": 6,
      "text": "These are used as auxiliary constraints which guide representation learning and allow for latent data augmentation on our limited target data.",
      "suffix": "",
      "rebuttal_stance": "dispute",
      "rebuttal_action": "rebuttal_reject-criticism",
      "alignment": [
        "context_sentences",
        [
          3,
          4,
          5
        ]
      ],
      "details": {}
    },
    {
      "review_id": "Skg1tSgV5S",
      "rebuttal_id": "Hygpmj1EiH",
      "sentence_index": 7,
      "text": "Specifically, predicting the factor values will not help us in manipulating the target samples (note the significant improvement we get by using the majority vote).",
      "suffix": "\n\n",
      "rebuttal_stance": "dispute",
      "rebuttal_action": "rebuttal_reject-criticism",
      "alignment": [
        "context_sentences",
        [
          3,
          4,
          5
        ]
      ],
      "details": {}
    },
    {
      "review_id": "Skg1tSgV5S",
      "rebuttal_id": "Hygpmj1EiH",
      "sentence_index": 8,
      "text": "Q: method needs to be compared with other representation learning methods such as jigsaw[1], colorization[2] and rotation[3].",
      "suffix": "\n\n",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_structuring",
      "alignment": [
        "context_sentences",
        [
          9
        ]
      ],
      "details": {}
    },
    {
      "review_id": "Skg1tSgV5S",
      "rebuttal_id": "Hygpmj1EiH",
      "sentence_index": 9,
      "text": "A: The suggested methods [1-3] are self-supervised methods using less information than the baselines we have included (for example, the AE+classifier baseline is trained on the same synthetic data with the same access to digit labels).",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          9
        ]
      ],
      "details": {}
    },
    {
      "review_id": "Skg1tSgV5S",
      "rebuttal_id": "Hygpmj1EiH",
      "sentence_index": 10,
      "text": "We therefore expect that these methods will perform worse than our baselines.",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          9
        ]
      ],
      "details": {}
    },
    {
      "review_id": "Skg1tSgV5S",
      "rebuttal_id": "Hygpmj1EiH",
      "sentence_index": 11,
      "text": "We have included a comparison with method [3] (see general comment to all reviewers).",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_done",
      "alignment": [
        "context_sentences",
        [
          9
        ]
      ],
      "details": {
        "request_out_of_scope": true
      }
    }
  ]
}