{
  "metadata": {
    "forum_id": "S1gd7nCcF7",
    "review_id": "Bkxpr4aq3m",
    "rebuttal_id": "BygTJNKtRQ",
    "title": "Self-Supervised Generalisation with Meta Auxiliary Learning",
    "reviewer": "AnonReviewer3",
    "rating": 4,
    "conference": "ICLR2019",
    "permalink": "https://openreview.net/forum?id=S1gd7nCcF7&noteId=BygTJNKtRQ",
    "annotator": "anno3"
  },
  "review_sentences": [
    {
      "review_id": "Bkxpr4aq3m",
      "sentence_index": 0,
      "text": "This paper proposes a self-auxiliary-training method that aims to improve the generalization performance of simple supervised learning.",
      "suffix": "",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_summary",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "Bkxpr4aq3m",
      "sentence_index": 1,
      "text": "The basic idea is to train the classification network to predict fine-level auxiliary labels in addition to the ground-truth coarse label, where the auxiliary labels used in training are generated by a generator network.",
      "suffix": "",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_summary",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "Bkxpr4aq3m",
      "sentence_index": 2,
      "text": "During training, the classification network and the generator network are alternately updated, and the update of the latter aims to maximize the improvement of the former after using the generated auxiliary label for training.",
      "suffix": "",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_summary",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "Bkxpr4aq3m",
      "sentence_index": 3,
      "text": "The method requires a class hierarchy in advance to define the binary mask applied to the output layer for auxiliary class prediction.",
      "suffix": "",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_summary",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "Bkxpr4aq3m",
      "sentence_index": 4,
      "text": "A KL divergence term is attached to the optimization objective to avoid generating trivial and collapsing auxiliary classes.",
      "suffix": "\n\n",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_summary",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "Bkxpr4aq3m",
      "sentence_index": 5,
      "text": "Pros:",
      "suffix": "\n\n",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_heading",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "Bkxpr4aq3m",
      "sentence_index": 6,
      "text": "1) The main idea is simple and easy to understand.",
      "suffix": "\n",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_clarity",
      "polarity": "pol_positive"
    },
    {
      "review_id": "Bkxpr4aq3m",
      "sentence_index": 7,
      "text": "2) It discusses the class collapsing problem in generating pseudo (auxiliary) labels and provides a reasonable solution, i.e., using KL divergence as regularization.",
      "suffix": "\n",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_soundness-correctness",
      "polarity": "pol_positive"
    },
    {
      "review_id": "Bkxpr4aq3m",
      "sentence_index": 8,
      "text": "3) Uses several visualizations to show experimental results.",
      "suffix": "\n\n",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_substance",
      "polarity": "pol_positive"
    },
    {
      "review_id": "Bkxpr4aq3m",
      "sentence_index": 9,
      "text": "Cons:",
      "suffix": "\n\n",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_heading",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "Bkxpr4aq3m",
      "sentence_index": 10,
      "text": "1) The problem it aims to solve is neither multi-task learning nor meta-learning: it tries to solve a supervised classification problem defined on principal classes, with the help of simultaneously predicting/generating auxiliary class labels.",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_soundness-correctness",
      "polarity": "pol_negative"
    },
    {
      "review_id": "Bkxpr4aq3m",
      "sentence_index": 11,
      "text": "Although the concept of \"task\" is not explicitly defined in this paper, the authors seem to associate each task with a specific class.",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_substance",
      "polarity": "pol_negative"
    },
    {
      "review_id": "Bkxpr4aq3m",
      "sentence_index": 12,
      "text": "This is not correct: in meta-learning, each task is a subset of classes drawn from a ground set of classes, and different tasks are independently sampled.",
      "suffix": "",
      "review_action": "none",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "Bkxpr4aq3m",
      "sentence_index": 13,
      "text": "In addition, the classification models for different tasks are independent, though their training might be related by a meta-learner.",
      "suffix": "",
      "review_action": "arg_fact",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "Bkxpr4aq3m",
      "sentence_index": 14,
      "text": "Hence, the claims in multiple places of this paper and the names for the two networks are misleading.",
      "suffix": "\n\n",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_clarity",
      "polarity": "pol_negative"
    },
    {
      "review_id": "Bkxpr4aq3m",
      "sentence_index": 15,
      "text": "2) At the end of Page 4, the authors show that the update of the generator only depends on the improvement of the classifier after using the auxiliary label for training.",
      "suffix": "",
      "review_action": "arg_fact",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "Bkxpr4aq3m",
      "sentence_index": 16,
      "text": "In fact, the optimal auxiliary labels minimizing the objective are the ground truth labels for the principal classes.",
      "suffix": "",
      "review_action": "arg_fact",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "Bkxpr4aq3m",
      "sentence_index": 17,
      "text": "This results in the class collapsing problem observed by the authors.",
      "suffix": "",
      "review_action": "arg_fact",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "Bkxpr4aq3m",
      "sentence_index": 18,
      "text": "The KL divergence regularization introduces extra randomness to the auxiliary labels and thus mitigates the problem, but it hardly provides any useful information except randomness.",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_substance",
      "polarity": "pol_negative"
    },
    {
      "review_id": "Bkxpr4aq3m",
      "sentence_index": 19,
      "text": "In other words, the auxiliary labels for a specific principal class are very likely to be multiple noisy copies of the principal label with random perturbations.",
      "suffix": "",
      "review_action": "none",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "Bkxpr4aq3m",
      "sentence_index": 20,
      "text": "So it is not convincing to me that the auxiliary labels generated by the generator can be really helpful.",
      "suffix": "",
      "review_action": "none",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "Bkxpr4aq3m",
      "sentence_index": 21,
      "text": "My conjecture is that the observed improvements are mainly due to the softness of the auxiliary labels, which has been proved by model compression/knowledge distillation and recent \"born-again neural networks\".",
      "suffix": "",
      "review_action": "arg_fact",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "Bkxpr4aq3m",
      "sentence_index": 22,
      "text": "To verify this, the authors might need to compare the results with those methods (which use the generated soft probability of ground truth classes for training), and the \"random-noisy copies of soft principal label\" mentioned above.",
      "suffix": "\n\n",
      "review_action": "arg_request",
      "fine_review_action": "arg-request_result",
      "aspect": "asp_substance",
      "polarity": "none"
    },
    {
      "review_id": "Bkxpr4aq3m",
      "sentence_index": 23,
      "text": "3) The experiments lack comparisons to several important baselines from the self-supervised learning community, and to methods using soft labels for training (as mentioned in 2) above).",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_meaningful-comparison",
      "polarity": "pol_negative"
    },
    {
      "review_id": "Bkxpr4aq3m",
      "sentence_index": 24,
      "text": "A successful idea of self-supervised learning is to use the output feature map of the trained classification network to generate auxiliary training signals, since it provides extra information about the learned distance beyond the ground-truth labels.",
      "suffix": "",
      "review_action": "arg_fact",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "Bkxpr4aq3m",
      "sentence_index": 25,
      "text": "The authors might want to compare to \"Mathilde Caron, Piotr Bojanowski, Armand Joulin, and Matthijs Douze.",
      "suffix": "",
      "review_action": "arg_request",
      "fine_review_action": "arg-request_result",
      "aspect": "asp_meaningful-comparison",
      "polarity": "none"
    },
    {
      "review_id": "Bkxpr4aq3m",
      "sentence_index": 26,
      "text": "Deep Clustering for Unsupervised Learning of Visual Features. ECCV 2018.\" and \"Carl Doersch and Andrew Zisserman. Multi-task self-supervised visual learning. ICCV 2017.\" Moreover, since the method is not a meta-learning approach for few-shot learning, it is not fair and also not appropriate to compare with Prototypical Network.",
      "suffix": "\n\n",
      "review_action": "arg_other",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "Bkxpr4aq3m",
      "sentence_index": 27,
      "text": "4) Although the paper claims that the ground truth fine labels are not required, it requires a class hierarchy, which in the experiments is provided by the dataset and defined between true coarse and fine classes.",
      "suffix": "",
      "review_action": "arg_fact",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "Bkxpr4aq3m",
      "sentence_index": 28,
      "text": "In practice, such a hierarchy might be much harder to obtain than the primary (coarse) labels, and might be as costly to obtain as the true fine-class labels.",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_substance",
      "polarity": "pol_negative"
    },
    {
      "review_id": "Bkxpr4aq3m",
      "sentence_index": 29,
      "text": "This weakens the feasibility of the proposed method.",
      "suffix": "\n\n",
      "review_action": "none",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "Bkxpr4aq3m",
      "sentence_index": 30,
      "text": "5) The experiments only test the proposed method on CIFAR100 and CIFAR10, which have at most 100 fine classes.",
      "suffix": "",
      "review_action": "arg_fact",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "Bkxpr4aq3m",
      "sentence_index": 31,
      "text": "It is necessary to test it on datasets with many more fine classes and a much more complicated hierarchy, e.g., ImageNet, MS COCO or their subsets, which have ideal class hierarchy structures.",
      "suffix": "\n\n",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_replicability",
      "polarity": "pol_negative"
    },
    {
      "review_id": "Bkxpr4aq3m",
      "sentence_index": 32,
      "text": "Minor comments:",
      "suffix": "\n\n",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_heading",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "Bkxpr4aq3m",
      "sentence_index": 33,
      "text": "Some important equations in the paper should be numbered.",
      "suffix": "",
      "review_action": "arg_request",
      "fine_review_action": "arg-request_typo",
      "aspect": "asp_substance",
      "polarity": "none"
    }
  ],
  "rebuttal_sentences": [
    {
      "review_id": "Bkxpr4aq3m",
      "rebuttal_id": "BygTJNKtRQ",
      "sentence_index": 0,
      "text": "We thank the reviewer for their comments on our work, and we share our responses below.",
      "suffix": "\n\n",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_social",
      "alignment": [
        "context_global",
        null
      ],
      "details": {}
    },
    {
      "review_id": "Bkxpr4aq3m",
      "rebuttal_id": "BygTJNKtRQ",
      "sentence_index": 1,
      "text": "1) We agree that we did not provide a clear definition of \"task\".",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_concede-criticism",
      "alignment": [
        "context_sentences",
        [
          11,
          12
        ]
      ],
      "details": {}
    },
    {
      "review_id": "Bkxpr4aq3m",
      "rebuttal_id": "BygTJNKtRQ",
      "sentence_index": 2,
      "text": "In the present paper there are two tasks: classification into primary labels, and classification into secondary labels.",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          11,
          12
        ]
      ],
      "details": {}
    },
    {
      "review_id": "Bkxpr4aq3m",
      "rebuttal_id": "BygTJNKtRQ",
      "sentence_index": 3,
      "text": "We did not mean to imply that the classification of a specific class is a task on its own.",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          10,
          11,
          12,
          13,
          14
        ]
      ],
      "details": {}
    },
    {
      "review_id": "Bkxpr4aq3m",
      "rebuttal_id": "BygTJNKtRQ",
      "sentence_index": 4,
      "text": "We agree, however, that a clearer introduction of the terminology would be helpful and we plan to add this to the final submission.",
      "suffix": "\n\n",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_by-cr",
      "alignment": [
        "context_sentences",
        [
          10,
          11,
          12,
          13,
          14
        ]
      ],
      "details": {
        "manuscript_change": true
      }
    },
    {
      "review_id": "Bkxpr4aq3m",
      "rebuttal_id": "BygTJNKtRQ",
      "sentence_index": 5,
      "text": "2) This comment is not entirely correct and we would like to apologise for any confusion in the paper.",
      "suffix": "",
      "rebuttal_stance": "dispute",
      "rebuttal_action": "rebuttal_reject-criticism",
      "alignment": [
        "context_sentences",
        [
          15,
          16,
          17,
          18,
          19,
          20,
          21,
          22
        ]
      ],
      "details": {}
    },
    {
      "review_id": "Bkxpr4aq3m",
      "rebuttal_id": "BygTJNKtRQ",
      "sentence_index": 6,
      "text": "Actually, the update of the generator depends on the improvement of the classifier for the *principal* labels on the *meta-training* data, i.e. the improvement in generalisation to unseen data.",
      "suffix": "",
      "rebuttal_stance": "dispute",
      "rebuttal_action": "rebuttal_reject-criticism",
      "alignment": [
        "context_sentences",
        [
          15,
          16,
          17,
          18,
          19,
          20,
          21,
          22
        ]
      ],
      "details": {}
    },
    {
      "review_id": "Bkxpr4aq3m",
      "rebuttal_id": "BygTJNKtRQ",
      "sentence_index": 7,
      "text": "Thus, the optimal auxiliary labels are not the ground-truth labels for the principal classes, since this would make both terms in the minimisation for $\\theta_1$ (the second equation in 3.2) identical and not allow any leveraging of the meta-training data.",
      "suffix": "",
      "rebuttal_stance": "dispute",
      "rebuttal_action": "rebuttal_reject-criticism",
      "alignment": [
        "context_sentences",
        [
          15,
          16,
          17,
          18,
          19,
          20,
          21,
          22
        ]
      ],
      "details": {}
    },
    {
      "review_id": "Bkxpr4aq3m",
      "rebuttal_id": "BygTJNKtRQ",
      "sentence_index": 8,
      "text": "Also, we would argue that the KL-divergence, rather than introducing noise, allows us to avoid collapsing classes, which we would claim are due to dying neurons (again, there is no loss/mechanism drawing the auxiliary labels to be the same as the primary ones).",
      "suffix": "",
      "rebuttal_stance": "dispute",
      "rebuttal_action": "rebuttal_reject-criticism",
      "alignment": [
        "context_sentences",
        [
          15,
          16,
          17,
          18,
          19,
          20,
          21,
          22
        ]
      ],
      "details": {}
    },
    {
      "review_id": "Bkxpr4aq3m",
      "rebuttal_id": "BygTJNKtRQ",
      "sentence_index": 9,
      "text": "These claims are supported by showing that providing random labels does not lead to any improved performance and by our experience that using hard labels does indeed improve performance.",
      "suffix": "\n\n",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          15,
          16,
          17,
          18,
          19,
          20,
          21,
          22
        ]
      ],
      "details": {}
    },
    {
      "review_id": "Bkxpr4aq3m",
      "rebuttal_id": "BygTJNKtRQ",
      "sentence_index": 10,
      "text": "3) Providing fair comparisons across a range of very different methods is not easy when other methods aim to solve a different problem.",
      "suffix": "",
      "rebuttal_stance": "dispute",
      "rebuttal_action": "rebuttal_reject-criticism",
      "alignment": [
        "context_sentences",
        [
          23,
          24,
          25,
          26
        ]
      ],
      "details": {}
    },
    {
      "review_id": "Bkxpr4aq3m",
      "rebuttal_id": "BygTJNKtRQ",
      "sentence_index": 11,
      "text": "Concerning the comparison with prototypical networks, we do agree that this is not a fair comparison and we would like to change the phrasing in the paper.",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_concede-criticism",
      "alignment": [
        "context_sentences",
        [
          23,
          24,
          25,
          26
        ]
      ],
      "details": {}
    },
    {
      "review_id": "Bkxpr4aq3m",
      "rebuttal_id": "BygTJNKtRQ",
      "sentence_index": 12,
      "text": "The original reason for associating this to the prototypical network was that we employ their zero-shot setup: i.e. we use a VGG network to obtain prototypes on the meta-data and then use these prototypes to define an auxiliary task on the training-data.",
      "suffix": "\n\n",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          23,
          24,
          25,
          26
        ]
      ],
      "details": {}
    },
    {
      "review_id": "Bkxpr4aq3m",
      "rebuttal_id": "BygTJNKtRQ",
      "sentence_index": 13,
      "text": "4) We do agree that requiring the class hierarchy is a current limitation of the work.",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_concede-criticism",
      "alignment": [
        "context_sentences",
        [
          27,
          28,
          29
        ]
      ],
      "details": {}
    },
    {
      "review_id": "Bkxpr4aq3m",
      "rebuttal_id": "BygTJNKtRQ",
      "sentence_index": 14,
      "text": "While it is still general enough for solving classification tasks (we merely have to choose a fixed number of sub-classes per task, e.g. 5, without having to provide anything else), we would want to look at more general auxiliary tasks in future.",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_future",
      "alignment": [
        "context_sentences",
        [
          27,
          28,
          29
        ]
      ],
      "details": {}
    },
    {
      "review_id": "Bkxpr4aq3m",
      "rebuttal_id": "BygTJNKtRQ",
      "sentence_index": 15,
      "text": "One option we are considering is employing an auxiliary regression task, where the generator network would provide vectors and the corresponding loss would be simple regression.",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          27,
          28,
          29
        ]
      ],
      "details": {}
    },
    {
      "review_id": "Bkxpr4aq3m",
      "rebuttal_id": "BygTJNKtRQ",
      "sentence_index": 16,
      "text": "However, since this is the first work to use a double gradient method for auxiliary task generation, we believe that presenting results with a comparison to human auxiliary labels, which itself also requires this hierarchy, is a good starting point.",
      "suffix": "\n\n",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          27,
          28,
          29
        ]
      ],
      "details": {}
    },
    {
      "review_id": "Bkxpr4aq3m",
      "rebuttal_id": "BygTJNKtRQ",
      "sentence_index": 17,
      "text": "5) We would very much like to test our approach on more complex datasets with more varied classes, and this will be part of future work.",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_future",
      "alignment": [
        "context_sentences",
        [
          30,
          31
        ]
      ],
      "details": {}
    },
    {
      "review_id": "Bkxpr4aq3m",
      "rebuttal_id": "BygTJNKtRQ",
      "sentence_index": 18,
      "text": "However, we would like to repeat that our approach can work with an arbitrary hierarchy (e.g. assigning the same number of sub-classes to every class).",
      "suffix": "",
      "rebuttal_stance": "dispute",
      "rebuttal_action": "rebuttal_mitigate-criticism",
      "alignment": [
        "context_sentences",
        [
          30,
          31
        ]
      ],
      "details": {}
    },
    {
      "review_id": "Bkxpr4aq3m",
      "rebuttal_id": "BygTJNKtRQ",
      "sentence_index": 19,
      "text": "The reason why we only used 100 classes in our experiments is for allowing the comparison with human-defined classes, but in principle we could use any number of sub-classes per primary class.",
      "suffix": "",
      "rebuttal_stance": "dispute",
      "rebuttal_action": "rebuttal_mitigate-criticism",
      "alignment": [
        "context_sentences",
        [
          30,
          31
        ]
      ],
      "details": {}
    },
    {
      "review_id": "Bkxpr4aq3m",
      "rebuttal_id": "BygTJNKtRQ",
      "sentence_index": 20,
      "text": "In the CIFAR10 dataset, in which a hierarchy is not defined, we show that 6 different hierarchies all lead to better generalisation.",
      "suffix": "",
      "rebuttal_stance": "dispute",
      "rebuttal_action": "rebuttal_mitigate-criticism",
      "alignment": [
        "context_sentences",
        [
          30,
          31
        ]
      ],
      "details": {}
    }
  ]
}