{
  "metadata": {
    "forum_id": "rJehNT4YPr",
    "review_id": "HkgKMIsTFr",
    "rebuttal_id": "ryePhK-giS",
    "title": "I Am Going MAD: Maximum Discrepancy Competition for Comparing Classifiers Adaptively",
    "reviewer": "AnonReviewer2",
    "rating": 8,
    "conference": "ICLR2020",
    "permalink": "https://openreview.net/forum?id=rJehNT4YPr&noteId=ryePhK-giS",
    "annotator": "anno2"
  },
  "review_sentences": [
    {
      "review_id": "HkgKMIsTFr",
      "sentence_index": 0,
      "text": "The paper proposed a novel image classifier comparison approach that went beyond one fixed testing set for all.",
      "suffix": "",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_summary",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "HkgKMIsTFr",
      "sentence_index": 1,
      "text": "Instead, for a pair of classifiers to be compared, it advocated sampling their \"most disagreed\" test set from a large corpus of unlabeled images.",
      "suffix": "",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_summary",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "HkgKMIsTFr",
      "sentence_index": 2,
      "text": "The level of disagreement was measured by a semantic-aware distance derived from WordNet ontology.",
      "suffix": "",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_summary",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "HkgKMIsTFr",
      "sentence_index": 3,
      "text": "Because of the efficacy of such \"worst-case\" comparison, the needed set size is very small and thus minimizes the human annotation workload.",
      "suffix": "\n\n",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_summary",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "HkgKMIsTFr",
      "sentence_index": 4,
      "text": "The proposed MAD competition distinguishes classifiers by finding their respective counterexamples.",
      "suffix": "",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_summary",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "HkgKMIsTFr",
      "sentence_index": 5,
      "text": "It is therefore an \"error spotting\" mechanism, rather than a drop-in replacement of standard test accuracy.",
      "suffix": "",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_summary",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "HkgKMIsTFr",
      "sentence_index": 6,
      "text": "I feel the approach implicitly assumes that the classifiers to be compared are already \"reasonably accurate\"; if not, both classifiers might be easily falsified by certain trivial examples, making the \"disagreed examples\" less meaningful.",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_soundness-correctness",
      "polarity": "pol_negative"
    },
    {
      "review_id": "HkgKMIsTFr",
      "sentence_index": 7,
      "text": "If that is true, I would suggest that the authors make this hidden assumption clearer in the paper.",
      "suffix": "\n\n",
      "review_action": "arg_request",
      "fine_review_action": "arg-request_edit",
      "aspect": "asp_clarity",
      "polarity": "none"
    },
    {
      "review_id": "HkgKMIsTFr",
      "sentence_index": 8,
      "text": "The idea shows a clear connection to the \"differential testing\" concept in software engineering, in addition to the cited work on perceptual quality assessment.",
      "suffix": "",
      "review_action": "arg_fact",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "HkgKMIsTFr",
      "sentence_index": 9,
      "text": "The idea has a cross-disciplinary nature and is fairly interesting to me.",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_motivation-impact",
      "polarity": "pol_positive"
    },
    {
      "review_id": "HkgKMIsTFr",
      "sentence_index": 10,
      "text": "I can see the paper being of interest to a quite broad audience and motivating many subsequent works.",
      "suffix": "\n\n",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_motivation-impact",
      "polarity": "pol_positive"
    },
    {
      "review_id": "HkgKMIsTFr",
      "sentence_index": 11,
      "text": "One minor comment: for images in \"Case III\", the authors considered that they \"contribute little to performance comparison between the two classifiers\" and therefore did not source labels for them.",
      "suffix": "",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_quote",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "HkgKMIsTFr",
      "sentence_index": 12,
      "text": "However, since the authors adopted an affinity-aware distance, two incorrect predictions can still be compared based on their semantic tree distances to the true class.",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_soundness-correctness",
      "polarity": "pol_negative"
    }
  ],
  "rebuttal_sentences": [
    {
      "review_id": "HkgKMIsTFr",
      "rebuttal_id": "ryePhK-giS",
      "sentence_index": 0,
      "text": "1. I feel the approach implicitly assumes that the classifiers to be compared are already \"reasonably accurate\"; if not, both classifiers might be easily falsified by certain trivial examples, making the \"disagreed examples\" less meaningful. If that is true, I would suggest that the authors make this hidden assumption clearer in the paper.",
      "suffix": "\n\n",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_structuring",
      "alignment": [
        "context_sentences",
        [
          6,
          7
        ]
      ],
      "details": {}
    },
    {
      "review_id": "HkgKMIsTFr",
      "rebuttal_id": "ryePhK-giS",
      "sentence_index": 1,
      "text": "Response: Thanks for the constructive suggestion. We agree with the reviewer and will make this assumption explicit in the revised manuscript.",
      "suffix": "\n\n",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_by-cr",
      "alignment": [
        "context_sentences",
        [
          6,
          7
        ]
      ],
      "details": {
        "manuscript_change": true
      }
    },
    {
      "review_id": "HkgKMIsTFr",
      "rebuttal_id": "ryePhK-giS",
      "sentence_index": 2,
      "text": "2. The idea shows a clear connection to the \"differential testing\" concept in software engineering, in addition to the cited work on perceptual quality assessment.",
      "suffix": "",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_structuring",
      "alignment": [
        "context_sentences",
        [
          8,
          9,
          10
        ]
      ],
      "details": {}
    },
    {
      "review_id": "HkgKMIsTFr",
      "rebuttal_id": "ryePhK-giS",
      "sentence_index": 3,
      "text": "The idea has a cross-disciplinary nature and is fairly interesting to me.",
      "suffix": "",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_structuring",
      "alignment": [
        "context_sentences",
        [
          8,
          9,
          10
        ]
      ],
      "details": {}
    },
    {
      "review_id": "HkgKMIsTFr",
      "rebuttal_id": "ryePhK-giS",
      "sentence_index": 4,
      "text": "I can see the paper being of interest to a quite broad audience and motivating many subsequent works.",
      "suffix": "\n\n",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_structuring",
      "alignment": [
        "context_sentences",
        [
          8,
          9,
          10
        ]
      ],
      "details": {}
    },
    {
      "review_id": "HkgKMIsTFr",
      "rebuttal_id": "ryePhK-giS",
      "sentence_index": 5,
      "text": "Response: Thanks for recognizing the strengths of the paper. We will add the appropriate references regarding the \"differential testing\" concept in software engineering.",
      "suffix": "\n\n",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_by-cr",
      "alignment": [
        "context_sentences",
        [
          8,
          9,
          10
        ]
      ],
      "details": {
        "manuscript_change": true
      }
    },
    {
      "review_id": "HkgKMIsTFr",
      "rebuttal_id": "ryePhK-giS",
      "sentence_index": 6,
      "text": "3. One minor comment: for images in \"Case III\", the authors considered that they \"contribute little to performance comparison between the two classifiers\" and therefore did not source labels for them.",
      "suffix": "",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_structuring",
      "alignment": [
        "context_sentences",
        [
          11,
          12
        ]
      ],
      "details": {}
    },
    {
      "review_id": "HkgKMIsTFr",
      "rebuttal_id": "ryePhK-giS",
      "sentence_index": 7,
      "text": "However, since the authors adopted an affinity-aware distance, two incorrect predictions can still be compared based on their semantic tree distances to the true class.",
      "suffix": "\n\n",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_structuring",
      "alignment": [
        "context_sentences",
        [
          11,
          12
        ]
      ],
      "details": {}
    },
    {
      "review_id": "HkgKMIsTFr",
      "rebuttal_id": "ryePhK-giS",
      "sentence_index": 8,
      "text": "Response: Thanks for pointing it out.",
      "suffix": "",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_social",
      "alignment": [
        "context_sentences",
        [
          11,
          12
        ]
      ],
      "details": {}
    },
    {
      "review_id": "HkgKMIsTFr",
      "rebuttal_id": "ryePhK-giS",
      "sentence_index": 9,
      "text": "We agree with the reviewer that images falling into Case III can be used to distinguish the associated two classifiers using the proposed semantic tree distance.",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_concede-criticism",
      "alignment": [
        "context_sentences",
        [
          11,
          12
        ]
      ],
      "details": {}
    },
    {
      "review_id": "HkgKMIsTFr",
      "rebuttal_id": "ryePhK-giS",
      "sentence_index": 10,
      "text": "We will revise the writing to make it more rigorous.",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_by-cr",
      "alignment": [
        "context_sentences",
        [
          11,
          12
        ]
      ],
      "details": {
        "manuscript_change": true
      }
    },
    {
      "review_id": "HkgKMIsTFr",
      "rebuttal_id": "ryePhK-giS",
      "sentence_index": 11,
      "text": "In our current subjective assessment environment, we choose to stop labeling images in Case III because it is difficult for humans to select one among 200 classes, especially when they are unfamiliar with the class ontology.",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          11,
          12
        ]
      ],
      "details": {}
    }
  ]
}