{
  "metadata": {
    "forum_id": "Sklgs0NFvr",
    "review_id": "B1eNnTeLqS",
    "rebuttal_id": "rke_9oPqir",
    "title": "Learning The Difference That Makes A Difference With Counterfactually-Augmented Data",
    "reviewer": "AnonReviewer4",
    "rating": 6,
    "conference": "ICLR2020",
    "permalink": "https://openreview.net/forum?id=Sklgs0NFvr&noteId=rke_9oPqir",
    "annotator": "anno8"
  },
  "review_sentences": [
    {
      "review_id": "B1eNnTeLqS",
      "sentence_index": 0,
      "text": "This paper addresses the problem of building models for NLP tasks that are robust against spurious correlations in the data by introducing a human-in-the-loop method: annotators are asked to modify data-points minimally in order to change the label.",
      "suffix": "",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_summary",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "B1eNnTeLqS",
      "sentence_index": 1,
      "text": "They refer to this process as counterfactual augmentation.",
      "suffix": "",
      "review_action": "arg_fact",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "B1eNnTeLqS",
      "sentence_index": 2,
      "text": "The authors apply this method to the IMDB sentiment dataset and to SNLI and show (among other things) that many models cannot generalize from the original dataset to the counterfactually-augmented one.",
      "suffix": "\n\n",
      "review_action": "arg_fact",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "B1eNnTeLqS",
      "sentence_index": 3,
      "text": "This contribution is timely and addresses a very important problem that needs to be addressed in order to build more robust NLP systems.",
      "suffix": "\n\n",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_motivation-impact",
      "polarity": "pol_positive"
    },
    {
      "review_id": "B1eNnTeLqS",
      "sentence_index": 4,
      "text": "Because, however, of a few limitations, I recommend weak acceptance.",
      "suffix": "\n\n",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_heading",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "B1eNnTeLqS",
      "sentence_index": 5,
      "text": "My main hesitation comes from a lack of clarity about the main lesson we have learned.",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_clarity",
      "polarity": "pol_negative"
    },
    {
      "review_id": "B1eNnTeLqS",
      "sentence_index": 6,
      "text": "In particular, if the goal is to use this method to augment the data we use to train NLP systems in order to make them more robust, it seems that the time cost of the process will be prohibitive.",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_clarity",
      "polarity": "pol_negative"
    },
    {
      "review_id": "B1eNnTeLqS",
      "sentence_index": 7,
      "text": "On the other hand, perhaps these methods could be used to identify the kind of spurious correlations that models tend to rely on, which could then be used in a more automated data augmentation process.",
      "suffix": "",
      "review_action": "arg_request",
      "fine_review_action": "arg-request_experiment",
      "aspect": "asp_clarity",
      "polarity": "pol_positive"
    },
    {
      "review_id": "B1eNnTeLqS",
      "sentence_index": 8,
      "text": "If that's the goal, however, a more detailed error analysis would need to be included.",
      "suffix": "\n\n",
      "review_action": "arg_request",
      "fine_review_action": "arg-request_explanation",
      "aspect": "asp_substance",
      "polarity": "pol_negative"
    },
    {
      "review_id": "B1eNnTeLqS",
      "sentence_index": 9,
      "text": "A few small comments:",
      "suffix": "\n\n",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_heading",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "B1eNnTeLqS",
      "sentence_index": 10,
      "text": "* There was some analysis of the augmented IMDB dataset, but none of the SNLI dataset.",
      "suffix": "",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_quote",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "B1eNnTeLqS",
      "sentence_index": 11,
      "text": "I would love to see a more detailed investigation of what annotators usually did.",
      "suffix": "",
      "review_action": "arg_request",
      "fine_review_action": "arg-request_clarification",
      "aspect": "asp_clarity",
      "polarity": "pol_positive"
    },
    {
      "review_id": "B1eNnTeLqS",
      "sentence_index": 12,
      "text": "For instance, a reason that hypothesis-only models do well is that certain words are very predictive of certain labels (e.g. \"not\" and contradiction).",
      "suffix": "",
      "review_action": "arg_fact",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "B1eNnTeLqS",
      "sentence_index": 13,
      "text": "Do people leave the negations in when modifying such examples for entailment or neutrality, thus breaking the simple correspondence?",
      "suffix": "",
      "review_action": "arg_request",
      "fine_review_action": "arg-request_explanation",
      "aspect": "asp_soundness-correctness",
      "polarity": "pol_negative"
    },
    {
      "review_id": "B1eNnTeLqS",
      "sentence_index": 14,
      "text": "That's a very simple kind of question; more generally, I'd like to see more analysis of the new dataset.",
      "suffix": "\n\n",
      "review_action": "arg_request",
      "fine_review_action": "arg-request_experiment",
      "aspect": "asp_substance",
      "polarity": "pol_positive"
    },
    {
      "review_id": "B1eNnTeLqS",
      "sentence_index": 15,
      "text": "* The BiLSTM they use is very small (embedding and hidden dimension 50).",
      "suffix": "",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_quote",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "B1eNnTeLqS",
      "sentence_index": 16,
      "text": "Given that BERT is most robust against their manipulation, it would be good to see a more powerful recurrent model for comparison.",
      "suffix": "",
      "review_action": "arg_request",
      "fine_review_action": "arg-request_experiment",
      "aspect": "asp_meaningful-comparison",
      "polarity": "pol_negative"
    },
    {
      "review_id": "B1eNnTeLqS",
      "sentence_index": 17,
      "text": "It would be easy to use ELMo here, if the main question is about Transformers vs recurrent models.",
      "suffix": "\n\n\n",
      "review_action": "none",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "B1eNnTeLqS",
      "sentence_index": 18,
      "text": "Some very minor / typographic comments:",
      "suffix": "\n\n",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_summary",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "B1eNnTeLqS",
      "sentence_index": 19,
      "text": "* abstract: \"with revise\" should be \"with revising\"",
      "suffix": "\n",
      "review_action": "arg_request",
      "fine_review_action": "arg-request_edit",
      "aspect": "asp_clarity",
      "polarity": "pol_positive"
    },
    {
      "review_id": "B1eNnTeLqS",
      "sentence_index": 20,
      "text": "* first paragraph page 2: some references to causality literature and definition of spuriousness as common cause",
      "suffix": "\n",
      "review_action": "arg_request",
      "fine_review_action": "arg-request_edit",
      "aspect": "asp_substance",
      "polarity": "pol_positive"
    },
    {
      "review_id": "B1eNnTeLqS",
      "sentence_index": 21,
      "text": "* page 2, \"We show that...\" I'd break this into two sentences to make it easier to parse.",
      "suffix": "\n",
      "review_action": "arg_request",
      "fine_review_action": "arg-request_edit",
      "aspect": "asp_clarity",
      "polarity": "pol_positive"
    },
    {
      "review_id": "B1eNnTeLqS",
      "sentence_index": 22,
      "text": "* Table 3: I would make two columns for each model with accuracy on original versus revised.",
      "suffix": "",
      "review_action": "arg_request",
      "fine_review_action": "arg-request_edit",
      "aspect": "asp_clarity",
      "polarity": "pol_positive"
    },
    {
      "review_id": "B1eNnTeLqS",
      "sentence_index": 23,
      "text": "With the current table, one has to compare cells in the top half of the table to those in the bottom half of the table, which is quite difficult to do.",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_clarity",
      "polarity": "pol_negative"
    }
  ],
  "rebuttal_sentences": [
    {
      "review_id": "B1eNnTeLqS",
      "rebuttal_id": "rke_9oPqir",
      "sentence_index": 0,
      "text": "Thanks for the detailed and thoughtful review.",
      "suffix": "",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_social",
      "alignment": [
        "context_global",
        null
      ],
      "details": {}
    },
    {
      "review_id": "B1eNnTeLqS",
      "rebuttal_id": "rke_9oPqir",
      "sentence_index": 1,
      "text": "We are glad that you think of this paper as a timely contribution addressing an important problem that must be addressed in order to build more robust NLP systems.",
      "suffix": "\n\n",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_accept-praise",
      "alignment": [
        "context_global",
        null
      ],
      "details": {}
    },
    {
      "review_id": "B1eNnTeLqS",
      "rebuttal_id": "rke_9oPqir",
      "sentence_index": 2,
      "text": "We agree with your point that it would be great to have a practical takeaway guiding practitioners.",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_global",
        null
      ],
      "details": {}
    },
    {
      "review_id": "B1eNnTeLqS",
      "rebuttal_id": "rke_9oPqir",
      "sentence_index": 3,
      "text": "We believe that the first step here is to characterize the problem coherently and that having laid this groundwork, one immediate next step is, as you suggest, to develop a more practical solution that requires a less expensive/onerous annotation effort.",
      "suffix": "\n\n",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_global",
        null
      ],
      "details": {}
    },
    {
      "review_id": "B1eNnTeLqS",
      "rebuttal_id": "rke_9oPqir",
      "sentence_index": 4,
      "text": "The key contribution of our paper is to provide a clear characterization of a variety of concerns in the language of interventions and to demonstrate that indeed, they can be addressed by acquiring interventional data.",
      "suffix": "",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_summary",
      "alignment": [
        "context_global",
        null
      ],
      "details": {}
    },
    {
      "review_id": "B1eNnTeLqS",
      "rebuttal_id": "rke_9oPqir",
      "sentence_index": 5,
      "text": "The knowledge that (i) NLP models trained on counterfactually augmented data suffer less from these problems and (ii) transport better out of sample (see new results in the updated draft, per R3\u2019s suggestion) validates this.",
      "suffix": "\n\n",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_summary",
      "alignment": [
        "context_global",
        null
      ],
      "details": {}
    },
    {
      "review_id": "B1eNnTeLqS",
      "rebuttal_id": "rke_9oPqir",
      "sentence_index": 6,
      "text": "As you mentioned, our solution requires significant expenditure (both financial and human capital)",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_concede-criticism",
      "alignment": [
        "context_sentences",
        [
          6
        ]
      ],
      "details": {}
    },
    {
      "review_id": "B1eNnTeLqS",
      "rebuttal_id": "rke_9oPqir",
      "sentence_index": 7,
      "text": "compared to simply labeling data",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_concede-criticism",
      "alignment": [
        "context_sentences",
        [
          6
        ]
      ],
      "details": {}
    },
    {
      "review_id": "B1eNnTeLqS",
      "rebuttal_id": "rke_9oPqir",
      "sentence_index": 8,
      "text": ".",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_concede-criticism",
      "alignment": [
        "context_sentences",
        [
          6
        ]
      ],
      "details": {}
    },
    {
      "review_id": "B1eNnTeLqS",
      "rebuttal_id": "rke_9oPqir",
      "sentence_index": 9,
      "text": "As a follow-up, for existing datasets, our next steps include investigating how to make these adjustments in a cost-effective way.",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          6
        ]
      ],
      "details": {}
    },
    {
      "review_id": "B1eNnTeLqS",
      "rebuttal_id": "rke_9oPqir",
      "sentence_index": 10,
      "text": "In preliminary work, we have been investigating how to use humans in the loop more effectively.",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          6
        ]
      ],
      "details": {}
    },
    {
      "review_id": "B1eNnTeLqS",
      "rebuttal_id": "rke_9oPqir",
      "sentence_index": 11,
      "text": "One approach involves using generative models to propose candidate substitutions and relying on humans only to accept or reject the revisions (vs. having to write them from scratch).",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          6
        ]
      ],
      "details": {}
    },
    {
      "review_id": "B1eNnTeLqS",
      "rebuttal_id": "rke_9oPqir",
      "sentence_index": 12,
      "text": "Our experience with crowdsourcing suggests that this feedback would be significantly cheaper to collect (provided that a reasonable fraction of suggestions were appropriate).",
      "suffix": "\n\n",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          6
        ]
      ],
      "details": {}
    },
    {
      "review_id": "B1eNnTeLqS",
      "rebuttal_id": "rke_9oPqir",
      "sentence_index": 13,
      "text": "We additionally note that for some tasks, such as NLI, creating new datasets already requires annotators to synthesize examples de novo and the fractional increase for soliciting counterfactually-augmented data might not be as onerous as compared to tasks where the default is to rely on annotators only for tags.",
      "suffix": "\n\n",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          6
        ]
      ],
      "details": {}
    },
    {
      "review_id": "B1eNnTeLqS",
      "rebuttal_id": "rke_9oPqir",
      "sentence_index": 14,
      "text": "We are also appreciative of your constructive suggestions to improve the paper, and have taken several steps to improve the draft.",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_done",
      "alignment": [
        "context_global",
        null
      ],
      "details": {
        "request_out_of_scope": true
      }
    },
    {
      "review_id": "B1eNnTeLqS",
      "rebuttal_id": "rke_9oPqir",
      "sentence_index": 15,
      "text": "These include updating the draft to include (i) a detailed analysis of edits performed on SNLI; (ii) results on various datasets using an ELMo-based classifier; and (iii) concerning your question about larger Bi-LSTMs, we had tried a large Bi-LSTM but it overfit badly.",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_done",
      "alignment": [
        "context_sentences",
        [
          10,
          11,
          12,
          13,
          14,
          15,
          16,
          17
        ]
      ],
      "details": {
        "request_out_of_scope": true
      }
    },
    {
      "review_id": "B1eNnTeLqS",
      "rebuttal_id": "rke_9oPqir",
      "sentence_index": 16,
      "text": "We have updated the draft to include this detail.",
      "suffix": "\n\n",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_done",
      "alignment": [
        "context_sentences",
        [
          10,
          11,
          12,
          13,
          14,
          15,
          16,
          17
        ]
      ],
      "details": {
        "request_out_of_scope": true
      }
    },
    {
      "review_id": "B1eNnTeLqS",
      "rebuttal_id": "rke_9oPqir",
      "sentence_index": 17,
      "text": "Thanks also for catching several typographic errors. We have addressed them in the new draft.",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_done",
      "alignment": [
        "context_sentences",
        [
          19,
          20,
          21,
          22,
          23
        ]
      ],
      "details": {
        "request_out_of_scope": true
      }
    }
  ]
}