{
  "metadata": {
    "forum_id": "HklAhi09Y7",
    "review_id": "HJlWUOjV37",
    "rebuttal_id": "H1x8HiTh1E",
    "title": "Question Generation using a Scratchpad Encoder",
    "reviewer": "AnonReviewer1",
    "rating": 4,
    "conference": "ICLR2019",
    "permalink": "https://openreview.net/forum?id=HklAhi09Y7&noteId=H1x8HiTh1E",
    "annotator": "anno3"
  },
  "review_sentences": [
    {
      "review_id": "HJlWUOjV37",
      "sentence_index": 0,
      "text": "The paper studies the problem of question generation from sparql queries.",
      "suffix": "",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_summary",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "HJlWUOjV37",
      "sentence_index": 1,
      "text": "The motivation is to generate more training data for knowledge base question answering systems to be trained on.",
      "suffix": "",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_summary",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "HJlWUOjV37",
      "sentence_index": 2,
      "text": "However, this task is an instance of natural language generation: given a meaning representation (quite often a database record), generate the natural language text correspoding to it. And previous work on this topic has proposed very similar ideas to the scratchpad proposed here in order to keep track of what the neural decoder has already generated, here are two of them:",
      "suffix": "\n",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_originality",
      "polarity": "pol_negative"
    },
    {
      "review_id": "HJlWUOjV37",
      "sentence_index": 3,
      "text": "- Semantically Conditioned LSTM-based Natural Language Generation for Spoken Dialogue Systems",
      "suffix": "\n",
      "review_action": "none",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "HJlWUOjV37",
      "sentence_index": 4,
      "text": "Tsung-Hsien Wen, Milica Gasic, Nikola Mrksic, Pei-Hao Su, David Vandyke, Steve Young, EMNLP 2015: https://arxiv.org/abs/1508.01745",
      "suffix": "\n",
      "review_action": "none",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "HJlWUOjV37",
      "sentence_index": 5,
      "text": "- Globally Coherent Text Generation with Neural Checklist Models",
      "suffix": "\n",
      "review_action": "none",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "HJlWUOjV37",
      "sentence_index": 6,
      "text": "Chloe Kiddon Luke Zettlemoyer Yejin Choi: https://aclweb.org/anthology/D16-1032",
      "suffix": "\n",
      "review_action": "none",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "HJlWUOjV37",
      "sentence_index": 7,
      "text": "Thus the main novelty claim of the paper needs to be hedged appropriately.",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_originality",
      "polarity": "pol_negative"
    },
    {
      "review_id": "HJlWUOjV37",
      "sentence_index": 8,
      "text": "Also, to demonstrate the superiority of the proposed method an appropriate comparison against previous work is needed.",
      "suffix": "\n\n",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_meaningful-comparison",
      "polarity": "pol_negative"
    },
    {
      "review_id": "HJlWUOjV37",
      "sentence_index": 9,
      "text": "Some other points:",
      "suffix": "\n",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_heading",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "HJlWUOjV37",
      "sentence_index": 10,
      "text": "- How is the linearization of the inout done? It  typically matters",
      "suffix": "\n",
      "review_action": "arg_request",
      "fine_review_action": "arg-request_explanation",
      "aspect": "asp_substance",
      "polarity": "none"
    },
    {
      "review_id": "HJlWUOjV37",
      "sentence_index": 11,
      "text": "- Given the small size of the dataset, I would propose experimenting with non-neural approaches as well, which are also quite common in NLG.",
      "suffix": "\n",
      "review_action": "arg_request",
      "fine_review_action": "arg-request_experiment",
      "aspect": "asp_replicability",
      "polarity": "pol_negative"
    },
    {
      "review_id": "HJlWUOjV37",
      "sentence_index": 12,
      "text": "- On the human evaluation: showing the gold standard reference to the judges introduces bias to the evaluation which is inappropriate as in language generation tasks there are multiple correct answers.",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_soundness-correctness",
      "polarity": "pol_negative"
    },
    {
      "review_id": "HJlWUOjV37",
      "sentence_index": 13,
      "text": "See this paper for discussion in the context of machine translation: http://www.aclweb.org/anthology/P16-2013",
      "suffix": "\n",
      "review_action": "none",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "HJlWUOjV37",
      "sentence_index": 14,
      "text": "- For the automatic evaluation measures there should be multiple references per SPARQL query since this is how BLEU et al are supposed to be used.",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_substance",
      "polarity": "pol_negative"
    },
    {
      "review_id": "HJlWUOjV37",
      "sentence_index": 15,
      "text": "Also, this would allow to compare the references against each other (filling in the missing number in Table 4) and this would allow an evaluation of the evaluation itself: while perfect scores are unlikely, the human references should be much better than the systems.",
      "suffix": "\n",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_meaningful-comparison",
      "polarity": "pol_negative"
    },
    {
      "review_id": "HJlWUOjV37",
      "sentence_index": 16,
      "text": "- In the outputs shown in Table 3, the questions generated by the scratchpad encoder often seem to be too general compared to the gold standard, or incorrect.",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_soundness-correctness",
      "polarity": "pol_negative"
    },
    {
      "review_id": "HJlWUOjV37",
      "sentence_index": 17,
      "text": "E.g. \"what job did jefferson have\" is semntically related to his role in the declaration of independence but rather different.",
      "suffix": "",
      "review_action": "none",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "HJlWUOjV37",
      "sentence_index": 18,
      "text": "SImilarly, being married to someone is not the same as having a baby with someone.",
      "suffix": "",
      "review_action": "none",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "HJlWUOjV37",
      "sentence_index": 19,
      "text": "While I could imagine human judges preferring them as they are fluent, I think they are wrong as they express a different meaning than the SPARQL query they are supposed to express.",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_clarity",
      "polarity": "pol_negative"
    },
    {
      "review_id": "HJlWUOjV37",
      "sentence_index": 20,
      "text": "What were the guidelines used?",
      "suffix": "",
      "review_action": "arg_request",
      "fine_review_action": "arg-request_explanation",
      "aspect": "asp_clarity",
      "polarity": "none"
    }
  ],
  "rebuttal_sentences": [
    {
      "review_id": "HJlWUOjV37",
      "rebuttal_id": "H1x8HiTh1E",
      "sentence_index": 0,
      "text": "We are aware of the related work you mention.",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_concede-criticism",
      "alignment": [
        "context_sentences",
        [
          2,
          3,
          4,
          5,
          6,
          7
        ]
      ],
      "details": {}
    },
    {
      "review_id": "HJlWUOjV37",
      "rebuttal_id": "H1x8HiTh1E",
      "sentence_index": 1,
      "text": "Please note that unfortunately the \u201cSemantically Conditioned LSTM\u2026\u201d is not directly comparable because, as they state in their paper, \u201cthe generator is further conditioned on a control vector d, a 1-hot representation of the dialogue act (DA) type and its slot-value pairs\u201d.",
      "suffix": "",
      "rebuttal_stance": "dispute",
      "rebuttal_action": "rebuttal_reject-criticism",
      "alignment": [
        "context_sentences",
        [
          3
        ]
      ],
      "details": {}
    },
    {
      "review_id": "HJlWUOjV37",
      "rebuttal_id": "H1x8HiTh1E",
      "sentence_index": 2,
      "text": "Our goal is to work with arbitrarily complex questions that map to correspondingly arbitrarily complex logical forms and not a very restricted set of logical forms that could be represented in a one-hot fashion.",
      "suffix": "\n",
      "rebuttal_stance": "dispute",
      "rebuttal_action": "rebuttal_reject-criticism",
      "alignment": [
        "context_sentences",
        [
          3
        ]
      ],
      "details": {}
    },
    {
      "review_id": "HJlWUOjV37",
      "rebuttal_id": "H1x8HiTh1E",
      "sentence_index": 3,
      "text": "Please do note that we ran 2 sets of human evaluations (Adequacy and Fluency), as is standard in Machine translation in order to deal with the evaluation bias problem you describe - we took this into account when conducting experiments and will make it more clear in a revised version.",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_by-cr",
      "alignment": [
        "context_sentences",
        [
          14,
          15
        ]
      ],
      "details": {
        "manuscript_change": true
      }
    },
    {
      "review_id": "HJlWUOjV37",
      "rebuttal_id": "H1x8HiTh1E",
      "sentence_index": 4,
      "text": "We also observe significant improvements in both human evaluations, suggesting that the improvement comes from our method and not from evaluation bias.",
      "suffix": "\n",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          14,
          15
        ]
      ],
      "details": {}
    },
    {
      "review_id": "HJlWUOjV37",
      "rebuttal_id": "H1x8HiTh1E",
      "sentence_index": 5,
      "text": "Our dataset only contains a single logical form for each question and vice-versa, making it impossible to evaluate quantitative metrics (bleu, rouge, meteor) in the multi-reference setting you describe.",
      "suffix": "",
      "rebuttal_stance": "dispute",
      "rebuttal_action": "rebuttal_reject-criticism",
      "alignment": [
        "context_sentences",
        [
          14,
          15
        ]
      ],
      "details": {}
    },
    {
      "review_id": "HJlWUOjV37",
      "rebuttal_id": "H1x8HiTh1E",
      "sentence_index": 6,
      "text": "Please also note that metrics like bleu and rouge have been commonly used in a non multi-reference setting by significant work in the natural language processing community.",
      "suffix": "\n",
      "rebuttal_stance": "dispute",
      "rebuttal_action": "rebuttal_reject-criticism",
      "alignment": [
        "context_sentences",
        [
          14,
          15
        ]
      ],
      "details": {}
    },
    {
      "review_id": "HJlWUOjV37",
      "rebuttal_id": "H1x8HiTh1E",
      "sentence_index": 7,
      "text": "We thank the reviewer for their comments and will take them into account in a revised version.",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_by-cr",
      "alignment": [
        "context_global",
        null
      ],
      "details": {
        "manuscript_change": true
      }
    }
  ]
}