{
  "metadata": {
    "forum_id": "Byl8hhNYPS",
    "review_id": "B1gYdNvCFS",
    "rebuttal_id": "r1eQFhX4ir",
    "title": "Neural Machine Translation with Universal Visual Representation",
    "reviewer": "AnonReviewer2",
    "rating": 8,
    "conference": "ICLR2020",
    "permalink": "https://openreview.net/forum?id=Byl8hhNYPS&noteId=r1eQFhX4ir",
    "annotator": "anno13"
  },
  "review_sentences": [
    {
      "review_id": "B1gYdNvCFS",
      "sentence_index": 0,
      "text": "This paper provides an approach to use visual information to improve text only neural machine translation systems.",
      "suffix": "",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_summary",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "B1gYdNvCFS",
      "sentence_index": 1,
      "text": "The approach creates a \"topic word to images\" map using an existing image-aligned translation corpus.",
      "suffix": "",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_summary",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "B1gYdNvCFS",
      "sentence_index": 2,
      "text": "Given a source sentence, the model retrieves relevant images, extracts their ResNet features, and fuses them with the features generated from the word sequence.",
      "suffix": "",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_summary",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "B1gYdNvCFS",
      "sentence_index": 3,
      "text": "The decoder uses these fused representations to generate the target sentence.",
      "suffix": "",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_summary",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "B1gYdNvCFS",
      "sentence_index": 4,
      "text": "Overall, I like the approach; it seems like it can be easily added to existing NMT systems.",
      "suffix": "\n\n",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_originality",
      "polarity": "pol_positive"
    },
    {
      "review_id": "B1gYdNvCFS",
      "sentence_index": 5,
      "text": "One of the claims of the paper was to be able to use monolingual image aligned data.",
      "suffix": "",
      "review_action": "arg_fact",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "B1gYdNvCFS",
      "sentence_index": 6,
      "text": "However, image captioning datasets are not mentioned.",
      "suffix": "",
      "review_action": "arg_request",
      "fine_review_action": "arg-request_edit",
      "aspect": "asp_meaningful-comparison",
      "polarity": "pol_negative"
    },
    {
      "review_id": "B1gYdNvCFS",
      "sentence_index": 7,
      "text": "It would make sense to use image captioning data to create the image lookup.",
      "suffix": "",
      "review_action": "arg_request",
      "fine_review_action": "arg-request_edit",
      "aspect": "asp_meaningful-comparison",
      "polarity": "none"
    },
    {
      "review_id": "B1gYdNvCFS",
      "sentence_index": 8,
      "text": "Also, what would be the performance of a standard image captioning system on the task?",
      "suffix": "",
      "review_action": "arg_request",
      "fine_review_action": "arg-request_result",
      "aspect": "asp_substance",
      "polarity": "none"
    },
    {
      "review_id": "B1gYdNvCFS",
      "sentence_index": 9,
      "text": "I believe it will not be great, but I think for completeness, you should add such a baseline.",
      "suffix": "\n\n",
      "review_action": "arg_request",
      "fine_review_action": "arg-request_result",
      "aspect": "asp_substance",
      "polarity": "none"
    },
    {
      "review_id": "B1gYdNvCFS",
      "sentence_index": 10,
      "text": "Minor comments:",
      "suffix": "\n",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_heading",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "B1gYdNvCFS",
      "sentence_index": 11,
      "text": "1. What is M in Algorithm 1?",
      "suffix": "\n",
      "review_action": "arg_request",
      "fine_review_action": "arg-request_clarification",
      "aspect": "asp_clarity",
      "polarity": "none"
    },
    {
      "review_id": "B1gYdNvCFS",
      "sentence_index": 12,
      "text": "2. The first paragraph in related work is unrelated to the current subject; please remove it.",
      "suffix": "",
      "review_action": "arg_request",
      "fine_review_action": "arg-request_edit",
      "aspect": "asp_clarity",
      "polarity": "pol_negative"
    }
  ],
  "rebuttal_sentences": [
    {
      "review_id": "B1gYdNvCFS",
      "rebuttal_id": "r1eQFhX4ir",
      "sentence_index": 0,
      "text": "Thanks for your constructive feedback! Please see our response below.",
      "suffix": "\n\n",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_social",
      "alignment": [
        "context_global",
        null
      ],
      "details": {}
    },
    {
      "review_id": "B1gYdNvCFS",
      "rebuttal_id": "r1eQFhX4ir",
      "sentence_index": 1,
      "text": "1. About image captioning.",
      "suffix": "\n\n",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_structuring",
      "alignment": [
        "context_sentences",
        [
          5,
          6,
          7
        ]
      ],
      "details": {}
    },
    {
      "review_id": "B1gYdNvCFS",
      "rebuttal_id": "r1eQFhX4ir",
      "sentence_index": 2,
      "text": "Yes. An image captioning dataset can certainly be used to create the lookup table.",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          5,
          6,
          7
        ]
      ],
      "details": {}
    },
    {
      "review_id": "B1gYdNvCFS",
      "rebuttal_id": "r1eQFhX4ir",
      "sentence_index": 3,
      "text": "As you suggested, we used the MS COCO image captioning dataset to learn a lookup table and applied it to the EN-RO translation task for a quick evaluation.",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          5,
          6,
          7
        ]
      ],
      "details": {}
    },
    {
      "review_id": "B1gYdNvCFS",
      "rebuttal_id": "r1eQFhX4ir",
      "sentence_index": 4,
      "text": "The resulting BLEU score is 33.55, which is comparable to that of the current lookup table based on Multi30K (33.78) and outperforms Trans. (base) (32.66).",
      "suffix": "\n\n",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          5,
          6,
          7
        ]
      ],
      "details": {}
    },
    {
      "review_id": "B1gYdNvCFS",
      "rebuttal_id": "r1eQFhX4ir",
      "sentence_index": 5,
      "text": "Regarding the performance of the standard image captioning system, we trained a captioning model (Show, Attend, and Tell (Xu et al., 2015b)) with a fine-tuned encoder (ResNet101) on the COCO dataset to encode the images.",
      "suffix": "",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_structuring",
      "alignment": [
        "context_sentences",
        [
          8,
          9
        ]
      ],
      "details": {}
    },
    {
      "review_id": "B1gYdNvCFS",
      "rebuttal_id": "r1eQFhX4ir",
      "sentence_index": 6,
      "text": "The result on EN-RO is 33.58.",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          8,
          9
        ]
      ],
      "details": {}
    },
    {
      "review_id": "B1gYdNvCFS",
      "rebuttal_id": "r1eQFhX4ir",
      "sentence_index": 7,
      "text": "We are not certain that we have understood this request correctly, because our task is text-to-text translation while image captioning is image-to-text.",
      "suffix": "",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_followup",
      "alignment": [
        "context_sentences",
        [
          8,
          9
        ]
      ],
      "details": {}
    },
    {
      "review_id": "B1gYdNvCFS",
      "rebuttal_id": "r1eQFhX4ir",
      "sentence_index": 8,
      "text": "If not, we are glad to address it further.",
      "suffix": "\n\n",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_followup",
      "alignment": [
        "context_sentences",
        [
          8,
          9
        ]
      ],
      "details": {}
    },
    {
      "review_id": "B1gYdNvCFS",
      "rebuttal_id": "r1eQFhX4ir",
      "sentence_index": 9,
      "text": "2. About the minor comments.",
      "suffix": "\n\n",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_structuring",
      "alignment": [
        "context_sentences",
        [
          10
        ]
      ],
      "details": {}
    },
    {
      "review_id": "B1gYdNvCFS",
      "rebuttal_id": "r1eQFhX4ir",
      "sentence_index": 10,
      "text": "(1)\tThis is a typo. It should be Q.",
      "suffix": "\n\n",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_concede-criticism",
      "alignment": [
        "context_sentences",
        [
          11
        ]
      ],
      "details": {}
    },
    {
      "review_id": "B1gYdNvCFS",
      "rebuttal_id": "r1eQFhX4ir",
      "sentence_index": 11,
      "text": "(2)\tYes. We will remove it following your suggestion.",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_done",
      "alignment": [
        "context_sentences",
        [
          12
        ]
      ],
      "details": {
        "request_out_of_scope": true
      }
    }
  ]
}