{
  "metadata": {
    "forum_id": "ryxLG2RcYX",
    "review_id": "SJl2_N-q2m",
    "rebuttal_id": "BklWPRFqpQ",
    "title": "Learning Abstract Models for Long-Horizon Exploration",
    "reviewer": "AnonReviewer3",
    "rating": 6,
    "conference": "ICLR2019",
    "permalink": "https://openreview.net/forum?id=ryxLG2RcYX&noteId=BklWPRFqpQ",
    "annotator": "anno10"
  },
  "review_sentences": [
    {
      "review_id": "SJl2_N-q2m",
      "sentence_index": 0,
      "text": "This paper considers reinforcement learning tasks that have high-dimensional space, long-horizon time, sparse-rewards.",
      "suffix": "",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_summary",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "SJl2_N-q2m",
      "sentence_index": 1,
      "text": "In this setting, current reinforcement learning algorithms struggle to train agents so that they can achieve high rewards.",
      "suffix": "",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_summary",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "SJl2_N-q2m",
      "sentence_index": 2,
      "text": "To address this problem, the authors propose an abstract MDP algorithm.",
      "suffix": "",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_summary",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "SJl2_N-q2m",
      "sentence_index": 3,
      "text": "The algorithm consists of three parts: manager, worker, and discoverer.",
      "suffix": "",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_summary",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "SJl2_N-q2m",
      "sentence_index": 4,
      "text": "The manager controls the exploration scheduling, the worker updates the policy, and the discoverer purely explores the abstract states.",
      "suffix": "",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_summary",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "SJl2_N-q2m",
      "sentence_index": 5,
      "text": "Since there are too many state, the abstract MDP utilize the RAM state as the corresponding abstract state for each situation.",
      "suffix": "\n\n",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_summary",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "SJl2_N-q2m",
      "sentence_index": 6,
      "text": "The main strong point of this paper is the experiment section.",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_substance",
      "polarity": "pol_positive"
    },
    {
      "review_id": "SJl2_N-q2m",
      "sentence_index": 7,
      "text": "The proposed algorithm outperforms all previous state of the art algorithms for Montezuma\u2019s revenge, Pitfall!, and Private eye over a factor of 2.",
      "suffix": "\n\n",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_substance",
      "polarity": "pol_positive"
    },
    {
      "review_id": "SJl2_N-q2m",
      "sentence_index": 8,
      "text": "It is a minor weak point that the algorithm can work only when the abstract state is obtained by the RAM state.",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_soundness-correctness",
      "polarity": "pol_negative"
    },
    {
      "review_id": "SJl2_N-q2m",
      "sentence_index": 9,
      "text": "In some RL tasks, it is not allowed to access the RAM state.",
      "suffix": "\n\n",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_soundness-correctness",
      "polarity": "pol_negative"
    },
    {
      "review_id": "SJl2_N-q2m",
      "sentence_index": 10,
      "text": "================================",
      "suffix": "\n",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_heading",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "SJl2_N-q2m",
      "sentence_index": 11,
      "text": "I've read all other reviewers' comments and the response from authors, and decreased the score.",
      "suffix": "",
      "review_action": "arg_other",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "SJl2_N-q2m",
      "sentence_index": 12,
      "text": "Although this paper contains interesting idea and results, as other reviewers pointed out, it is very hard to compare with other algorithm.",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_soundness-correctness",
      "polarity": "pol_negative"
    },
    {
      "review_id": "SJl2_N-q2m",
      "sentence_index": 13,
      "text": "I agree to other reviewers.",
      "suffix": "",
      "review_action": "arg_other",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "SJl2_N-q2m",
      "sentence_index": 14,
      "text": "The algorithm assumptions are strong.",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_soundness-correctness",
      "polarity": "pol_negative"
    }
  ],
  "rebuttal_sentences": [
    {
      "review_id": "SJl2_N-q2m",
      "rebuttal_id": "BklWPRFqpQ",
      "sentence_index": 0,
      "text": "We thank Reviewer 3 for their comments.",
      "suffix": "",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_social",
      "alignment": [
        "context_global",
        null
      ],
      "details": {}
    },
    {
      "review_id": "SJl2_N-q2m",
      "rebuttal_id": "BklWPRFqpQ",
      "sentence_index": 1,
      "text": "Reviewer 3 points out the strong state-of-the-art performance of our approach as a strength and mentions prior knowledge (our use of RAM state information) as a minor weakness.",
      "suffix": "",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_structuring",
      "alignment": [
        "context_sentences",
        [
          8,
          9
        ]
      ],
      "details": {}
    },
    {
      "review_id": "SJl2_N-q2m",
      "rebuttal_id": "BklWPRFqpQ",
      "sentence_index": 2,
      "text": "To clarify, in our experiments, we outperform previous non-demonstration state-of-the-art approaches that use a comparable amount of prior knowledge.",
      "suffix": "",
      "rebuttal_stance": "dispute",
      "rebuttal_action": "rebuttal_reject-criticism",
      "alignment": [
        "context_sentences",
        [
          8,
          9
        ]
      ],
      "details": {}
    },
    {
      "review_id": "SJl2_N-q2m",
      "rebuttal_id": "BklWPRFqpQ",
      "sentence_index": 3,
      "text": "We discuss our usage of prior knowledge in greater detail in the section titled \u201cPrior Knowledge\u201d in our response to Reviewer 2.",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_done",
      "alignment": [
        "context_sentences",
        [
          8,
          9
        ]
      ],
      "details": {
        "request_out_of_scope": true
      }
    }
  ]
}