{
  "metadata": {
    "forum_id": "BkeDEoCctQ",
    "review_id": "BklZ3SQ5nX",
    "rebuttal_id": "ByxoQiKZyN",
    "title": "Deep Curiosity Search: Intra-Life Exploration Can Improve Performance on Challenging Deep Reinforcement Learning Problems",
    "reviewer": "AnonReviewer1",
    "rating": 5,
    "conference": "ICLR2019",
    "permalink": "https://openreview.net/forum?id=BkeDEoCctQ&noteId=ByxoQiKZyN",
    "annotator": "anno10"
  },
  "review_sentences": [
    {
      "review_id": "BklZ3SQ5nX",
      "sentence_index": 0,
      "text": "Summary:",
      "suffix": "\n",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_heading",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "BklZ3SQ5nX",
      "sentence_index": 1,
      "text": "The authors look at the problem of exploration in deep RL.",
      "suffix": "",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_summary",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "BklZ3SQ5nX",
      "sentence_index": 2,
      "text": "They propose a \u201ccuriosity grid\u201d which is a virtual grid laid out on top of the current level/area that an Atari agent is in.",
      "suffix": "",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_summary",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "BklZ3SQ5nX",
      "sentence_index": 3,
      "text": "Once an agent enters a new cell of the grid, it obtains a small reward, encouraging the agent to explore all parts of the game.",
      "suffix": "",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_summary",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "BklZ3SQ5nX",
      "sentence_index": 4,
      "text": "The grid is reset (meaning new rewards can be obtained) after every roll out (meaning the Atari agent has used up all its lives and the game restarts).",
      "suffix": "\n\n",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_summary",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "BklZ3SQ5nX",
      "sentence_index": 5,
      "text": "The authors argue that this method enables better exploration and they obtain an impressive score on Montezuma\u2019s Revenge (MR).",
      "suffix": "\n\n",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_summary",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "BklZ3SQ5nX",
      "sentence_index": 6,
      "text": "Review:",
      "suffix": "\n",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_heading",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "BklZ3SQ5nX",
      "sentence_index": 7,
      "text": "The paper contains an extensive introduction with many references to prior work, and a sensible lead up to the introduced algorithm.",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_motivation-impact",
      "polarity": "pol_positive"
    },
    {
      "review_id": "BklZ3SQ5nX",
      "sentence_index": 8,
      "text": "The algorithm itself seems to work well and some of the results are convincing.",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_soundness-correctness",
      "polarity": "pol_positive"
    },
    {
      "review_id": "BklZ3SQ5nX",
      "sentence_index": 9,
      "text": "I am a bit worried about the fact that the agents have access to their history of locations (\u201cthe grid\u201d).",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_soundness-correctness",
      "polarity": "pol_negative"
    },
    {
      "review_id": "BklZ3SQ5nX",
      "sentence_index": 10,
      "text": "The authors mention that none of the methods they compare against has this advantage and it seems that in a game that rewards exploration directly (MR) this is a large advantage.",
      "suffix": "\n\n",
      "review_action": "arg_fact",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "BklZ3SQ5nX",
      "sentence_index": 11,
      "text": "The authors comment on this advantage in section 3 and found that removing intrinsic rewards hurt performance significantly.",
      "suffix": "",
      "review_action": "arg_fact",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "BklZ3SQ5nX",
      "sentence_index": 12,
      "text": "Only removing the grid access made results on MR very unstable.",
      "suffix": "",
      "review_action": "arg_fact",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "BklZ3SQ5nX",
      "sentence_index": 13,
      "text": "However in order to compute the intrinsic rewards, it still seems necessary to access the location of the agent, meaning that implicitly the advantage of the method is still there.",
      "suffix": "\n\n",
      "review_action": "arg_fact",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "BklZ3SQ5nX",
      "sentence_index": 14,
      "text": "I was wondering if the authors find that the agents are forcibly exploring the entire environment during each rollout? Even if the agent knows what/where the actual goal is. There is a hint to this behaviour in section 4, on exploration in sparse domains.",
      "suffix": "\n\n",
      "review_action": "arg_request",
      "fine_review_action": "arg-request_explanation",
      "aspect": "asp_soundness-correctness",
      "polarity": "none"
    },
    {
      "review_id": "BklZ3SQ5nX",
      "sentence_index": 15,
      "text": "The future work section mentions some interesting improvements, where the agent position is learned from data. That seems like a promising direction that would generalise beyond Atari games and avoids the advantage.",
      "suffix": "\n\n",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_substance",
      "polarity": "pol_positive"
    },
    {
      "review_id": "BklZ3SQ5nX",
      "sentence_index": 16,
      "text": "Nits/writing feedback:",
      "suffix": "\n",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_heading",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "BklZ3SQ5nX",
      "sentence_index": 17,
      "text": "- There is no need for such repetitive citing (esp paragraph 2 on page 2).",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_clarity",
      "polarity": "pol_negative"
    },
    {
      "review_id": "BklZ3SQ5nX",
      "sentence_index": 18,
      "text": "Sometimes the same paper is cited 4 times within a few lines.",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_clarity",
      "polarity": "pol_negative"
    },
    {
      "review_id": "BklZ3SQ5nX",
      "sentence_index": 19,
      "text": "While it\u2019s great that so much prior work was acknowledged, mentioning a paper once per paragraph is (usually) sufficient and increases readability.",
      "suffix": "\n",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_clarity",
      "polarity": "pol_negative"
    },
    {
      "review_id": "BklZ3SQ5nX",
      "sentence_index": 20,
      "text": "- I think the comparison between prior lifetimes and humans mastering a language doesn\u2019t hold up and is distracting",
      "suffix": "\n\n",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_soundness-correctness",
      "polarity": "pol_negative"
    },
    {
      "review_id": "BklZ3SQ5nX",
      "sentence_index": 21,
      "text": "##",
      "suffix": "",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_heading",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "BklZ3SQ5nX",
      "sentence_index": 22,
      "text": "##",
      "suffix": "\n",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_heading",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "BklZ3SQ5nX",
      "sentence_index": 23,
      "text": "Revision:",
      "suffix": "\n\n",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_heading",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "BklZ3SQ5nX",
      "sentence_index": 24,
      "text": "The rebuttal does little to clarify open questions:",
      "suffix": "\n",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "arg_other",
      "polarity": "pol_negative"
    },
    {
      "review_id": "BklZ3SQ5nX",
      "sentence_index": 25,
      "text": "1. Both reviewer 2 and I commented on the ablation study regarding the grid but received no reply.",
      "suffix": "\n",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_substance",
      "polarity": "pol_negative"
    },
    {
      "review_id": "BklZ3SQ5nX",
      "sentence_index": 26,
      "text": "2. I am not convinced this method is sufficiently new, given that there are other methods that try to directly reward visiting new states.",
      "suffix": "\n",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_originality",
      "polarity": "pol_negative"
    },
    {
      "review_id": "BklZ3SQ5nX",
      "sentence_index": 27,
      "text": "3. The authors argue in their rebuttal that \"the grid\" is a novel idea that warrants investigation, but remark in figure 5 that likely it isn't the key aspect of their algorithm.",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_originality",
      "polarity": "pol_negative"
    },
    {
      "review_id": "BklZ3SQ5nX",
      "sentence_index": 28,
      "text": "This seems contradictory.",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_originality",
      "polarity": "pol_negative"
    }
  ],
  "rebuttal_sentences": [
    {
      "review_id": "BklZ3SQ5nX",
      "rebuttal_id": "ByxoQiKZyN",
      "sentence_index": 0,
      "text": "Due to the overlap between reviewer comments, we decided to address all concerns in a single response (please see above).",
      "suffix": "",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_other",
      "alignment": [
        "context_none",
        null
      ],
      "details": {}
    }
  ]
}