{
  "metadata": {
    "forum_id": "BkeDEoCctQ",
    "review_id": "BJlxMu4a37",
    "rebuttal_id": "SyeVGit-kE",
    "title": "Deep Curiosity Search: Intra-Life Exploration Can Improve Performance on Challenging Deep Reinforcement Learning Problems",
    "reviewer": "AnonReviewer2",
    "rating": 5,
    "conference": "ICLR2019",
    "permalink": "https://openreview.net/forum?id=BkeDEoCctQ&noteId=SyeVGit-kE",
    "annotator": "anno16"
  },
  "review_sentences": [
    {
      "review_id": "BJlxMu4a37",
      "sentence_index": 0,
      "text": "This paper proposes use of intra-life coverage (an agent must visit all locations within each episode) for effective exploration in Atari games.",
      "suffix": "",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_summary",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "BJlxMu4a37",
      "sentence_index": 1,
      "text": "This is in contrast of approaches that use inter-life coverage or curiosity metrics to incentivize exploration.",
      "suffix": "",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_summary",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "BJlxMu4a37",
      "sentence_index": 2,
      "text": "The paper shows detailed results and analysis on 2 Atari games: Montezuma\u2019s Revenge and Seaquest, and reports results on other games as well.",
      "suffix": "\n\n",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_summary",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "BJlxMu4a37",
      "sentence_index": 3,
      "text": "Strengths",
      "suffix": "\n",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_heading",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "BJlxMu4a37",
      "sentence_index": 4,
      "text": "1. Intuitively, the idea of intra-life curiosity is reasonable.",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_originality",
      "polarity": "pol_positive"
    },
    {
      "review_id": "BJlxMu4a37",
      "sentence_index": 5,
      "text": "The paper pursues this idea and provides experimental evidence towards it on 2 Atari games.",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_originality",
      "polarity": "pol_positive"
    },
    {
      "review_id": "BJlxMu4a37",
      "sentence_index": 6,
      "text": "It is able to show compelling improvements on the challenging Montezuma\u2019s Revenge game.",
      "suffix": "\n\n",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_originality",
      "polarity": "pol_positive"
    },
    {
      "review_id": "BJlxMu4a37",
      "sentence_index": 7,
      "text": "Weaknesses",
      "suffix": "\n",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_heading",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "BJlxMu4a37",
      "sentence_index": 8,
      "text": "1. The two primary comparison points are missing:",
      "suffix": "\n",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_meaningful-comparison",
      "polarity": "pol_negative"
    },
    {
      "review_id": "BJlxMu4a37",
      "sentence_index": 9,
      "text": "1a. Comparison to other exploration methods.",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_meaningful-comparison",
      "polarity": "pol_negative"
    },
    {
      "review_id": "BJlxMu4a37",
      "sentence_index": 10,
      "text": "A number of methods that use state visitation counts (also referred to as diversity, eg. [A,B]), or prediction error (also referred to as curiosity, eg [C]) have been proposed in recent years.",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_meaningful-comparison",
      "polarity": "pol_negative"
    },
    {
      "review_id": "BJlxMu4a37",
      "sentence_index": 11,
      "text": "It is important to place the contributions in this paper in context of these other works.",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_meaningful-comparison",
      "polarity": "pol_negative"
    },
    {
      "review_id": "BJlxMu4a37",
      "sentence_index": 12,
      "text": "A number of these references are missing and no experimental comparison to these methods has been made.",
      "suffix": "\n\n",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_meaningful-comparison",
      "polarity": "pol_negative"
    },
    {
      "review_id": "BJlxMu4a37",
      "sentence_index": 13,
      "text": "1b. Comparison between inter and intra life curiosity.",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_meaningful-comparison",
      "polarity": "pol_negative"
    },
    {
      "review_id": "BJlxMu4a37",
      "sentence_index": 14,
      "text": "One of the central motivation is the utility of intra-life curiosity vs inter-life curiosity, yet no comparisons to this effect have been provided.",
      "suffix": "\n\n",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_meaningful-comparison",
      "polarity": "pol_negative"
    },
    {
      "review_id": "BJlxMu4a37",
      "sentence_index": 15,
      "text": "2.",
      "suffix": "",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_heading",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "BJlxMu4a37",
      "sentence_index": 16,
      "text": "Additionally, the paper employs a custom way of computing coverage (or diversity).",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_substance",
      "polarity": "none"
    },
    {
      "review_id": "BJlxMu4a37",
      "sentence_index": 17,
      "text": "It is in terms of location of agent on the screen, as opposed to featurization of the full game screen as used in prior works.",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_substance",
      "polarity": "none"
    },
    {
      "review_id": "BJlxMu4a37",
      "sentence_index": 18,
      "text": "It is possible that a large part of the gain comes from the clever design of the space for computing intrinsic exploration reward.",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_substance",
      "polarity": "none"
    },
    {
      "review_id": "BJlxMu4a37",
      "sentence_index": 19,
      "text": "The paper tries to control for it, however that description is rather short and vague (not clear how the proposed reward is computed without there being a grid, or how is the grid useful without the intrinsic reward).",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_clarity",
      "polarity": "pol_negative"
    },
    {
      "review_id": "BJlxMu4a37",
      "sentence_index": 20,
      "text": "More details should be provided, and when comparisons to past works or inter-life curiosity are made, this should be controlled for.",
      "suffix": "",
      "review_action": "arg_request",
      "fine_review_action": "arg-request_experiment",
      "aspect": "asp_meaningful-comparison",
      "polarity": "pol_negative"
    },
    {
      "review_id": "BJlxMu4a37",
      "sentence_index": 21,
      "text": "The two ideas (use of grids, and intra-life curiosity vs inter-life curiosity) should be independently investigated and put in context of past work.",
      "suffix": "\n\n",
      "review_action": "arg_request",
      "fine_review_action": "arg-request_experiment",
      "aspect": "asp_meaningful-comparison",
      "polarity": "pol_negative"
    },
    {
      "review_id": "BJlxMu4a37",
      "sentence_index": 22,
      "text": "3. I will encourage investigation on a more varied set of tasks.",
      "suffix": "",
      "review_action": "arg_request",
      "fine_review_action": "arg-request_experiment",
      "aspect": "asp_meaningful-comparison",
      "polarity": "pol_negative"
    },
    {
      "review_id": "BJlxMu4a37",
      "sentence_index": 23,
      "text": "Perhaps, also using some MuJoCo environments, or 3D navigation environments.",
      "suffix": "",
      "review_action": "arg_request",
      "fine_review_action": "arg-request_experiment",
      "aspect": "asp_meaningful-comparison",
      "polarity": "pol_negative"
    },
    {
      "review_id": "BJlxMu4a37",
      "sentence_index": 24,
      "text": "Table 1 tries to provide some comparisons on Atari, however number of samples is different for different methods making the comparisons invalid.",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_meaningful-comparison",
      "polarity": "pol_negative"
    },
    {
      "review_id": "BJlxMu4a37",
      "sentence_index": 25,
      "text": "Additionally, all of these are still on Atari.",
      "suffix": "\n\n",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_substance",
      "polarity": "pol_negative"
    },
    {
      "review_id": "BJlxMu4a37",
      "sentence_index": 26,
      "text": "[A] Diversity is All You Need: Learning Skills without a Reward Function Benjamin Eysenbach, Abhishek Gupta, Julian Ibarz, Sergey Levine",
      "suffix": "\n\n",
      "review_action": "arg_other",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "BJlxMu4a37",
      "sentence_index": 27,
      "text": "[B] EX2: Exploration with Exemplar Models for Deep Reinforcement Learning Justin Fu, John D. Co-Reyes, Sergey Levine",
      "suffix": "\n\n",
      "review_action": "arg_other",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "BJlxMu4a37",
      "sentence_index": 28,
      "text": "[C] Curiosity-driven Exploration by Self-supervised Prediction Deepak Pathak, Pulkit Agrawal, Alexei A. Efros and Trevor Darrell International Conference on Machine Learning (ICML), 2017",
      "suffix": "",
      "review_action": "arg_other",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    }
  ],
  "rebuttal_sentences": [
    {
      "review_id": "BJlxMu4a37",
      "rebuttal_id": "SyeVGit-kE",
      "sentence_index": 0,
      "text": "Due to the overlap between reviewer comments, we decided to address all concerns in a single response (please see above).",
      "suffix": "",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_summary",
      "alignment": [
        "context_global",
        null
      ],
      "details": {}
    }
  ]
}