{
  "metadata": {
    "forum_id": "HkgqFiAcFm",
    "review_id": "H1gSPxyC3Q",
    "rebuttal_id": "SJlQqs8spQ",
    "title": "Marginal Policy Gradients: A Unified Family of Estimators for Bounded Action Spaces with Applications",
    "reviewer": "AnonReviewer2",
    "rating": 6,
    "conference": "ICLR2019",
    "permalink": "https://openreview.net/forum?id=HkgqFiAcFm&noteId=SJlQqs8spQ",
    "annotator": "anno3"
  },
  "review_sentences": [
    {
      "review_id": "H1gSPxyC3Q",
      "sentence_index": 0,
      "text": "This paper introduces policy gradient methods for RL where the policy must choose a direction (a.k.a., the navigation problem).",
      "suffix": "\n\n",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_summary",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "H1gSPxyC3Q",
      "sentence_index": 1,
      "text": "Mapping techniques from \"non-directional\" problems (where the action space is not a direction) and then projeting on the sphere is sub-optimal (the variance is too big).",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_soundness-correctness",
      "polarity": "pol_negative"
    },
    {
      "review_id": "H1gSPxyC3Q",
      "sentence_index": 2,
      "text": "The authors propose to sample directly on the sphere, using the fact that the likelyhood of an angular Gaussian r.v. has *almost* a closed form and its gradient can almost be computed, up to some normalization term (the integral which is constant in the standard Gaussian case).",
      "suffix": "\n\n\n",
      "review_action": "arg_fact",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "H1gSPxyC3Q",
      "sentence_index": 3,
      "text": "This can be seen as a variance reduction techniques.",
      "suffix": "\n\n",
      "review_action": "arg_fact",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "H1gSPxyC3Q",
      "sentence_index": 4,
      "text": "The proofs are not too intricate, for someone used to variance reduction (yet computations must be made quite carefully).",
      "suffix": "\n\n\n",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_clarity",
      "polarity": "pol_positive"
    },
    {
      "review_id": "H1gSPxyC3Q",
      "sentence_index": 5,
      "text": "The result is coherent, interesting from a theoretical point of view and the experiment are somehow convincing.",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_substance",
      "polarity": "pol_positive"
    },
    {
      "review_id": "H1gSPxyC3Q",
      "sentence_index": 6,
      "text": "The main drawback would be the rather incrementality of that paper (basically sample before projecting is a bit better than projecting after sampling) and that this directional setting is quite limited...",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_replicability",
      "polarity": "pol_negative"
    }
  ],
  "rebuttal_sentences": [
    {
      "review_id": "H1gSPxyC3Q",
      "rebuttal_id": "SJlQqs8spQ",
      "sentence_index": 0,
      "text": "Thank you for the time and effort spent reviewing our paper.",
      "suffix": "",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_social",
      "alignment": [
        "context_global",
        null
      ],
      "details": {}
    },
    {
      "review_id": "H1gSPxyC3Q",
      "rebuttal_id": "SJlQqs8spQ",
      "sentence_index": 1,
      "text": "We mostly agree with your characterization of our work, but we think there are two important points we perhaps did not sufficiently emphasize in our paper and that we would like to mention:",
      "suffix": "\n\n",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_structuring",
      "alignment": [
        "context_in-rebuttal",
        null
      ],
      "details": {}
    },
    {
      "review_id": "H1gSPxyC3Q",
      "rebuttal_id": "SJlQqs8spQ",
      "sentence_index": 2,
      "text": "(1) There are other existing tasks and algorithms that fall into the marginal policy gradients framework.",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_global",
        null
      ],
      "details": {}
    },
    {
      "review_id": "H1gSPxyC3Q",
      "rebuttal_id": "SJlQqs8spQ",
      "sentence_index": 3,
      "text": "For example, researchers and practitioners both almost always clip actions when using policy gradient algorithms for robotics control environments (read: MuJoCo tasks).",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_global",
        null
      ],
      "details": {}
    },
    {
      "review_id": "H1gSPxyC3Q",
      "rebuttal_id": "SJlQqs8spQ",
      "sentence_index": 4,
      "text": "Recently, a reduced variance method was introduced by Fujita and Maeda (2018) for clipped action spaces.",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_global",
        null
      ],
      "details": {}
    },
    {
      "review_id": "H1gSPxyC3Q",
      "rebuttal_id": "SJlQqs8spQ",
      "sentence_index": 5,
      "text": "Their algorithm is also a member of the marginal policy gradients family and our theoretical results for MPG significantly tighten the existing analysis of their algorithm.",
      "suffix": "\n\n",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_global",
        null
      ],
      "details": {}
    },
    {
      "review_id": "H1gSPxyC3Q",
      "rebuttal_id": "SJlQqs8spQ",
      "sentence_index": 6,
      "text": "(2) To the best of our knowledge, our work is the first to apply such variance reduction techniques to RL.",
      "suffix": "\n\n",
      "rebuttal_stance": "dispute",
      "rebuttal_action": "rebuttal_reject-criticism",
      "alignment": [
        "context_sentences",
        [
          6
        ]
      ],
      "details": {}
    },
    {
      "review_id": "H1gSPxyC3Q",
      "rebuttal_id": "SJlQqs8spQ",
      "sentence_index": 7,
      "text": "To summarize, our work consists of two components: (a) a new algorithm for directional control and (b) a variance reduction framework that can be applied to directional action space and clipped action spaces.",
      "suffix": "",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_summary",
      "alignment": [
        "context_global",
        null
      ],
      "details": {}
    },
    {
      "review_id": "H1gSPxyC3Q",
      "rebuttal_id": "SJlQqs8spQ",
      "sentence_index": 8,
      "text": "While directional action spaces are not very common at this time, clipped action spaces are extremely common.",
      "suffix": "",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_summary",
      "alignment": [
        "context_global",
        null
      ],
      "details": {}
    },
    {
      "review_id": "H1gSPxyC3Q",
      "rebuttal_id": "SJlQqs8spQ",
      "sentence_index": 9,
      "text": "We also anticipate that in the future, many additional environments will be available that feature directional actions (many console or PC games, for example).",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_future",
      "alignment": [
        "context_in-rebuttal",
        null
      ],
      "details": {}
    },
    {
      "review_id": "H1gSPxyC3Q",
      "rebuttal_id": "SJlQqs8spQ",
      "sentence_index": 10,
      "text": "For these reasons, we feel that our work is not incremental at all, and is actually quite novel.",
      "suffix": "",
      "rebuttal_stance": "dispute",
      "rebuttal_action": "rebuttal_reject-criticism",
      "alignment": [
        "context_sentences",
        [
          6
        ]
      ],
      "details": {}
    }
  ]
}