{
  "metadata": {
    "forum_id": "HkgqFiAcFm",
    "review_id": "SJgG0gkChX",
    "rebuttal_id": "S1eUaqUjTQ",
    "title": "Marginal Policy Gradients: A Unified Family of Estimators for Bounded Action Spaces with Applications",
    "reviewer": "AnonReviewer3",
    "rating": 7,
    "conference": "ICLR2019",
    "permalink": "https://openreview.net/forum?id=HkgqFiAcFm&noteId=S1eUaqUjTQ",
    "annotator": "anno10"
  },
  "review_sentences": [
    {
      "review_id": "SJgG0gkChX",
      "sentence_index": 0,
      "text": "In this paper the authors proposed a new policy gradient method, which is known as the angular policy gradient (APG), that aims to provide provably lower variance in the gradient estimate.",
      "suffix": "",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_summary",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "SJgG0gkChX",
      "sentence_index": 1,
      "text": "Here they presented a stochastic policy gradient method for directional control.",
      "suffix": "",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_summary",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "SJgG0gkChX",
      "sentence_index": 2,
      "text": "Under the set of parameterized Gaussian policies, they presented a unified analysis of the variance of APG and showed how it theoretically outperforms (in terms of having lower variance) other state-of-the-art methods.",
      "suffix": "",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_summary",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "SJgG0gkChX",
      "sentence_index": 3,
      "text": "They further evaluated the APG algorithms on a grid-world navigation domain as well as the King of Glory task, and showed that the APG estimator significantly out-performs the standard policy gradient.",
      "suffix": "\n\n",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_summary",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "SJgG0gkChX",
      "sentence_index": 4,
      "text": "In general I think this paper addressed an important issue in policy gradient in terms of deriving a lower variance gradient estimate.",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_motivation-impact",
      "polarity": "pol_positive"
    },
    {
      "review_id": "SJgG0gkChX",
      "sentence_index": 5,
      "text": "In particular, the authors showed that under the parameterized marginal distribution, such as the angular Gaussian distribution, the corresponding APG estimate has a lower variance than that of CAPG.",
      "suffix": "",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_summary",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "SJgG0gkChX",
      "sentence_index": 6,
      "text": "Furthermore, I also appreciate that they evaluated these results in realistic experiments such as the RTS game domains.",
      "suffix": "\n\n",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_substance",
      "polarity": "pol_positive"
    },
    {
      "review_id": "SJgG0gkChX",
      "sentence_index": 7,
      "text": "My only question is on the possibility of deriving realistic APG algorithms beyond the class of angular Gaussian policy.",
      "suffix": "",
      "review_action": "arg_request",
      "fine_review_action": "arg-request_explanation",
      "aspect": "arg_other",
      "polarity": "pol_negative"
    },
    {
      "review_id": "SJgG0gkChX",
      "sentence_index": 8,
      "text": "In terms of the layout of the paper, I would also recommend including the exact algorithm pseudo-code used in the main paper.",
      "suffix": "",
      "review_action": "arg_request",
      "fine_review_action": "arg-request_edit",
      "aspect": "asp_replicability",
      "polarity": "pol_negative"
    }
  ],
  "rebuttal_sentences": [
    {
      "review_id": "SJgG0gkChX",
      "rebuttal_id": "S1eUaqUjTQ",
      "sentence_index": 0,
      "text": "Thank you for the time and effort spent reviewing our paper.",
      "suffix": "",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_social",
      "alignment": [
        "context_global",
        null
      ],
      "details": {}
    },
    {
      "review_id": "SJgG0gkChX",
      "rebuttal_id": "S1eUaqUjTQ",
      "sentence_index": 1,
      "text": "We are glad you liked the paper.",
      "suffix": "",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_social",
      "alignment": [
        "context_global",
        null
      ],
      "details": {}
    },
    {
      "review_id": "SJgG0gkChX",
      "rebuttal_id": "S1eUaqUjTQ",
      "sentence_index": 2,
      "text": "We want to emphasize one point that we perhaps did not highlight enough in our paper: there are other existing algorithms that fall into the marginal policy gradients framework.",
      "suffix": "",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_summary",
      "alignment": [
        "context_none",
        null
      ],
      "details": {}
    },
    {
      "review_id": "SJgG0gkChX",
      "rebuttal_id": "S1eUaqUjTQ",
      "sentence_index": 3,
      "text": "Specifically, researchers and practitioners both almost always clip actions for use in robotics control environments (read: MuJoCo tasks).",
      "suffix": "",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_summary",
      "alignment": [
        "context_none",
        null
      ],
      "details": {}
    },
    {
      "review_id": "SJgG0gkChX",
      "rebuttal_id": "S1eUaqUjTQ",
      "sentence_index": 4,
      "text": "Recently, a reduced variance method was introduced by Fujita and Maeda (2018) for clipped action spaces.",
      "suffix": "",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_summary",
      "alignment": [
        "context_none",
        null
      ],
      "details": {}
    },
    {
      "review_id": "SJgG0gkChX",
      "rebuttal_id": "S1eUaqUjTQ",
      "sentence_index": 5,
      "text": "Their algorithm is also a member of the marginal policy gradients family and our theoretical results for MPG significantly tighten existing analyses of variance reduction that can be achieved for clipped actions.",
      "suffix": "\n\n",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_summary",
      "alignment": [
        "context_none",
        null
      ],
      "details": {}
    },
    {
      "review_id": "SJgG0gkChX",
      "rebuttal_id": "S1eUaqUjTQ",
      "sentence_index": 6,
      "text": "To respond to your question, yes, it is possible (e.g., the example given above), but there is no general procedure that we know of to derive such methods.",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          7
        ]
      ],
      "details": {}
    },
    {
      "review_id": "SJgG0gkChX",
      "rebuttal_id": "S1eUaqUjTQ",
      "sentence_index": 7,
      "text": "Rather, this would be done on an action-space-by-action-space basis.",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          7
        ]
      ],
      "details": {}
    }
  ]
}