{
  "metadata": {
    "forum_id": "B1ldb6NKDr",
    "review_id": "rJgHSAbAFH",
    "rebuttal_id": "HyeW9eY5oB",
    "title": "Multi-Agent Hierarchical Reinforcement Learning for Humanoid Navigation",
    "reviewer": "AnonReviewer3",
    "rating": 3,
    "conference": "ICLR2020",
    "permalink": "https://openreview.net/forum?id=B1ldb6NKDr&noteId=HyeW9eY5oB",
    "annotator": "anno7"
  },
  "review_sentences": [
    {
      "review_id": "rJgHSAbAFH",
      "sentence_index": 0,
      "text": "This paper proposes a multi-agent hierarchical reinforcement learning algorithm so that multiple humanoid robots can navigate in multi-agent settings (e.g. avoid collisions, collaboration, chase and escape) in a physically simulated environment.",
      "suffix": "",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_summary",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "rJgHSAbAFH",
      "sentence_index": 1,
      "text": "The key difference of this paper with the prior work on MARL is that it used an accurate physics simulation of humanoid robots.",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_originality",
      "polarity": "pol_positive"
    },
    {
      "review_id": "rJgHSAbAFH",
      "sentence_index": 2,
      "text": "This is the main reason of using the hierarchical RL.",
      "suffix": "\n\n",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_motivation-impact",
      "polarity": "pol_positive"
    },
    {
      "review_id": "rJgHSAbAFH",
      "sentence_index": 3,
      "text": "In general, I like this paper.",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_motivation-impact",
      "polarity": "pol_positive"
    },
    {
      "review_id": "rJgHSAbAFH",
      "sentence_index": 4,
      "text": "It is an important step towards multi-agent learning in complex physical environments.",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_originality",
      "polarity": "pol_positive"
    },
    {
      "review_id": "rJgHSAbAFH",
      "sentence_index": 5,
      "text": "The results look appealing, too.",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_soundness-correctness",
      "polarity": "pol_positive"
    },
    {
      "review_id": "rJgHSAbAFH",
      "sentence_index": 6,
      "text": "However, I voted for \"Weak Reject\" for two reasons.",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "arg_other",
      "polarity": "pol_negative"
    },
    {
      "review_id": "rJgHSAbAFH",
      "sentence_index": 7,
      "text": "First, the technical contribution is lean. Neither the multi-agent learning or the hierarchical learning of the algorithm is novel.",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_soundness-correctness",
      "polarity": "pol_negative"
    },
    {
      "review_id": "rJgHSAbAFH",
      "sentence_index": 8,
      "text": "The combination of these two methods seems straightforward.",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_replicability",
      "polarity": "pol_negative"
    },
    {
      "review_id": "rJgHSAbAFH",
      "sentence_index": 9,
      "text": "Once a low-level walking controller is trained, the high-level multi-agent navigation control is not much different from simple environments, e.g. point mass control, used in the previous works.",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_substance",
      "polarity": "pol_negative"
    },
    {
      "review_id": "rJgHSAbAFH",
      "sentence_index": 10,
      "text": "I do not understand the \"deep integration of MARL and HRL\" that is claimed in the Introduction.",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_clarity",
      "polarity": "pol_negative"
    },
    {
      "review_id": "rJgHSAbAFH",
      "sentence_index": 11,
      "text": "I also do not agree with another claim that \"We consider the simulation and training environment to be another novel contribution... few simulator support more than one agent, at most 2\".",
      "suffix": "",
      "review_action": "arg_request",
      "fine_review_action": "arg-request_explanation",
      "aspect": "asp_substance",
      "polarity": "pol_negative"
    },
    {
      "review_id": "rJgHSAbAFH",
      "sentence_index": 12,
      "text": "In most of the simulators that I am familiar with, such as Mujoco, Bullet, DART, it is straightforward to add multiple simulated robots.",
      "suffix": "\n\n",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_substance",
      "polarity": "pol_negative"
    },
    {
      "review_id": "rJgHSAbAFH",
      "sentence_index": 13,
      "text": "Second, the writing can be greatly improved.",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_clarity",
      "polarity": "pol_negative"
    },
    {
      "review_id": "rJgHSAbAFH",
      "sentence_index": 14,
      "text": "Almost half of the technical details are buried in \"8. Supplementary material\".",
      "suffix": "",
      "review_action": "arg_request",
      "fine_review_action": "arg-request_edit",
      "aspect": "asp_substance",
      "polarity": "pol_negative"
    },
    {
      "review_id": "rJgHSAbAFH",
      "sentence_index": 15,
      "text": "Since it is not fair to use \"Supplementary material\" as a way to extend the page limit, I will make my judgement of the paper solely based on the contents up to Section 7.",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_meaningful-comparison",
      "polarity": "pol_negative"
    },
    {
      "review_id": "rJgHSAbAFH",
      "sentence_index": 16,
      "text": "In the main text (up to Section 7), there is no mentioning of how the low-level controllers are learned, and how to combine PPO in a MARL partial parameter sharing setting.",
      "suffix": "",
      "review_action": "arg_request",
      "fine_review_action": "arg-request_experiment",
      "aspect": "asp_soundness-correctness",
      "polarity": "pol_negative"
    },
    {
      "review_id": "rJgHSAbAFH",
      "sentence_index": 17,
      "text": "I think that these are important details and may also be the contributions of this paper.",
      "suffix": "",
      "review_action": "arg_request",
      "fine_review_action": "arg-request_clarification",
      "aspect": "asp_substance",
      "polarity": "pol_negative"
    },
    {
      "review_id": "rJgHSAbAFH",
      "sentence_index": 18,
      "text": "Most of these should be moved to the main text.",
      "suffix": "\n\n",
      "review_action": "arg_request",
      "fine_review_action": "arg-request_edit",
      "aspect": "asp_substance",
      "polarity": "pol_negative"
    },
    {
      "review_id": "rJgHSAbAFH",
      "sentence_index": 19,
      "text": "Here are some more suggestions on writing:",
      "suffix": "\n",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_heading",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "rJgHSAbAFH",
      "sentence_index": 20,
      "text": "1) Certain paragraphs in the main text can be significantly shortened, such as the reward shaping in Section 5.2.",
      "suffix": "\n",
      "review_action": "arg_request",
      "fine_review_action": "arg-request_edit",
      "aspect": "asp_substance",
      "polarity": "pol_negative"
    },
    {
      "review_id": "rJgHSAbAFH",
      "sentence_index": 21,
      "text": "2) It would be great if the paper can clearly define the experiments: \"waypoint\", \"oncoming\", \"mall\", and \"bottleneck\".",
      "suffix": "\n",
      "review_action": "arg_request",
      "fine_review_action": "arg-request_edit",
      "aspect": "asp_clarity",
      "polarity": "pol_negative"
    },
    {
      "review_id": "rJgHSAbAFH",
      "sentence_index": 22,
      "text": "3) The paper needs a thorough proof-reading. There are many grammar mistakes, typos, missing citations. For example,",
      "suffix": "\n",
      "review_action": "arg_request",
      "fine_review_action": "arg-request_typo",
      "aspect": "asp_substance",
      "polarity": "pol_negative"
    },
    {
      "review_id": "rJgHSAbAFH",
      "sentence_index": 23,
      "text": "promiss->promise",
      "suffix": "\n",
      "review_action": "none",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "rJgHSAbAFH",
      "sentence_index": 24,
      "text": "week signal->weak signal",
      "suffix": "\n",
      "review_action": "none",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "rJgHSAbAFH",
      "sentence_index": 25,
      "text": "missing citation [?] in page 3",
      "suffix": "\n",
      "review_action": "arg_request",
      "fine_review_action": "arg-request_typo",
      "aspect": "asp_replicability",
      "polarity": "pol_negative"
    },
    {
      "review_id": "rJgHSAbAFH",
      "sentence_index": 26,
      "text": "reuse the same symbol v_{com} for agent's velocity and desired speed in eq(3)",
      "suffix": "",
      "review_action": "arg_request",
      "fine_review_action": "arg-request_typo",
      "aspect": "asp_replicability",
      "polarity": "pol_negative"
    }
  ],
  "rebuttal_sentences": [
    {
      "review_id": "rJgHSAbAFH",
      "rebuttal_id": "HyeW9eY5oB",
      "sentence_index": 0,
      "text": "We appreciate your time and comments on the work.",
      "suffix": "",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_social",
      "alignment": [
        "context_none",
        null
      ],
      "details": {}
    },
    {
      "review_id": "rJgHSAbAFH",
      "rebuttal_id": "HyeW9eY5oB",
      "sentence_index": 1,
      "text": "While the method is the first to be applied to a multi-agent simulation with articulated humanoid character, our main contribution is a method to allow sophisticated controllers to be learned in this case.",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          7,
          8
        ]
      ],
      "details": {}
    },
    {
      "review_id": "rJgHSAbAFH",
      "rebuttal_id": "HyeW9eY5oB",
      "sentence_index": 2,
      "text": "Our unique combination of structured learning enables the learning of strong polices without incredible amounts of computing time (cite openAI Emergent Tool Use from Multi-Agent Interaction).",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          7,
          8
        ]
      ],
      "details": {}
    },
    {
      "review_id": "rJgHSAbAFH",
      "rebuttal_id": "HyeW9eY5oB",
      "sentence_index": 3,
      "text": "We also argue that the learning and control problem for the high-level policies is more complicated than a \u201cpoint mass\u201d environment.",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          9,
          10
        ]
      ],
      "details": {}
    },
    {
      "review_id": "rJgHSAbAFH",
      "rebuttal_id": "HyeW9eY5oB",
      "sentence_index": 4,
      "text": "The high level needs to learn strategies to cope with a dynamic simulation that includes, pushes, slips, balancing, etc, all through the capabilities of a low-level policy while optimizing a goal-seeking objective.",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          10,
          11
        ]
      ],
      "details": {}
    },
    {
      "review_id": "rJgHSAbAFH",
      "rebuttal_id": "HyeW9eY5oB",
      "sentence_index": 5,
      "text": "This simulation environment it also novel in that no other simulation has put multiple dynamic humanoid agents in a simulation that observe each other using egocentric vision.",
      "suffix": "",
      "rebuttal_stance": "dispute",
      "rebuttal_action": "rebuttal_mitigate-criticism",
      "alignment": [
        "context_sentences",
        [
          11,
          12
        ]
      ],
      "details": {}
    },
    {
      "review_id": "rJgHSAbAFH",
      "rebuttal_id": "HyeW9eY5oB",
      "sentence_index": 6,
      "text": "In other simulation libraries, it is possible to add more agents, but no tasks have been constructed or learned that match the complexity in this work.",
      "suffix": "\n\n",
      "rebuttal_stance": "dispute",
      "rebuttal_action": "rebuttal_mitigate-criticism",
      "alignment": [
        "context_sentences",
        [
          11,
          12
        ]
      ],
      "details": {}
    },
    {
      "review_id": "rJgHSAbAFH",
      "rebuttal_id": "HyeW9eY5oB",
      "sentence_index": 7,
      "text": "We agree that the paper writing can be improved.",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_concede-criticism",
      "alignment": [
        "context_sentences",
        [
          13
        ]
      ],
      "details": {}
    },
    {
      "review_id": "rJgHSAbAFH",
      "rebuttal_id": "HyeW9eY5oB",
      "sentence_index": 8,
      "text": "Significant edits to the paper have been made to make the method and its contribution more clear.",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_done",
      "alignment": [
        "context_sentences",
        [
          13
        ]
      ],
      "details": {
        "request_out_of_scope": true
      }
    },
    {
      "review_id": "rJgHSAbAFH",
      "rebuttal_id": "HyeW9eY5oB",
      "sentence_index": 9,
      "text": "These edits include moving the details for training the goal-conditioned low-level controller to the main paper, including adding a new task for 2-on-2 soccer for which MAHRL has shown significant progress on learning.",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_done",
      "alignment": [
        "context_sentences",
        [
          13
        ]
      ],
      "details": {
        "request_out_of_scope": true
      }
    }
  ]
}