{
  "metadata": {
    "forum_id": "HygSq3VFvH",
    "review_id": "Hkgcm01WqB",
    "rebuttal_id": "HJeVtBG5sB",
    "title": "Self-Supervised State-Control through Intrinsic Mutual Information Rewards",
    "reviewer": "AnonReviewer2",
    "rating": 3,
    "conference": "ICLR2020",
    "permalink": "https://openreview.net/forum?id=HygSq3VFvH&noteId=HJeVtBG5sB",
    "annotator": "anno13"
  },
  "review_sentences": [
    {
      "review_id": "Hkgcm01WqB",
      "sentence_index": 0,
      "text": "The paper paper proposes a mutual information maximization objective for discovering unsupervised robotic manipulation skills.",
      "suffix": "",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_summary",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "Hkgcm01WqB",
      "sentence_index": 1,
      "text": "The paper assumes that the state space can be divided into two parts - the state of the robot (\u201ccontext states\u201d) which is controllable via actions and the state of an object (\u201cstates of interest\u201d) which must be manipulated by the robot.",
      "suffix": "",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_summary",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "Hkgcm01WqB",
      "sentence_index": 2,
      "text": "Given these two categories of states, the proposed algorithm maximizes a lower bound on the mutual information between the two categories of states such that a policy is learnt that is able to manipulate the object with the robot meaningfully.",
      "suffix": "\n\n",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_summary",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "Hkgcm01WqB",
      "sentence_index": 3,
      "text": "I vote for weak reject as (1) the paper makes a strong assumption about the availability of both the robot and object states which is not realistic in typical robotic manipulation applications and (2) the objective in the paper will not work if there is no notion of an \u201cobject\u201d or object-state e.g.: this algorithm will not learn skills for a robot trying to control itself; hence, it is not truly a general purpose skill discovery algorithm but rather a skill discovery algorithm specifically meant for robot-object manipulation tasks.",
      "suffix": "\n\n",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_substance",
      "polarity": "pol_negative"
    },
    {
      "review_id": "Hkgcm01WqB",
      "sentence_index": 4,
      "text": "My main concern with the paper is its limited applicability to robotic manipulation tasks with a clear divide between states of interest vs others.",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_motivation-impact",
      "polarity": "pol_negative"
    },
    {
      "review_id": "Hkgcm01WqB",
      "sentence_index": 5,
      "text": "The paper does not talk about settings where states of interest are not known, so all of the experiments are based on this strong assumption.",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_substance",
      "polarity": "pol_negative"
    },
    {
      "review_id": "Hkgcm01WqB",
      "sentence_index": 6,
      "text": "It doesn\u2019t seem like a surprising discovery that maximizing the mutual information between the robot state and object state will lead to skills that actually make the robot move the object.",
      "suffix": "\n\n",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_substance",
      "polarity": "pol_negative"
    },
    {
      "review_id": "Hkgcm01WqB",
      "sentence_index": 7,
      "text": "Given that object manipulation is the specific application of interest, the comparison with DIAYN and the combined objective with DIAYN is interesting but little motivation or discussion has been provided in the paper.",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_meaningful-comparison",
      "polarity": "pol_negative"
    },
    {
      "review_id": "Hkgcm01WqB",
      "sentence_index": 8,
      "text": "Can the authors elaborate on why this choice should intuitively be better than the proposed method alone?",
      "suffix": "\n\n",
      "review_action": "arg_request",
      "fine_review_action": "arg-request_edit",
      "aspect": "asp_meaningful-comparison",
      "polarity": "none"
    },
    {
      "review_id": "Hkgcm01WqB",
      "sentence_index": 9,
      "text": "The paper does not talk about how these skills can be used as primitive actions by a higher level controller (in a hierarchical RL setup), which would help in demonstrating the usefulness of these skills - e.g.: are these skills sequentially composable such that they can solve a complex task?",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_substance",
      "polarity": "none"
    }
  ],
  "rebuttal_sentences": [
    {
      "review_id": "Hkgcm01WqB",
      "rebuttal_id": "HJeVtBG5sB",
      "sentence_index": 0,
      "text": "Thank you for the comments!",
      "suffix": "\n\n",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_social",
      "alignment": [
        "context_global",
        null
      ],
      "details": {}
    },
    {
      "review_id": "Hkgcm01WqB",
      "rebuttal_id": "HJeVtBG5sB",
      "sentence_index": 1,
      "text": "To reviewer\u2019s concerns:",
      "suffix": "\n\n",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_structuring",
      "alignment": [
        "context_in-rebuttal",
        null
      ],
      "details": {}
    },
    {
      "review_id": "Hkgcm01WqB",
      "rebuttal_id": "HJeVtBG5sB",
      "sentence_index": 2,
      "text": "- First of all, the state of interest does not have to be the object state.",
      "suffix": "",
      "rebuttal_stance": "dispute",
      "rebuttal_action": "rebuttal_contradict-assertion",
      "alignment": [
        "context_sentences",
        [
          1,
          2
        ]
      ],
      "details": {}
    },
    {
      "review_id": "Hkgcm01WqB",
      "rebuttal_id": "HJeVtBG5sB",
      "sentence_index": 3,
      "text": "It can be the state of the robot, for example, the state of actuators.",
      "suffix": "",
      "rebuttal_stance": "dispute",
      "rebuttal_action": "rebuttal_contradict-assertion",
      "alignment": [
        "context_sentences",
        [
          1,
          2
        ]
      ],
      "details": {}
    },
    {
      "review_id": "Hkgcm01WqB",
      "rebuttal_id": "HJeVtBG5sB",
      "sentence_index": 4,
      "text": "Maximizing the mutual information between two sets of actuator states can help the agent to learn to control itself.",
      "suffix": "",
      "rebuttal_stance": "dispute",
      "rebuttal_action": "rebuttal_contradict-assertion",
      "alignment": [
        "context_sentences",
        [
          1,
          2
        ]
      ],
      "details": {}
    },
    {
      "review_id": "Hkgcm01WqB",
      "rebuttal_id": "HJeVtBG5sB",
      "sentence_index": 5,
      "text": "We did a new experiment in navigation environments, where train the agent to maximize the mutual information between its left wheel states and its right wheel states.",
      "suffix": "",
      "rebuttal_stance": "dispute",
      "rebuttal_action": "rebuttal_contradict-assertion",
      "alignment": [
        "context_sentences",
        [
          1,
          2
        ]
      ],
      "details": {}
    },
    {
      "review_id": "Hkgcm01WqB",
      "rebuttal_id": "HJeVtBG5sB",
      "sentence_index": 6,
      "text": "The agent learns to run in a straight line instead of in random directions.",
      "suffix": "",
      "rebuttal_stance": "dispute",
      "rebuttal_action": "rebuttal_contradict-assertion",
      "alignment": [
        "context_sentences",
        [
          1,
          2
        ]
      ],
      "details": {}
    },
    {
      "review_id": "Hkgcm01WqB",
      "rebuttal_id": "HJeVtBG5sB",
      "sentence_index": 7,
      "text": "The video showing experiment results is available at https://youtu.be/l5KaYJWWu70?t=134",
      "suffix": "\n\n",
      "rebuttal_stance": "dispute",
      "rebuttal_action": "rebuttal_contradict-assertion",
      "alignment": [
        "context_sentences",
        [
          1,
          2
        ]
      ],
      "details": {}
    },
    {
      "review_id": "Hkgcm01WqB",
      "rebuttal_id": "HJeVtBG5sB",
      "sentence_index": 8,
      "text": "- Although we evaluated our method in robotic manipulation tasks, it does not mean it won\u2019t work for other tasks.",
      "suffix": "",
      "rebuttal_stance": "dispute",
      "rebuttal_action": "rebuttal_mitigate-criticism",
      "alignment": [
        "context_sentences",
        [
          4,
          5,
          6
        ]
      ],
      "details": {}
    },
    {
      "review_id": "Hkgcm01WqB",
      "rebuttal_id": "HJeVtBG5sB",
      "sentence_index": 9,
      "text": "We added additional experiments in a new navigation task, see the video at https://youtu.be/l5KaYJWWu70?t=104",
      "suffix": "\n",
      "rebuttal_stance": "dispute",
      "rebuttal_action": "rebuttal_mitigate-criticism",
      "alignment": [
        "context_sentences",
        [
          4,
          5,
          6
        ]
      ],
      "details": {}
    },
    {
      "review_id": "Hkgcm01WqB",
      "rebuttal_id": "HJeVtBG5sB",
      "sentence_index": 10,
      "text": "We consider our algorithm as a general-purpose skill learning algorithm in the sense that it guides the agent to learn any skills to control the states of interests.",
      "suffix": "",
      "rebuttal_stance": "dispute",
      "rebuttal_action": "rebuttal_mitigate-criticism",
      "alignment": [
        "context_sentences",
        [
          4,
          5,
          6
        ]
      ],
      "details": {}
    },
    {
      "review_id": "Hkgcm01WqB",
      "rebuttal_id": "HJeVtBG5sB",
      "sentence_index": 11,
      "text": "The states of interest could be any states, such as the robot states, the object states, or the states of the environment.",
      "suffix": "\n\n",
      "rebuttal_stance": "dispute",
      "rebuttal_action": "rebuttal_mitigate-criticism",
      "alignment": [
        "context_sentences",
        [
          4,
          5,
          6
        ]
      ],
      "details": {}
    },
    {
      "review_id": "Hkgcm01WqB",
      "rebuttal_id": "HJeVtBG5sB",
      "sentence_index": 12,
      "text": "- The state of interest is specified by the user with little domain knowledge.",
      "suffix": "",
      "rebuttal_stance": "dispute",
      "rebuttal_action": "rebuttal_mitigate-criticism",
      "alignment": [
        "context_sentences",
        [
          5
        ]
      ],
      "details": {}
    },
    {
      "review_id": "Hkgcm01WqB",
      "rebuttal_id": "HJeVtBG5sB",
      "sentence_index": 13,
      "text": "However, when there is no clear divide from the user, the agent can learn from different combinations of the states of interest and the context states.",
      "suffix": "",
      "rebuttal_stance": "dispute",
      "rebuttal_action": "rebuttal_mitigate-criticism",
      "alignment": [
        "context_sentences",
        [
          5
        ]
      ],
      "details": {}
    },
    {
      "review_id": "Hkgcm01WqB",
      "rebuttal_id": "HJeVtBG5sB",
      "sentence_index": 14,
      "text": "In the end, the user can choose skills from the learned skill sets that are useful for the task at hand.",
      "suffix": "\n\n",
      "rebuttal_stance": "dispute",
      "rebuttal_action": "rebuttal_mitigate-criticism",
      "alignment": [
        "context_sentences",
        [
          5
        ]
      ],
      "details": {}
    },
    {
      "review_id": "Hkgcm01WqB",
      "rebuttal_id": "HJeVtBG5sB",
      "sentence_index": 15,
      "text": "- The combination of our method and DIAYN enables DIAYN to learn manipulation skills efficiently, while DIAYN alone did not learn.",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          7,
          8
        ]
      ],
      "details": {}
    },
    {
      "review_id": "Hkgcm01WqB",
      "rebuttal_id": "HJeVtBG5sB",
      "sentence_index": 16,
      "text": "Furthermore, compared to MISC, the combined method enjoys the benefits brought by DIAYN, such as learning combinable motion primitive with skill-conditioned policy for hierarchical reinforcement learning [1].",
      "suffix": "\n\n",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          7,
          8
        ]
      ],
      "details": {}
    },
    {
      "review_id": "Hkgcm01WqB",
      "rebuttal_id": "HJeVtBG5sB",
      "sentence_index": 17,
      "text": "Reference:",
      "suffix": "\n",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_structuring",
      "alignment": [
        "context_in-rebuttal",
        null
      ],
      "details": {}
    },
    {
      "review_id": "Hkgcm01WqB",
      "rebuttal_id": "HJeVtBG5sB",
      "sentence_index": 18,
      "text": "[1] Benjamin Eysenbach, Abhishek Gupta, Julian Ibarz, and Sergey Levine. Diversity is all you need: Learning skills without a reward function. In International Conference on Learning Representations, 2019. URL https://openreview.net/forum?id=SJx63jRqFm.",
      "suffix": "",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_structuring",
      "alignment": [
        "context_in-rebuttal",
        null
      ],
      "details": {}
    }
  ]
}