{
  "metadata": {
    "forum_id": "SJl98sR5tX",
    "review_id": "B1egtgpd37",
    "rebuttal_id": "ryxlCO0vC7",
    "title": "Interactive Agent Modeling by Learning to Probe",
    "reviewer": "AnonReviewer2",
    "rating": 6,
    "conference": "ICLR2019",
    "permalink": "https://openreview.net/forum?id=SJl98sR5tX&noteId=ryxlCO0vC7",
    "annotator": "anno0"
  },
  "review_sentences": [
    {
      "review_id": "B1egtgpd37",
      "sentence_index": 0,
      "text": "The submission proposes a new method for agent design to learn about the behaviour of other fixed agents inhabiting the same environment.",
      "suffix": "",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_summary",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "B1egtgpd37",
      "sentence_index": 1,
      "text": "The method builds on imitation learning (behavioural cloning) to model the agent\u2019s behaviour and reinforcement learning to learn a probing policy to more broadly explore different target agent behaviours.",
      "suffix": "",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_summary",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "B1egtgpd37",
      "sentence_index": 2,
      "text": "Overall, the approach falls into the field of intrinsic motivation / curiosity-like reward generation procedures but with respect to target agent behaviour instead of the agent\u2019s environment.",
      "suffix": "",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_summary",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "B1egtgpd37",
      "sentence_index": 3,
      "text": "While learning to model the target agent\u2019s inner state, the RL reward is generated based on the difference of the target agent\u2019s inner state between consecutive time steps.",
      "suffix": "\n\n",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_summary",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "B1egtgpd37",
      "sentence_index": 4,
      "text": "The approach is evaluated against a small set of baselines in various toy grid-world scenarios and a sorting task and overall performs commensurate or better than the investigated baselines.",
      "suffix": "",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_summary",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "B1egtgpd37",
      "sentence_index": 5,
      "text": "Given its limitation to small and low-dimensional environments, it cannot be said how well the approach will scale with respect to these factors and the resulting, more complex agent behaviours.",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_soundness-correctness",
      "polarity": "pol_negative"
    },
    {
      "review_id": "B1egtgpd37",
      "sentence_index": 6,
      "text": "It would be highly beneficial to evaluate these aspects.",
      "suffix": "",
      "review_action": "arg_request",
      "fine_review_action": "arg-request_experiment",
      "aspect": "asp_soundness-correctness",
      "polarity": "pol_positive"
    },
    {
      "review_id": "B1egtgpd37",
      "sentence_index": 7,
      "text": "Furthermore, it would be beneficial to provide more information about the baselines; in particular the type of count-based exploration.",
      "suffix": "",
      "review_action": "arg_request",
      "fine_review_action": "arg-request_clarification",
      "aspect": "asp_substance",
      "polarity": "none"
    },
    {
      "review_id": "B1egtgpd37",
      "sentence_index": 8,
      "text": "For the generated figures, it would be beneficial to include standard deviation and mean over multiple runs to not only evaluate performance but also robustness.",
      "suffix": "\n\n",
      "review_action": "arg_request",
      "fine_review_action": "arg-request_edit",
      "aspect": "asp_substance",
      "polarity": "none"
    },
    {
      "review_id": "B1egtgpd37",
      "sentence_index": 9,
      "text": "Overall, while the agent behaviour modelling focused on a type of inner state (based on past trajectories) provides benefits in the evaluated examples, it is unsure how well the approach scales to more complex domains based on strong similarity and simplicity of the tested toy scenarios (evaluation on sorting problems is an interesting step towards to address this shortcoming).",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_soundness-correctness",
      "polarity": "pol_negative"
    },
    {
      "review_id": "B1egtgpd37",
      "sentence_index": 10,
      "text": "One additional aspect pointing towards the necessity of further evaluation is the strong dependence of performance on the dimensionality of the latent, internal state (Fig.4).",
      "suffix": "\n\n",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_soundness-correctness",
      "polarity": "none"
    },
    {
      "review_id": "B1egtgpd37",
      "sentence_index": 11,
      "text": "Minor issues:",
      "suffix": "\n",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_heading",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "B1egtgpd37",
      "sentence_index": 12,
      "text": "- Reward formulations for the baselines as part of the appendix.",
      "suffix": "\n",
      "review_action": "arg_request",
      "fine_review_action": "arg-request_edit",
      "aspect": "asp_replicability",
      "polarity": "none"
    },
    {
      "review_id": "B1egtgpd37",
      "sentence_index": 13,
      "text": "- Same scale for the y-axes across figures",
      "suffix": "",
      "review_action": "arg_request",
      "fine_review_action": "arg-request_edit",
      "aspect": "asp_clarity",
      "polarity": "none"
    }
  ],
  "rebuttal_sentences": [
    {
      "review_id": "B1egtgpd37",
      "rebuttal_id": "ryxlCO0vC7",
      "sentence_index": 0,
      "text": "Thank you for your reviews and comments.",
      "suffix": "",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_social",
      "alignment": [
        "context_global",
        null
      ],
      "details": {}
    },
    {
      "review_id": "B1egtgpd37",
      "rebuttal_id": "ryxlCO0vC7",
      "sentence_index": 1,
      "text": "We respond to your questions as follows.",
      "suffix": "\n\n",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_structuring",
      "alignment": [
        "context_in-rebuttal",
        null
      ],
      "details": {}
    },
    {
      "review_id": "B1egtgpd37",
      "rebuttal_id": "ryxlCO0vC7",
      "sentence_index": 2,
      "text": "1. Scalability?",
      "suffix": "\n",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_structuring",
      "alignment": [
        "context_sentences",
        [
          5,
          6
        ]
      ],
      "details": {}
    },
    {
      "review_id": "B1egtgpd37",
      "rebuttal_id": "ryxlCO0vC7",
      "sentence_index": 3,
      "text": "While we agree that the tasks in this paper are not real world problems, we think, as a first step towards this direction, the evaluations in this paper have provided some promising proof-of-concept results. Applying the approach to more realistic and more complex tasks could be a good future research direction.",
      "suffix": "\n\n",
      "rebuttal_stance": "dispute",
      "rebuttal_action": "rebuttal_mitigate-criticism",
      "alignment": [
        "context_sentences",
        [
          5,
          6
        ]
      ],
      "details": {}
    },
    {
      "review_id": "B1egtgpd37",
      "rebuttal_id": "ryxlCO0vC7",
      "sentence_index": 4,
      "text": "2. It would be beneficial to provide more information about the baselines",
      "suffix": "\n",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_structuring",
      "alignment": [
        "context_sentences",
        [
          7
        ]
      ],
      "details": {}
    },
    {
      "review_id": "B1egtgpd37",
      "rebuttal_id": "ryxlCO0vC7",
      "sentence_index": 5,
      "text": "We have added details of baselines including their reward functions in Appendix E.",
      "suffix": "\n\n",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_done",
      "alignment": [
        "context_sentences",
        [
          7
        ]
      ],
      "details": {
        "request_out_of_scope": true
      }
    },
    {
      "review_id": "B1egtgpd37",
      "rebuttal_id": "ryxlCO0vC7",
      "sentence_index": 6,
      "text": "3. For the generated figures, it would be beneficial to include standard deviation and mean over multiple runs",
      "suffix": "\n",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_structuring",
      "alignment": [
        "context_sentences",
        [
          8
        ]
      ],
      "details": {}
    },
    {
      "review_id": "B1egtgpd37",
      "rebuttal_id": "ryxlCO0vC7",
      "sentence_index": 7,
      "text": "We show the standard deviation of multiple runs in Figure 7,8,9 in the revision.",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_done",
      "alignment": [
        "context_sentences",
        [
          8
        ]
      ],
      "details": {
        "request_out_of_scope": true
      }
    },
    {
      "review_id": "B1egtgpd37",
      "rebuttal_id": "ryxlCO0vC7",
      "sentence_index": 8,
      "text": "We have done our best to evaluate the robustness given the limited time and will continue to improve the evaluation.",
      "suffix": "\n\n",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_done",
      "alignment": [
        "context_sentences",
        [
          8
        ]
      ],
      "details": {
        "request_out_of_scope": true
      }
    },
    {
      "review_id": "B1egtgpd37",
      "rebuttal_id": "ryxlCO0vC7",
      "sentence_index": 9,
      "text": "4. The strong dependence of performance on the dimensionality of the latent, internal state (Fig.4).",
      "suffix": "\n",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_structuring",
      "alignment": [
        "context_sentences",
        [
          10
        ]
      ],
      "details": {}
    },
    {
      "review_id": "B1egtgpd37",
      "rebuttal_id": "ryxlCO0vC7",
      "sentence_index": 10,
      "text": "The network architecture design is not the focus of our paper.",
      "suffix": "",
      "rebuttal_stance": "dispute",
      "rebuttal_action": "rebuttal_reject-criticism",
      "alignment": [
        "context_sentences",
        [
          10
        ]
      ],
      "details": {}
    },
    {
      "review_id": "B1egtgpd37",
      "rebuttal_id": "ryxlCO0vC7",
      "sentence_index": 11,
      "text": "Generally speaking, a higher dimensionality of the latent vector provides a more powerful network to model agents.",
      "suffix": "",
      "rebuttal_stance": "dispute",
      "rebuttal_action": "rebuttal_reject-criticism",
      "alignment": [
        "context_sentences",
        [
          10
        ]
      ],
      "details": {}
    },
    {
      "review_id": "B1egtgpd37",
      "rebuttal_id": "ryxlCO0vC7",
      "sentence_index": 12,
      "text": "However, as we show in Figure 4, with probing, the network with lower dimensionality can even outperform the baselines trained with latent vectors that have higher dimensions.",
      "suffix": "",
      "rebuttal_stance": "dispute",
      "rebuttal_action": "rebuttal_reject-criticism",
      "alignment": [
        "context_sentences",
        [
          10
        ]
      ],
      "details": {}
    },
    {
      "review_id": "B1egtgpd37",
      "rebuttal_id": "ryxlCO0vC7",
      "sentence_index": 13,
      "text": "And with the same architecture, probing clearly provides a significant improvement.",
      "suffix": "\n\n",
      "rebuttal_stance": "dispute",
      "rebuttal_action": "rebuttal_reject-criticism",
      "alignment": [
        "context_sentences",
        [
          10
        ]
      ],
      "details": {}
    },
    {
      "review_id": "B1egtgpd37",
      "rebuttal_id": "ryxlCO0vC7",
      "sentence_index": 14,
      "text": "5. Minor issues.",
      "suffix": "\n",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_structuring",
      "alignment": [
        "context_sentences",
        [
          11,
          12,
          13
        ]
      ],
      "details": {}
    },
    {
      "review_id": "B1egtgpd37",
      "rebuttal_id": "ryxlCO0vC7",
      "sentence_index": 15,
      "text": "Thanks for pointing out these issues. We have fixed them in the revision.",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_done",
      "alignment": [
        "context_sentences",
        [
          11,
          12,
          13
        ]
      ],
      "details": {
        "request_out_of_scope": true
      }
    }
  ]
}