{
  "metadata": {
    "forum_id": "SJl98sR5tX",
    "review_id": "r1eaIH8raQ",
    "rebuttal_id": "SJexqnAvC7",
    "title": "Interactive Agent Modeling by Learning to Probe",
    "reviewer": "AnonReviewer4",
    "rating": 6,
    "conference": "ICLR2019",
    "permalink": "https://openreview.net/forum?id=SJl98sR5tX&noteId=SJexqnAvC7",
    "annotator": "anno0"
  },
  "review_sentences": [
    {
      "review_id": "r1eaIH8raQ",
      "sentence_index": 0,
      "text": "1) Summary",
      "suffix": "\n",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_heading",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "r1eaIH8raQ",
      "sentence_index": 1,
      "text": "This paper proposes a method for learning an agent by interacting with an expert agent and probing its behavior.",
      "suffix": "",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_summary",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "r1eaIH8raQ",
      "sentence_index": 2,
      "text": "This method is composed of a policy that learns to imitate an expert\u2019s action, and a policy that challenges the expert in order to get it to take multiple possible routes to solve a task.",
      "suffix": "",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_summary",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "r1eaIH8raQ",
      "sentence_index": 3,
      "text": "The two policies share a \u201cbehavior tracker\u201d that models the expert\u2019s behavior, and communicates it to both policies being learned.",
      "suffix": "",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_summary",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "r1eaIH8raQ",
      "sentence_index": 4,
      "text": "The probing policy is optimized using a curiosity-driven reward in order to get the expert to take trajectories the probing policy has not seen before.",
      "suffix": "",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_summary",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "r1eaIH8raQ",
      "sentence_index": 5,
      "text": "In the experiments, the authors show how the learned agent can generalize to unseen configurations of the environments in which the agents were trained, and also apply the proposed technique to a sorting task in which the method generalizes to longer arrays.",
      "suffix": "\n\n\n",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_summary",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "r1eaIH8raQ",
      "sentence_index": 6,
      "text": "2) Pros:",
      "suffix": "\n",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_heading",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "r1eaIH8raQ",
      "sentence_index": 7,
      "text": "+ Neat idea for exploring an expert\u2019s behavior by changing the environment surrounding it (probing it).",
      "suffix": "\n",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_originality",
      "polarity": "pol_positive"
    },
    {
      "review_id": "r1eaIH8raQ",
      "sentence_index": 8,
      "text": "+ Cool experiments for applicability.",
      "suffix": "\n",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_replicability",
      "polarity": "pol_positive"
    },
    {
      "review_id": "r1eaIH8raQ",
      "sentence_index": 9,
      "text": "+ Well written paper and easy to understand.",
      "suffix": "\n\n",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_clarity",
      "polarity": "pol_positive"
    },
    {
      "review_id": "r1eaIH8raQ",
      "sentence_index": 10,
      "text": "3) Comments:",
      "suffix": "\n",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_heading",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "r1eaIH8raQ",
      "sentence_index": 11,
      "text": "- Equation 1 typo?:",
      "suffix": "\n",
      "review_action": "arg_request",
      "fine_review_action": "arg-request_typo",
      "aspect": "asp_clarity",
      "polarity": "none"
    },
    {
      "review_id": "r1eaIH8raQ",
      "sentence_index": 12,
      "text": "To my understanding, in curiosity-driven exploration, the exploration is driven by how well the agent can predict the next state.",
      "suffix": "",
      "review_action": "arg_fact",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "r1eaIH8raQ",
      "sentence_index": 13,
      "text": "In equation 1, different time steps are being compared, m^t and m^{t-1}, but the comparison should be between the predicted time step t and real time step t. Can the authors clarify why different time steps are compared in the equation?",
      "suffix": "\n\n",
      "review_action": "arg_request",
      "fine_review_action": "arg-request_clarification",
      "aspect": "asp_soundness-correctness",
      "polarity": "none"
    },
    {
      "review_id": "r1eaIH8raQ",
      "sentence_index": 14,
      "text": "- Baseline missing: Random actions from expert",
      "suffix": "\n",
      "review_action": "arg_request",
      "fine_review_action": "arg-request_experiment",
      "aspect": "asp_meaningful-comparison",
      "polarity": "pol_negative"
    },
    {
      "review_id": "r1eaIH8raQ",
      "sentence_index": 15,
      "text": "A simple baseline to compare against could be to simply force the expert to take a few random actions during its trajectory and let the imitator learn from these.",
      "suffix": "",
      "review_action": "arg_request",
      "fine_review_action": "arg-request_experiment",
      "aspect": "asp_meaningful-comparison",
      "polarity": "none"
    },
    {
      "review_id": "r1eaIH8raQ",
      "sentence_index": 16,
      "text": "Comparing against this baseline could serve as evidence that we need to actually learn the probing agent to acquire a more optimal policy.",
      "suffix": "\n\n",
      "review_action": "arg_request",
      "fine_review_action": "arg-request_experiment",
      "aspect": "asp_meaningful-comparison",
      "polarity": "none"
    },
    {
      "review_id": "r1eaIH8raQ",
      "sentence_index": 17,
      "text": "- Baseline missing: Simple RNN policies that communicate hidden states.",
      "suffix": "\n",
      "review_action": "arg_request",
      "fine_review_action": "arg-request_experiment",
      "aspect": "asp_meaningful-comparison",
      "polarity": "pol_negative"
    },
    {
      "review_id": "r1eaIH8raQ",
      "sentence_index": 18,
      "text": "Another baseline could be to simply model the imitator and probing policies as RNNs and let them communicate with each other via the hidden states.",
      "suffix": "",
      "review_action": "arg_request",
      "fine_review_action": "arg-request_experiment",
      "aspect": "asp_meaningful-comparison",
      "polarity": "pol_negative"
    },
    {
      "review_id": "r1eaIH8raQ",
      "sentence_index": 19,
      "text": "While optimizing the curiosity reward the hidden states could be used as well.",
      "suffix": "",
      "review_action": "arg_request",
      "fine_review_action": "arg-request_experiment",
      "aspect": "asp_soundness-correctness",
      "polarity": "none"
    },
    {
      "review_id": "r1eaIH8raQ",
      "sentence_index": 20,
      "text": "If successful, this baseline can show that we actually need to model the \u201cbehavior\u201d with a separate network.",
      "suffix": "\n\n",
      "review_action": "arg_request",
      "fine_review_action": "arg-request_experiment",
      "aspect": "asp_substance",
      "polarity": "none"
    },
    {
      "review_id": "r1eaIH8raQ",
      "sentence_index": 21,
      "text": "- Ablation study for the importance of fusion:",
      "suffix": "\n",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_heading",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "r1eaIH8raQ",
      "sentence_index": 22,
      "text": "The authors have a \u201cfusion\u201d layer within the imitator and probing policies.",
      "suffix": "",
      "review_action": "arg_fact",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "r1eaIH8raQ",
      "sentence_index": 23,
      "text": "An ablation study showing that this layer is actually necessary is missing from the paper.",
      "suffix": "\n\n",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_substance",
      "polarity": "pol_negative"
    },
    {
      "review_id": "r1eaIH8raQ",
      "sentence_index": 24,
      "text": "- Generalizability argument",
      "suffix": "\n",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_heading",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "r1eaIH8raQ",
      "sentence_index": 25,
      "text": "The authors claim that they show a single starting configuration for the agents during training, and different starting configurations during testing.",
      "suffix": "",
      "review_action": "arg_fact",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "r1eaIH8raQ",
      "sentence_index": 26,
      "text": "While I agree with this to some extent, I also think this argument may not be fully right. When the probing agent is testing the expert, it is essentially showing the imitator many different configurations of the environment.",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_soundness-correctness",
      "polarity": "pol_negative"
    },
    {
      "review_id": "r1eaIH8raQ",
      "sentence_index": 27,
      "text": "It may not be that it changes in the first time step (for obvious reasons), but it is essentially showing it many configurations of the expert.",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_soundness-correctness",
      "polarity": "pol_negative"
    },
    {
      "review_id": "r1eaIH8raQ",
      "sentence_index": 28,
      "text": "A more drastic change of the environment could make for a stronger argument.",
      "suffix": "\n\n\n",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_soundness-correctness",
      "polarity": "pol_negative"
    },
    {
      "review_id": "r1eaIH8raQ",
      "sentence_index": 29,
      "text": "4) Conclusion:",
      "suffix": "\n",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_heading",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "r1eaIH8raQ",
      "sentence_index": 30,
      "text": "Overall, I like the idea of having a policy that tries to figure out the general behavior of a demonstrator by probing it.",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_motivation-impact",
      "polarity": "pol_positive"
    },
    {
      "review_id": "r1eaIH8raQ",
      "sentence_index": 31,
      "text": "Having said that, I feel this paper needs to improve in the aspects mentioned above.",
      "suffix": "",
      "review_action": "arg_request",
      "fine_review_action": "arg-request_edit",
      "aspect": "arg_other",
      "polarity": "pol_positive"
    },
    {
      "review_id": "r1eaIH8raQ",
      "sentence_index": 32,
      "text": "If the authors present more convincing evidence that successfully addresses the comments above, I am willing to increase my score.",
      "suffix": "",
      "review_action": "arg_request",
      "fine_review_action": "arg-request_experiment",
      "aspect": "asp_substance",
      "polarity": "pol_positive"
    }
  ],
  "rebuttal_sentences": [
    {
      "review_id": "r1eaIH8raQ",
      "rebuttal_id": "SJexqnAvC7",
      "sentence_index": 0,
      "text": "Thank you for your detailed reviews and constructive suggestions.",
      "suffix": "",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_social",
      "alignment": [
        "context_global",
        null
      ],
      "details": {}
    },
    {
      "review_id": "r1eaIH8raQ",
      "rebuttal_id": "SJexqnAvC7",
      "sentence_index": 1,
      "text": "We have added the suggested baselines in the revision.",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_done",
      "alignment": [
        "context_sentences",
        [
          14,
          15,
          16,
          17,
          18,
          19,
          20
        ]
      ],
      "details": {
        "request_out_of_scope": true
      }
    },
    {
      "review_id": "r1eaIH8raQ",
      "rebuttal_id": "SJexqnAvC7",
      "sentence_index": 2,
      "text": "Here are our responses to your questions and comments:",
      "suffix": "\n\n",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_structuring",
      "alignment": [
        "context_global",
        null
      ],
      "details": {}
    },
    {
      "review_id": "r1eaIH8raQ",
      "rebuttal_id": "SJexqnAvC7",
      "sentence_index": 3,
      "text": "1. Equation 1 typo?",
      "suffix": "\n",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_structuring",
      "alignment": [
        "context_sentences",
        [
          11
        ]
      ],
      "details": {}
    },
    {
      "review_id": "r1eaIH8raQ",
      "rebuttal_id": "SJexqnAvC7",
      "sentence_index": 4,
      "text": "It is not a typo.",
      "suffix": "",
      "rebuttal_stance": "dispute",
      "rebuttal_action": "rebuttal_contradict-assertion",
      "alignment": [
        "context_sentences",
        [
          11,
          12,
          13
        ]
      ],
      "details": {}
    },
    {
      "review_id": "r1eaIH8raQ",
      "rebuttal_id": "SJexqnAvC7",
      "sentence_index": 5,
      "text": "Our reward function is different from existing curiosity rewards.",
      "suffix": "",
      "rebuttal_stance": "dispute",
      "rebuttal_action": "rebuttal_contradict-assertion",
      "alignment": [
        "context_sentences",
        [
          11,
          12,
          13
        ]
      ],
      "details": {}
    },
    {
      "review_id": "r1eaIH8raQ",
      "rebuttal_id": "SJexqnAvC7",
      "sentence_index": 6,
      "text": "We use the change between the real m^t and m^{t-1} as the reward for inciting behavioral change from the demonstrator.",
      "suffix": "",
      "rebuttal_stance": "dispute",
      "rebuttal_action": "rebuttal_contradict-assertion",
      "alignment": [
        "context_sentences",
        [
          11,
          12,
          13
        ]
      ],
      "details": {}
    },
    {
      "review_id": "r1eaIH8raQ",
      "rebuttal_id": "SJexqnAvC7",
      "sentence_index": 7,
      "text": "We have shown more analysis and visualization to explain why this works in the new revision (Appendix B.2 & B.3).",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_done",
      "alignment": [
        "context_sentences",
        [
          11,
          12,
          13
        ]
      ],
      "details": {
        "request_out_of_scope": true
      }
    },
    {
      "review_id": "r1eaIH8raQ",
      "rebuttal_id": "SJexqnAvC7",
      "sentence_index": 8,
      "text": "Our \u201cself-supervised\u201d baseline is actually using the prediction loss as reward, and it has a worse performance compared to ours.",
      "suffix": "\n\n",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          11,
          12,
          13
        ]
      ],
      "details": {}
    },
    {
      "review_id": "r1eaIH8raQ",
      "rebuttal_id": "SJexqnAvC7",
      "sentence_index": 9,
      "text": "2. Baseline missing: Random actions from expert",
      "suffix": "\n",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_structuring",
      "alignment": [
        "context_sentences",
        [
          14,
          15,
          16
        ]
      ],
      "details": {}
    },
    {
      "review_id": "r1eaIH8raQ",
      "rebuttal_id": "SJexqnAvC7",
      "sentence_index": 10,
      "text": "Figure 5 shows the results where 10% of the actions from the demonstrator are purely random.",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_done",
      "alignment": [
        "context_sentences",
        [
          14,
          15,
          16
        ]
      ],
      "details": {
        "request_out_of_scope": true
      }
    },
    {
      "review_id": "r1eaIH8raQ",
      "rebuttal_id": "SJexqnAvC7",
      "sentence_index": 11,
      "text": "With the randomness, our approach is still able to find a meaningful probing policy.",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_done",
      "alignment": [
        "context_sentences",
        [
          14,
          15,
          16
        ]
      ],
      "details": {
        "request_out_of_scope": true
      }
    },
    {
      "review_id": "r1eaIH8raQ",
      "rebuttal_id": "SJexqnAvC7",
      "sentence_index": 12,
      "text": "We have also evaluated the success rate when we use the policy learned from the suboptimal demonstration (10% random actions).",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_done",
      "alignment": [
        "context_sentences",
        [
          14,
          15,
          16
        ]
      ],
      "details": {
        "request_out_of_scope": true
      }
    },
    {
      "review_id": "r1eaIH8raQ",
      "rebuttal_id": "SJexqnAvC7",
      "sentence_index": 13,
      "text": "As reported in the updated Table 1, this policy is comparable to the one learned from optimal demonstrations, and it still outperforms baselines which are all trained from optimal demonstrations.",
      "suffix": "\n\n",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_done",
      "alignment": [
        "context_sentences",
        [
          14,
          15,
          16
        ]
      ],
      "details": {
        "request_out_of_scope": true
      }
    },
    {
      "review_id": "r1eaIH8raQ",
      "rebuttal_id": "SJexqnAvC7",
      "sentence_index": 14,
      "text": "3. Baseline missing: Simple RNN policies that communicate hidden states",
      "suffix": "\n",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_structuring",
      "alignment": [
        "context_sentences",
        [
          17,
          18,
          19,
          20
        ]
      ],
      "details": {}
    },
    {
      "review_id": "r1eaIH8raQ",
      "rebuttal_id": "SJexqnAvC7",
      "sentence_index": 15,
      "text": "We have evaluated this baseline in the revision (i.e., the \u201c2-LSTM\u201d baseline).",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_done",
      "alignment": [
        "context_sentences",
        [
          17,
          18,
          19,
          20
        ]
      ],
      "details": {
        "request_out_of_scope": true
      }
    },
    {
      "review_id": "r1eaIH8raQ",
      "rebuttal_id": "SJexqnAvC7",
      "sentence_index": 16,
      "text": "The network architecture is illustrated in Figure 16.",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_done",
      "alignment": [
        "context_sentences",
        [
          17,
          18,
          19,
          20
        ]
      ],
      "details": {
        "request_out_of_scope": true
      }
    },
    {
      "review_id": "r1eaIH8raQ",
      "rebuttal_id": "SJexqnAvC7",
      "sentence_index": 17,
      "text": "It indeed performs much worse than our full model.",
      "suffix": "\n\n",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_done",
      "alignment": [
        "context_sentences",
        [
          17,
          18,
          19,
          20
        ]
      ],
      "details": {
        "request_out_of_scope": true
      }
    },
    {
      "review_id": "r1eaIH8raQ",
      "rebuttal_id": "SJexqnAvC7",
      "sentence_index": 18,
      "text": "4. Ablation study for the importance of fusion",
      "suffix": "\n",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_structuring",
      "alignment": [
        "context_sentences",
        [
          21,
          22,
          23
        ]
      ],
      "details": {}
    },
    {
      "review_id": "r1eaIH8raQ",
      "rebuttal_id": "SJexqnAvC7",
      "sentence_index": 19,
      "text": "We have added the result of this baseline (i.e., the \u201cours w/o fusion\u201d baseline), where we concatenate the state feature and the latent vector m^t together.",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_done",
      "alignment": [
        "context_sentences",
        [
          21,
          22,
          23
        ]
      ],
      "details": {
        "request_out_of_scope": true
      }
    },
    {
      "review_id": "r1eaIH8raQ",
      "rebuttal_id": "SJexqnAvC7",
      "sentence_index": 20,
      "text": "The results have validated the importance of using the attention-based fusion layer.",
      "suffix": "\n\n",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_done",
      "alignment": [
        "context_sentences",
        [
          21,
          22,
          23
        ]
      ],
      "details": {
        "request_out_of_scope": true
      }
    },
    {
      "review_id": "r1eaIH8raQ",
      "rebuttal_id": "SJexqnAvC7",
      "sentence_index": 21,
      "text": "5. Generalizability argument",
      "suffix": "\n",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_structuring",
      "alignment": [
        "context_sentences",
        [
          24,
          25,
          26,
          27,
          28
        ]
      ],
      "details": {}
    },
    {
      "review_id": "r1eaIH8raQ",
      "rebuttal_id": "SJexqnAvC7",
      "sentence_index": 22,
      "text": "Our main idea is to show as many configurations as possible to the learner by learning a good probing policy.",
      "suffix": "",
      "rebuttal_stance": "dispute",
      "rebuttal_action": "rebuttal_mitigate-criticism",
      "alignment": [
        "context_sentences",
        [
          24,
          25,
          26,
          27,
          28
        ]
      ],
      "details": {}
    },
    {
      "review_id": "r1eaIH8raQ",
      "rebuttal_id": "SJexqnAvC7",
      "sentence_index": 23,
      "text": "Since the probing always starts from a single setting, there is indeed a limit in terms of how different the new settings could be.",
      "suffix": "",
      "rebuttal_stance": "dispute",
      "rebuttal_action": "rebuttal_mitigate-criticism",
      "alignment": [
        "context_sentences",
        [
          24,
          25,
          26,
          27,
          28
        ]
      ],
      "details": {}
    },
    {
      "review_id": "r1eaIH8raQ",
      "rebuttal_id": "SJexqnAvC7",
      "sentence_index": 24,
      "text": "E.g., in Maze Navigation, it is impossible for the learner to change the room layout drastically in the time limit, so the learned policy won\u2019t make sense in a very different room layout (e.g., 8 rooms instead of 4 rooms).",
      "suffix": "",
      "rebuttal_stance": "dispute",
      "rebuttal_action": "rebuttal_mitigate-criticism",
      "alignment": [
        "context_sentences",
        [
          24,
          25,
          26,
          27,
          28
        ]
      ],
      "details": {}
    },
    {
      "review_id": "r1eaIH8raQ",
      "rebuttal_id": "SJexqnAvC7",
      "sentence_index": 25,
      "text": "To obtain better generalization, we may need to replace the current imitation learning approach (behavioral cloning) with a better one, and possibly use multiple starting configurations.",
      "suffix": "",
      "rebuttal_stance": "dispute",
      "rebuttal_action": "rebuttal_mitigate-criticism",
      "alignment": [
        "context_sentences",
        [
          24,
          25,
          26,
          27,
          28
        ]
      ],
      "details": {}
    },
    {
      "review_id": "r1eaIH8raQ",
      "rebuttal_id": "SJexqnAvC7",
      "sentence_index": 26,
      "text": "But we think that it is somewhat orthogonal to our main contribution.",
      "suffix": "",
      "rebuttal_stance": "dispute",
      "rebuttal_action": "rebuttal_mitigate-criticism",
      "alignment": [
        "context_sentences",
        [
          24,
          25,
          26,
          27,
          28
        ]
      ],
      "details": {}
    },
    {
      "review_id": "r1eaIH8raQ",
      "rebuttal_id": "SJexqnAvC7",
      "sentence_index": 27,
      "text": "The objective of our approach is to discover more diverse settings/configurations and consequently improve whatever imitation learning approach we actually use.",
      "suffix": "",
      "rebuttal_stance": "dispute",
      "rebuttal_action": "rebuttal_mitigate-criticism",
      "alignment": [
        "context_sentences",
        [
          24,
          25,
          26,
          27,
          28
        ]
      ],
      "details": {}
    }
  ]
}