{
  "metadata": {
    "forum_id": "B1gXWCVtvr",
    "review_id": "BJl2JOl9cr",
    "rebuttal_id": "HyxeVQ5hjH",
    "title": "Adapting Behaviour for Learning Progress",
    "reviewer": "AnonReviewer4",
    "rating": 3,
    "conference": "ICLR2020",
    "permalink": "https://openreview.net/forum?id=B1gXWCVtvr&noteId=HyxeVQ5hjH",
    "annotator": "anno3"
  },
  "review_sentences": [
    {
      "review_id": "BJl2JOl9cr",
      "sentence_index": 0,
      "text": "This in an interesting paper as it tries to alleviate the burden of hyper-parameters tuning for exploration strategies Deep Reinforcement learning.",
      "suffix": "\n",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_summary",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "BJl2JOl9cr",
      "sentence_index": 1,
      "text": "The paper proposes an adaptive behaviour in order to shape the data generation process for effective learning.",
      "suffix": "",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_summary",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "BJl2JOl9cr",
      "sentence_index": 2,
      "text": "The paper considers a behaviour policy that is parametrized by a set of variables z called modulations: for example the Boltzmann softmax temperature, the probability epsilon for epsilon-greedy, per-action biases, ..",
      "suffix": "\n",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_summary",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "BJl2JOl9cr",
      "sentence_index": 3,
      "text": "The author frame the modulations search into a non-stationary multi-armed bandit problem and proposes to adapt the modulations according to a proxy to the learning progress.",
      "suffix": "",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_summary",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "BJl2JOl9cr",
      "sentence_index": 4,
      "text": "The author provides thorough experimental results.",
      "suffix": "\n\n",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_substance",
      "polarity": "pol_positive"
    },
    {
      "review_id": "BJl2JOl9cr",
      "sentence_index": 5,
      "text": "Comments:",
      "suffix": "\n\n",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_heading",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "BJl2JOl9cr",
      "sentence_index": 6,
      "text": "- All the variations considered for the behaviour policy performs only myopic exploration and thus provably inefficient in RL.",
      "suffix": "\n",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_substance",
      "polarity": "pol_negative"
    },
    {
      "review_id": "BJl2JOl9cr",
      "sentence_index": 7,
      "text": "- The proposed proxy is simply the empirical episodic return.",
      "suffix": "",
      "review_action": "arg_fact",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "BJl2JOl9cr",
      "sentence_index": 8,
      "text": "It is not well explained in the paper how this proxy correlates with the Learning progress criteria.",
      "suffix": "\n",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_soundness-correctness",
      "polarity": "pol_negative"
    },
    {
      "review_id": "BJl2JOl9cr",
      "sentence_index": 9,
      "text": "- The proxy seems to encourage selecting modulations that lead to generate most rewarding trajectories.",
      "suffix": "",
      "review_action": "arg_fact",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "BJl2JOl9cr",
      "sentence_index": 10,
      "text": "How this proxy incentives the agent to explore poorly-understood regions?",
      "suffix": "",
      "review_action": "arg_request",
      "fine_review_action": "arg-request_explanation",
      "aspect": "asp_substance",
      "polarity": "pol_negative"
    },
    {
      "review_id": "BJl2JOl9cr",
      "sentence_index": 11,
      "text": "In other terms, how this proxy help to tradeoff between exploration and exploitation ?",
      "suffix": "\n",
      "review_action": "none",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "BJl2JOl9cr",
      "sentence_index": 12,
      "text": "-  The modulation adaptation problem is framed into non-stationary multi-armed bandit problem but the authors present a heuristic to solve it instead of using provably efficient bandit algorithm such as exponential weight methods (Besbes et al 2014) or Thompson sampling (Raj & Kalyani 2017) cited in the paper.",
      "suffix": "\n",
      "review_action": "arg_fact",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "BJl2JOl9cr",
      "sentence_index": 13,
      "text": "- The way the authors adapt the modulation z (or at least its description in the paper) seems not technically sounded for me.",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_soundness-correctness",
      "polarity": "pol_negative"
    },
    {
      "review_id": "BJl2JOl9cr",
      "sentence_index": 14,
      "text": "They estimate a certain probability at time step t by empirical frequency based on data from previous time steps.",
      "suffix": "",
      "review_action": "none",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "BJl2JOl9cr",
      "sentence_index": 15,
      "text": "But as the parameters change during the learning, the f_t\u2019(z) at time t\u2019 < t is not distributed as f_t(z).",
      "suffix": "",
      "review_action": "none",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "BJl2JOl9cr",
      "sentence_index": 16,
      "text": "This introduces a biases in the estimate.",
      "suffix": "\n",
      "review_action": "none",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "BJl2JOl9cr",
      "sentence_index": 17,
      "text": "- I appreciate the thorough empirical results and ablation studies in the main paper and the appendix. They are really interesting.",
      "suffix": "\n",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_motivation-impact",
      "polarity": "pol_positive"
    },
    {
      "review_id": "BJl2JOl9cr",
      "sentence_index": 18,
      "text": "- I am confused what is the fixed reference in Figure 6. It is not explained in the main paper.",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_clarity",
      "polarity": "pol_negative"
    },
    {
      "review_id": "BJl2JOl9cr",
      "sentence_index": 19,
      "text": "Is it a baseline with the best hyperprameters in hindsight?",
      "suffix": "\n",
      "review_action": "arg_request",
      "fine_review_action": "arg-request_explanation",
      "aspect": "asp_substance",
      "polarity": "none"
    },
    {
      "review_id": "BJl2JOl9cr",
      "sentence_index": 20,
      "text": "-  From the plots of learning curves in appendix, the proposed methods doesn\u2019t seem to show a huge boost of performance comparing to the uniform bandit. Could you show aggregated comparison between the proposed method and uniform bandit similarly to what is done in Figure 4 ?",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_substance",
      "polarity": "pol_negative"
    }
  ],
  "rebuttal_sentences": [
    {
      "review_id": "BJl2JOl9cr",
      "rebuttal_id": "HyxeVQ5hjH",
      "sentence_index": 0,
      "text": "> All the variations considered for the behaviour policy performs only myopic exploration and thus provably inefficient in RL.",
      "suffix": "\n\n",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_structuring",
      "alignment": [
        "context_sentences",
        [
          6
        ]
      ],
      "details": {}
    },
    {
      "review_id": "BJl2JOl9cr",
      "rebuttal_id": "HyxeVQ5hjH",
      "sentence_index": 1,
      "text": "Yes, we experiment only with myopic variants of exploration, but (A) our approach is not limited to this initial set of behaviour modulations, and could be extended to trade off between intrinsic and extrinsic motivation, or between model-free and model-based mechanisms; and (B) the variations we consider may not be ideal, but they are the ones most commonly used in domains like Atari.",
      "suffix": "\n\n\n",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          6
        ]
      ],
      "details": {}
    },
    {
      "review_id": "BJl2JOl9cr",
      "rebuttal_id": "HyxeVQ5hjH",
      "sentence_index": 2,
      "text": "> The proposed proxy is simply the empirical episodic return.",
      "suffix": "",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_structuring",
      "alignment": [
        "context_sentences",
        [
          7,
          8
        ]
      ],
      "details": {}
    },
    {
      "review_id": "BJl2JOl9cr",
      "rebuttal_id": "HyxeVQ5hjH",
      "sentence_index": 3,
      "text": "It is not well explained in the paper how this proxy correlates with the Learning progress criteria.",
      "suffix": "",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_structuring",
      "alignment": [
        "context_sentences",
        [
          7,
          8
        ]
      ],
      "details": {}
    },
    {
      "review_id": "BJl2JOl9cr",
      "rebuttal_id": "HyxeVQ5hjH",
      "sentence_index": 4,
      "text": "The proxy seems to encourage selecting modulations that lead to generate most rewarding trajectories.",
      "suffix": "",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_structuring",
      "alignment": [
        "context_sentences",
        [
          7,
          8,
          9
        ]
      ],
      "details": {}
    },
    {
      "review_id": "BJl2JOl9cr",
      "rebuttal_id": "HyxeVQ5hjH",
      "sentence_index": 5,
      "text": "How this proxy incentives the agent to explore poorly-understood regions?",
      "suffix": "",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_structuring",
      "alignment": [
        "context_sentences",
        [
          7,
          8,
          9,
          10
        ]
      ],
      "details": {}
    },
    {
      "review_id": "BJl2JOl9cr",
      "rebuttal_id": "HyxeVQ5hjH",
      "sentence_index": 6,
      "text": "In other terms, how this proxy help to tradeoff between exploration and exploitation ?",
      "suffix": "\n\n",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_structuring",
      "alignment": [
        "context_sentences",
        [
          7,
          8,
          9,
          10,
          11
        ]
      ],
      "details": {}
    },
    {
      "review_id": "BJl2JOl9cr",
      "rebuttal_id": "HyxeVQ5hjH",
      "sentence_index": 7,
      "text": "Thank you for this suggestion, we have now clarified this connection in Section 3.",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_done",
      "alignment": [
        "context_sentences",
        [
          7,
          8,
          9,
          10,
          11
        ]
      ],
      "details": {
        "request_out_of_scope": true
      }
    },
    {
      "review_id": "BJl2JOl9cr",
      "rebuttal_id": "HyxeVQ5hjH",
      "sentence_index": 8,
      "text": "We acknowledge that f departs from LP in a number of ways.",
      "suffix": "\n",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_concede-criticism",
      "alignment": [
        "context_sentences",
        [
          7,
          8,
          9,
          10,
          11
        ]
      ],
      "details": {}
    },
    {
      "review_id": "BJl2JOl9cr",
      "rebuttal_id": "HyxeVQ5hjH",
      "sentence_index": 9,
      "text": "First, it does not contain learner-subjective information, but this is partly mitigated through the joint use of with prioritised replay that over-samples high error experience.",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          7,
          8,
          9,
          10,
          11
        ]
      ],
      "details": {}
    },
    {
      "review_id": "BJl2JOl9cr",
      "rebuttal_id": "HyxeVQ5hjH",
      "sentence_index": 10,
      "text": "Another potential mechanism by which the episodic return can be indicative of future learning is because an improved policy tends to be preceded by some higher-return episodes -- in general, there is a lag between best-seen performance and reliably reproducing it.",
      "suffix": "\n",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          7,
          8,
          9,
          10,
          11
        ]
      ],
      "details": {}
    },
    {
      "review_id": "BJl2JOl9cr",
      "rebuttal_id": "HyxeVQ5hjH",
      "sentence_index": 11,
      "text": "Second, the fitness is based on absolute returns not differences in returns as suggested by Equation 1; this makes no difference to the relative orderings of z (and the resulting probabilities induced by the bandit), but it has the benefit that the non-stationarity takes a different form: a difference-based metric will appear stationary if the policy performance keeps increasing at a steady rate, but such a policy must be changing significantly to achieve that progress, and therefore the selection mechanism should keep revisiting other modulations.",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          7,
          8,
          9,
          10,
          11
        ]
      ],
      "details": {}
    },
    {
      "review_id": "BJl2JOl9cr",
      "rebuttal_id": "HyxeVQ5hjH",
      "sentence_index": 12,
      "text": "In contrast, our absolute fitness naturally has this effect when paired with a non-stationary bandit.",
      "suffix": "\n",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          7,
          8,
          9,
          10,
          11
        ]
      ],
      "details": {}
    },
    {
      "review_id": "BJl2JOl9cr",
      "rebuttal_id": "HyxeVQ5hjH",
      "sentence_index": 13,
      "text": "We have also updated the paper to highlight that our proposed proxy is to be understood as an initial, simple, working instance, with a lot of remaining future work that could extend and refine it.",
      "suffix": "\n\n\n",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_by-cr",
      "alignment": [
        "context_sentences",
        [
          7,
          8,
          9,
          10,
          11
        ]
      ],
      "details": {
        "manuscript_change": true
      }
    },
    {
      "review_id": "BJl2JOl9cr",
      "rebuttal_id": "HyxeVQ5hjH",
      "sentence_index": 14,
      "text": ">",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_by-cr",
      "alignment": [
        "context_sentences",
        [
          7,
          8,
          9,
          10,
          11
        ]
      ],
      "details": {
        "manuscript_change": true
      }
    },
    {
      "review_id": "BJl2JOl9cr",
      "rebuttal_id": "HyxeVQ5hjH",
      "sentence_index": 15,
      "text": "The modulation adaptation problem is framed into non-stationary multi-armed bandit problem but the authors present a heuristic to solve it instead of using provably efficient bandit algorithm such as [...]",
      "suffix": "\n\n",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_structuring",
      "alignment": [
        "context_sentences",
        [
          12
        ]
      ],
      "details": {}
    },
    {
      "review_id": "BJl2JOl9cr",
      "rebuttal_id": "HyxeVQ5hjH",
      "sentence_index": 16,
      "text": "Thank you for the suggestion!",
      "suffix": "",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_social",
      "alignment": [
        "context_sentences",
        [
          12
        ]
      ],
      "details": {}
    },
    {
      "review_id": "BJl2JOl9cr",
      "rebuttal_id": "HyxeVQ5hjH",
      "sentence_index": 17,
      "text": "We had experimented with a few of these variants before designing the proposed adaptation method.",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          12
        ]
      ],
      "details": {}
    },
    {
      "review_id": "BJl2JOl9cr",
      "rebuttal_id": "HyxeVQ5hjH",
      "sentence_index": 18,
      "text": "We have now included such a plot in the paper, comparing our method to UBC and Thompson sampling (Appendix E.3 and Figure 16).",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_done",
      "alignment": [
        "context_sentences",
        [
          12
        ]
      ],
      "details": {
        "request_out_of_scope": true
      }
    },
    {
      "review_id": "BJl2JOl9cr",
      "rebuttal_id": "HyxeVQ5hjH",
      "sentence_index": 19,
      "text": "As you can see from this comparison, the performance of these well-known bandits depends on the game, and it is subject to tuning, which is what we wanted to avoid in the first place.",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          12
        ]
      ],
      "details": {}
    },
    {
      "review_id": "BJl2JOl9cr",
      "rebuttal_id": "HyxeVQ5hjH",
      "sentence_index": 20,
      "text": "In most games our bandit performs significantly better.",
      "suffix": "\n\n\n",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          12
        ]
      ],
      "details": {}
    },
    {
      "review_id": "BJl2JOl9cr",
      "rebuttal_id": "HyxeVQ5hjH",
      "sentence_index": 21,
      "text": "> The way the authors adapt the modulation z (or at least its description in the paper) seems not technically sounded for me",
      "suffix": "",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_structuring",
      "alignment": [
        "context_sentences",
        [
          13
        ]
      ],
      "details": {}
    },
    {
      "review_id": "BJl2JOl9cr",
      "rebuttal_id": "HyxeVQ5hjH",
      "sentence_index": 22,
      "text": "[...]",
      "suffix": "\n\n",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_structuring",
      "alignment": [
        "context_sentences",
        [
          13,
          14,
          15,
          16
        ]
      ],
      "details": {}
    },
    {
      "review_id": "BJl2JOl9cr",
      "rebuttal_id": "HyxeVQ5hjH",
      "sentence_index": 23,
      "text": "The distribution of f(z) does change as a function of the parameter change and thus as a function of time.",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          13,
          14,
          15,
          16
        ]
      ],
      "details": {}
    },
    {
      "review_id": "BJl2JOl9cr",
      "rebuttal_id": "HyxeVQ5hjH",
      "sentence_index": 24,
      "text": "This is precisely the kind of non-stationarity that our adaptive mechanism has to deal",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          13,
          14,
          15,
          16
        ]
      ],
      "details": {}
    },
    {
      "review_id": "BJl2JOl9cr",
      "rebuttal_id": "HyxeVQ5hjH",
      "sentence_index": 25,
      "text": "with",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          13,
          14,
          15,
          16
        ]
      ],
      "details": {}
    },
    {
      "review_id": "BJl2JOl9cr",
      "rebuttal_id": "HyxeVQ5hjH",
      "sentence_index": 26,
      "text": ".",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          13,
          14,
          15,
          16
        ]
      ],
      "details": {}
    },
    {
      "review_id": "BJl2JOl9cr",
      "rebuttal_id": "HyxeVQ5hjH",
      "sentence_index": 27,
      "text": "This is also the reason behind the adaptive window used in this work.",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          13,
          14,
          15,
          16
        ]
      ],
      "details": {}
    },
    {
      "review_id": "BJl2JOl9cr",
      "rebuttal_id": "HyxeVQ5hjH",
      "sentence_index": 28,
      "text": "In a sense, one can see the size of the window as a proxy for the effective time horizon at which things can be seen as stationary in the learning.",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          13,
          14,
          15,
          16
        ]
      ],
      "details": {}
    },
    {
      "review_id": "BJl2JOl9cr",
      "rebuttal_id": "HyxeVQ5hjH",
      "sentence_index": 29,
      "text": "The window over which we integrate evidence is chosen to make the best recommendation; thus every time we deviate too much from the sample distribution captured within it, we consider this as a sign of non-stationarity and shrink the window.",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          13,
          14,
          15,
          16
        ]
      ],
      "details": {}
    },
    {
      "review_id": "BJl2JOl9cr",
      "rebuttal_id": "HyxeVQ5hjH",
      "sentence_index": 30,
      "text": "This is by no means optimal, nor do we claim it is, but it seems to be a reliable enough proxy to outperform candidates that do assume stationarity (as portrayed by the comparison in Figure 16).",
      "suffix": "\n\n\n",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          13,
          14,
          15,
          16
        ]
      ],
      "details": {}
    },
    {
      "review_id": "BJl2JOl9cr",
      "rebuttal_id": "HyxeVQ5hjH",
      "sentence_index": 31,
      "text": "> I am confused what is the fixed reference in Figure 6. It is not explained in the main paper.",
      "suffix": "",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_structuring",
      "alignment": [
        "context_sentences",
        [
          18
        ]
      ],
      "details": {}
    },
    {
      "review_id": "BJl2JOl9cr",
      "rebuttal_id": "HyxeVQ5hjH",
      "sentence_index": 32,
      "text": "Is it a baseline with the best hyperprameters in hindsight?",
      "suffix": "\n\n",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_structuring",
      "alignment": [
        "context_sentences",
        [
          18,
          19
        ]
      ],
      "details": {}
    },
    {
      "review_id": "BJl2JOl9cr",
      "rebuttal_id": "HyxeVQ5hjH",
      "sentence_index": 33,
      "text": "The \u201cfixed reference\u201d is described in Appendix C, and corresponds to the most commonly used settings in the literature.",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          18,
          19
        ]
      ],
      "details": {}
    },
    {
      "review_id": "BJl2JOl9cr",
      "rebuttal_id": "HyxeVQ5hjH",
      "sentence_index": 34,
      "text": "We made this clear in the main body of the text.",
      "suffix": "\n\n\n",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_done",
      "alignment": [
        "context_sentences",
        [
          18,
          19
        ]
      ],
      "details": {
        "request_out_of_scope": true
      }
    },
    {
      "review_id": "BJl2JOl9cr",
      "rebuttal_id": "HyxeVQ5hjH",
      "sentence_index": 35,
      "text": ">",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_done",
      "alignment": [
        "context_sentences",
        [
          18,
          19
        ]
      ],
      "details": {
        "request_out_of_scope": true
      }
    },
    {
      "review_id": "BJl2JOl9cr",
      "rebuttal_id": "HyxeVQ5hjH",
      "sentence_index": 36,
      "text": "From the plots of learning curves in appendix, the proposed methods doesn\u2019t seem to show a huge boost of performance comparing to the uniform bandit. Could you show aggregated comparison between the proposed method and uniform bandit similarly to what is done in Figure 4 ?",
      "suffix": "\n\n",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_structuring",
      "alignment": [
        "context_sentences",
        [
          20
        ]
      ],
      "details": {}
    },
    {
      "review_id": "BJl2JOl9cr",
      "rebuttal_id": "HyxeVQ5hjH",
      "sentence_index": 37,
      "text": "Yes, we show this in aggregate in Figure 6 (old Figure 5-right): it shows how the bandit is roughly on par with uniform when the modulation set is curated, but the bandit significantly outperforms uniform in the untuned (\u201cextended\u201d) setting.",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          20
        ]
      ],
      "details": {}
    },
    {
      "review_id": "BJl2JOl9cr",
      "rebuttal_id": "HyxeVQ5hjH",
      "sentence_index": 38,
      "text": "We clarified the caption for this too.",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_done",
      "alignment": [
        "context_sentences",
        [
          20
        ]
      ],
      "details": {
        "request_out_of_scope": true
      }
    }
  ]
}