{
  "metadata": {
    "forum_id": "rkgv9oRqtQ",
    "review_id": "SJekKHZ93Q",
    "rebuttal_id": "ByxbnbuB0Q",
    "title": "Compound Density Networks",
    "reviewer": "AnonReviewer3",
    "rating": 4,
    "conference": "ICLR2019",
    "permalink": "https://openreview.net/forum?id=rkgv9oRqtQ&noteId=ByxbnbuB0Q",
    "annotator": "anno10"
  },
  "review_sentences": [
    {
      "review_id": "SJekKHZ93Q",
      "sentence_index": 0,
      "text": "In this work the authors propose an extension of mixture density networks to the continuous domain, named compound density networks.",
      "suffix": "",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_summary",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "SJekKHZ93Q",
      "sentence_index": 1,
      "text": "Specifically the paper builds on top of the idea of the ensemble neural networks (NNs) and introduces a stochastic neural network for handling the mixing components.",
      "suffix": "",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_summary",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "SJekKHZ93Q",
      "sentence_index": 2,
      "text": "The mixing distribution is also parameterised by a neural network.",
      "suffix": "",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_summary",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "SJekKHZ93Q",
      "sentence_index": 3,
      "text": "The authors claim that the proposed model can result in better uncertainty estimates and the experiments attempt to demonstrate the benefits of the approach, especially in cases of having to deal with adversarial attacks.",
      "suffix": "\n\n",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_summary",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "SJekKHZ93Q",
      "sentence_index": 4,
      "text": "The paper in general is well written and easy to follow.",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_clarity",
      "polarity": "pol_positive"
    },
    {
      "review_id": "SJekKHZ93Q",
      "sentence_index": 5,
      "text": "I have some concerns regarding the presentation of the main objective and the lack of justification in certain parts of the methodology.",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_substance",
      "polarity": "pol_negative"
    },
    {
      "review_id": "SJekKHZ93Q",
      "sentence_index": 6,
      "text": "Let me elaborate.",
      "suffix": "",
      "review_action": "arg_other",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "SJekKHZ93Q",
      "sentence_index": 7,
      "text": "First of all, I don\u2019t understand how the main equation of the compound density network in Equation (3) is different from the general case of a Bayesian neural network? Can the authors please comment on that?",
      "suffix": "\n\n",
      "review_action": "arg_request",
      "fine_review_action": "arg-request_explanation",
      "aspect": "asp_replicability",
      "polarity": "pol_negative"
    },
    {
      "review_id": "SJekKHZ93Q",
      "sentence_index": 8,
      "text": "I also find weird the way that the authors arrive to their final objective in Equation (5).",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_replicability",
      "polarity": "pol_negative"
    },
    {
      "review_id": "SJekKHZ93Q",
      "sentence_index": 9,
      "text": "They start from Equation (4) which is incorrectly denoted as the log-marginal distribution while it is the same conditional distribution introduced in Equation (3) with the extra summation for all the available data points.",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_soundness-correctness",
      "polarity": "pol_negative"
    },
    {
      "review_id": "SJekKHZ93Q",
      "sentence_index": 10,
      "text": "Then they continue to Equation (5) which they present as the combination of the true likelihood with a KL regularisation term.",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_replicability",
      "polarity": "pol_negative"
    },
    {
      "review_id": "SJekKHZ93Q",
      "sentence_index": 11,
      "text": "However, what the authors implicitly did was to perform variational inference for maximising their likelihood by introducing a variational distribution q(\\theta) = p(\\theta | g(x_n; \\psi).",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_replicability",
      "polarity": "pol_negative"
    },
    {
      "review_id": "SJekKHZ93Q",
      "sentence_index": 12,
      "text": "Is there a reason why the authors do not introduce their objective by following the variational framework?",
      "suffix": "\n\n",
      "review_action": "arg_request",
      "fine_review_action": "arg-request_explanation",
      "aspect": "asp_replicability",
      "polarity": "pol_negative"
    },
    {
      "review_id": "SJekKHZ93Q",
      "sentence_index": 13,
      "text": "Furthermore, in the beginning of Section 3.1 the authors present their idea on probabilistic hypernetoworks which \u201cmaps x to a distribution over parameters instead of specific value \\theta.\u201d How is this different from the case that we were considering so far? If we had a point estimate for \\theta we would not require to take an expectation in Equation (3) in the first place.",
      "suffix": "\n\n",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_soundness-correctness",
      "polarity": "pol_negative"
    },
    {
      "review_id": "SJekKHZ93Q",
      "sentence_index": 14,
      "text": "My biggest concern in the methodology, however, has to do with the selection of the matrix variate normal prior for the weights and the imposition of diagonal covariances (diag(a) and diag(b)).",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_replicability",
      "polarity": "pol_negative"
    },
    {
      "review_id": "SJekKHZ93Q",
      "sentence_index": 15,
      "text": "The Kronecker product between two diagonal matrices results in another diagonal matrix, i.e., diagonal covariance, which implies that the weights within a layer are given by an independent multivariate Gaussian.",
      "suffix": "",
      "review_action": "arg_fact",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "SJekKHZ93Q",
      "sentence_index": 16,
      "text": "What is the purpose then for introducing the matrix variate Gaussian?",
      "suffix": "",
      "review_action": "arg_request",
      "fine_review_action": "arg-request_explanation",
      "aspect": "asp_replicability",
      "polarity": "pol_negative"
    },
    {
      "review_id": "SJekKHZ93Q",
      "sentence_index": 17,
      "text": "I would expect that you would like to impose additional structure to the weights.",
      "suffix": "",
      "review_action": "arg_other",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "SJekKHZ93Q",
      "sentence_index": 18,
      "text": "I expect the authors to comment on that.",
      "suffix": "\n\n",
      "review_action": "arg_request",
      "fine_review_action": "arg-request_explanation",
      "aspect": "asp_replicability",
      "polarity": "pol_negative"
    },
    {
      "review_id": "SJekKHZ93Q",
      "sentence_index": 19,
      "text": "Regarding the experimental evaluation of the model rather confusing.",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_substance",
      "polarity": "pol_negative"
    },
    {
      "review_id": "SJekKHZ93Q",
      "sentence_index": 20,
      "text": "The authors have proposed a model that due to the mixing is better suited for predictions with heteroscedastic noise and can better quantify the aleatoric uncertainty.",
      "suffix": "",
      "review_action": "arg_fact",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "SJekKHZ93Q",
      "sentence_index": 21,
      "text": "However, the selected experiments on the cubic regression toy data (Section 5.1) and the out-of-distribution classification (Section 5.2) are clear examples of system\u2019s noise, i.e. epistemic uncertainty.",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_substance",
      "polarity": "pol_negative"
    },
    {
      "review_id": "SJekKHZ93Q",
      "sentence_index": 22,
      "text": "The generative process of the toy data clearly states that there is no heteroscedastic noise to handle.",
      "suffix": "",
      "review_action": "arg_fact",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "SJekKHZ93Q",
      "sentence_index": 23,
      "text": "The same applies for the notMNIST data which belong to a completely different data set compared to MNIST and thus out of sample prediction cannot benefit from the mixing; i.e., variations have to be explained by system\u2019s noise.",
      "suffix": "",
      "review_action": "arg_fact",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "SJekKHZ93Q",
      "sentence_index": 24,
      "text": "So overall I have the feeling that the authors have not succeeded to evaluate the model\u2019s power with these two experiments and we cannot draw any strong conclusions regarding the benefit of the proposed mixing approach.",
      "suffix": "\n\n",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_substance",
      "polarity": "pol_negative"
    },
    {
      "review_id": "SJekKHZ93Q",
      "sentence_index": 25,
      "text": "To continue with the experimental evaluation, I found the plots with the predictive uncertainty in Figure 3 a bit confusing.",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_substance",
      "polarity": "pol_negative"
    },
    {
      "review_id": "SJekKHZ93Q",
      "sentence_index": 26,
      "text": "The plot by itself, as I understood, quantifies the model\u2019s uncertainty in in- and out-of sample prediction.",
      "suffix": "",
      "review_action": "arg_fact",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "SJekKHZ93Q",
      "sentence_index": 27,
      "text": "While I agree with the authors that it is generally desirable for a model to be more confident when predicting in MNIST (since it has already seen samples of it) compared to when predicting in notMNIST (completely different data), these plots tells us nothing regarding the predictive power of the model.",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_substance",
      "polarity": "pol_negative"
    },
    {
      "review_id": "SJekKHZ93Q",
      "sentence_index": 28,
      "text": "There is no value in being very confident if you are wrong and vice-versa, so unless there is an accompanying plot/table reporting the accuracy I see not much value from this plot alone.",
      "suffix": "\n\n",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_substance",
      "polarity": "pol_negative"
    },
    {
      "review_id": "SJekKHZ93Q",
      "sentence_index": 29,
      "text": "Finally, it is unclear how the authors have picked the best \\lambda parameter for their approach? On page 5 they state that they \u201cpick the value that results in a good trade-off between high uncertainty estimates and high prediction accuracy.\u201d Does this mean that you get to observe the performance in the test in order to select the appropriate value for \\lambda? If this is the case this is completely undesirable and is considered a bad practice.",
      "suffix": "",
      "review_action": "arg_request",
      "fine_review_action": "arg-request_explanation",
      "aspect": "asp_replicability",
      "polarity": "pol_negative"
    }
  ],
  "rebuttal_sentences": [
    {
      "review_id": "SJekKHZ93Q",
      "rebuttal_id": "ByxbnbuB0Q",
      "sentence_index": 0,
      "text": "We thank the reviewer for the valuable feedback! We reply to the answers and comments in the order they were raised.",
      "suffix": "",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_social",
      "alignment": [
        "context_global",
        null
      ],
      "details": {}
    },
    {
      "review_id": "SJekKHZ93Q",
      "rebuttal_id": "ByxbnbuB0Q",
      "sentence_index": 1,
      "text": "Note, that regarding the methodology there were some misunderstandings (which we try to avoid for future readers in the revised version).",
      "suffix": "\n\n",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_by-cr",
      "alignment": [
        "context_sentences",
        [
          5
        ]
      ],
      "details": {
        "manuscript_change": true
      }
    },
    {
      "review_id": "SJekKHZ93Q",
      "rebuttal_id": "ByxbnbuB0Q",
      "sentence_index": 2,
      "text": "(1) The equations are indeed very related.",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          7
        ]
      ],
      "details": {}
    },
    {
      "review_id": "SJekKHZ93Q",
      "rebuttal_id": "ByxbnbuB0Q",
      "sentence_index": 3,
      "text": "Note however, that In a standard Bayesian neural network (BNN), one would assume that \\theta is a global random variable (i.e. does not depend on input x), whereas in the CDN, we assume that \\theta depends on x and is thus a local random variable.",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          7
        ]
      ],
      "details": {}
    },
    {
      "review_id": "SJekKHZ93Q",
      "rebuttal_id": "ByxbnbuB0Q",
      "sentence_index": 4,
      "text": "Furthermore, in a Bayesian setting p(theta|...) would play the role of a approximate posterior, which would require variational inference (VI), and thus a different objective,  to estimate it.",
      "suffix": "\n\n",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          7
        ]
      ],
      "details": {}
    },
    {
      "review_id": "SJekKHZ93Q",
      "rebuttal_id": "ByxbnbuB0Q",
      "sentence_index": 5,
      "text": "(2) In Equation (4) we followed with  p(D | \\psi) a standard notation for \\sum_n p(y_n | x_n; \\psi) (i.e. summation of Equation (3) wrt all data in D) which also can be found e.g. in the work of Graves (2011) [4] and Blundell et al. (2015) [5].",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          8,
          9,
          10,
          11,
          12
        ]
      ],
      "details": {}
    },
    {
      "review_id": "SJekKHZ93Q",
      "rebuttal_id": "ByxbnbuB0Q",
      "sentence_index": 6,
      "text": "The objective we introduce for CDNs differs from the ELBO-based objective in VI in the way the logarithm is placed in the first term of the objective: in the ELBO we have a logarithm inside the expectation, while the logarithm is outside the expectation in the CDN objective (note however, that the sample-based approximations get equivalent if only one sample is used).",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          8,
          9,
          10,
          11,
          12
        ]
      ],
      "details": {}
    },
    {
      "review_id": "SJekKHZ93Q",
      "rebuttal_id": "ByxbnbuB0Q",
      "sentence_index": 7,
      "text": "Furthermore, in the ELBO we have a fixed value of \\lambda = 1.",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          8,
          9,
          10,
          11,
          12
        ]
      ],
      "details": {}
    },
    {
      "review_id": "SJekKHZ93Q",
      "rebuttal_id": "ByxbnbuB0Q",
      "sentence_index": 8,
      "text": "We added a new Section 4 in the revised version of the paper discussing these differences.",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_done",
      "alignment": [
        "context_sentences",
        [
          8,
          9,
          10,
          11,
          12
        ]
      ],
      "details": {
        "request_out_of_scope": true
      }
    },
    {
      "review_id": "SJekKHZ93Q",
      "rebuttal_id": "ByxbnbuB0Q",
      "sentence_index": 9,
      "text": "Moreover, we investigated the impact of the different objectives empirically and found that the CDN-based objective led to significantly better results, as shown in the newly added Section 6.4 in the revised manuscript.",
      "suffix": "\n\n",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_done",
      "alignment": [
        "context_sentences",
        [
          8,
          9,
          10,
          11,
          12
        ]
      ],
      "details": {
        "request_out_of_scope": true
      }
    },
    {
      "review_id": "SJekKHZ93Q",
      "rebuttal_id": "ByxbnbuB0Q",
      "sentence_index": 10,
      "text": "(3) Indeed we need the probabilistic version of hypernetworks to implement the model we described in Equation (3).",
      "suffix": "",
      "rebuttal_stance": "dispute",
      "rebuttal_action": "rebuttal_reject-criticism",
      "alignment": [
        "context_sentences",
        [
          13
        ]
      ],
      "details": {}
    },
    {
      "review_id": "SJekKHZ93Q",
      "rebuttal_id": "ByxbnbuB0Q",
      "sentence_index": 11,
      "text": "We just wanted to point out that this is in contrast to the vanilla  hypernetworks proposed by Ha et al. (2016) [1] and Jia et al.",
      "suffix": "",
      "rebuttal_stance": "dispute",
      "rebuttal_action": "rebuttal_reject-criticism",
      "alignment": [
        "context_sentences",
        [
          13
        ]
      ],
      "details": {}
    },
    {
      "review_id": "SJekKHZ93Q",
      "rebuttal_id": "ByxbnbuB0Q",
      "sentence_index": 12,
      "text": "(2016) [2] which would produce a point estimate for \\theta.",
      "suffix": "\n\n",
      "rebuttal_stance": "dispute",
      "rebuttal_action": "rebuttal_reject-criticism",
      "alignment": [
        "context_sentences",
        [
          13
        ]
      ],
      "details": {}
    },
    {
      "review_id": "SJekKHZ93Q",
      "rebuttal_id": "ByxbnbuB0Q",
      "sentence_index": 13,
      "text": "(4) We used a matrix-variate normal (MVN) to reduce the parameters of the model.",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          14,
          15,
          16,
          17,
          18
        ]
      ],
      "details": {}
    },
    {
      "review_id": "SJekKHZ93Q",
      "rebuttal_id": "ByxbnbuB0Q",
      "sentence_index": 14,
      "text": "Using a diagonal MVN for X \\in R^{p x q} one needs pq+p+q parameters.",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          14,
          15,
          16,
          17,
          18
        ]
      ],
      "details": {}
    },
    {
      "review_id": "SJekKHZ93Q",
      "rebuttal_id": "ByxbnbuB0Q",
      "sentence_index": 15,
      "text": "In contrast a fully-factorized diagonal Gaussian needs pq+pq.",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          14,
          15,
          16,
          17,
          18
        ]
      ],
      "details": {}
    },
    {
      "review_id": "SJekKHZ93Q",
      "rebuttal_id": "ByxbnbuB0Q",
      "sentence_index": 16,
      "text": "But you are right, we could easily extend our approach to account for more flexible distributions by using a \"diagonal plus rank-one\" structure diag(a)+uu^T, with vectors a and u, as noted by Louizos and Welling ( 2016) [3] (the increase of parameters is negligible: adding additional vector u).",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_concede-criticism",
      "alignment": [
        "context_sentences",
        [
          14,
          15,
          16,
          17,
          18
        ]
      ],
      "details": {}
    },
    {
      "review_id": "SJekKHZ93Q",
      "rebuttal_id": "ByxbnbuB0Q",
      "sentence_index": 17,
      "text": "We will investigate the benefits of more flexible mixing distributions in future work.",
      "suffix": "\n\n",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_future",
      "alignment": [
        "context_sentences",
        [
          14,
          15,
          16,
          17,
          18
        ]
      ],
      "details": {}
    },
    {
      "review_id": "SJekKHZ93Q",
      "rebuttal_id": "ByxbnbuB0Q",
      "sentence_index": 18,
      "text": "(5) Thanks for this valuable comment!",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_concede-criticism",
      "alignment": [
        "context_sentences",
        [
          19,
          20,
          21,
          22,
          23,
          24
        ]
      ],
      "details": {}
    },
    {
      "review_id": "SJekKHZ93Q",
      "rebuttal_id": "ByxbnbuB0Q",
      "sentence_index": 19,
      "text": "We have revised the toy experiments to include a heteroscedastic regression task (see Section 6.1.).",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_done",
      "alignment": [
        "context_sentences",
        [
          19,
          20,
          21,
          22,
          23,
          24
        ]
      ],
      "details": {
        "request_out_of_scope": true
      }
    },
    {
      "review_id": "SJekKHZ93Q",
      "rebuttal_id": "ByxbnbuB0Q",
      "sentence_index": 20,
      "text": "It shows that CDNs are able to quantify the heteroscedastic aleatoric uncertainty.",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          19,
          20,
          21,
          22,
          23,
          24
        ]
      ],
      "details": {}
    },
    {
      "review_id": "SJekKHZ93Q",
      "rebuttal_id": "ByxbnbuB0Q",
      "sentence_index": 21,
      "text": "However, we kept the OOD experiments on MNIST and notMNIST in the paper as well, since we consider it as very interesting that the CDN, while being designed for modelling aleatoric uncertainty, is very competitive on this task.",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          19,
          20,
          21,
          22,
          23,
          24
        ]
      ],
      "details": {}
    },
    {
      "review_id": "SJekKHZ93Q",
      "rebuttal_id": "ByxbnbuB0Q",
      "sentence_index": 22,
      "text": "Moreover, we investigate the mixing distribution learned in Appendix G.",
      "suffix": "\n\n",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          19,
          20,
          21,
          22,
          23,
          24
        ]
      ],
      "details": {}
    },
    {
      "review_id": "SJekKHZ93Q",
      "rebuttal_id": "ByxbnbuB0Q",
      "sentence_index": 23,
      "text": "(6) Of course, you are right! Previously we showed the test accuracy in Appendix F.  To make it more directly accessible, we have now added the test accuracy achieved by the different models into the legends of the plots, showing that CDN achieves similar predictive power as the baselines.",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_concede-criticism",
      "alignment": [
        "context_sentences",
        [
          25,
          26,
          27,
          28
        ]
      ],
      "details": {}
    },
    {
      "review_id": "SJekKHZ93Q",
      "rebuttal_id": "ByxbnbuB0Q",
      "sentence_index": 24,
      "text": "We now present the validation accuracy instead in Appendix F.",
      "suffix": "\n\n",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_done",
      "alignment": [
        "context_sentences",
        [
          25,
          26,
          27,
          28
        ]
      ],
      "details": {
        "request_out_of_scope": true
      }
    },
    {
      "review_id": "SJekKHZ93Q",
      "rebuttal_id": "ByxbnbuB0Q",
      "sentence_index": 25,
      "text": "(7) Sorry, for this unfortunate formulation!",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_concede-criticism",
      "alignment": [
        "context_sentences",
        [
          29
        ]
      ],
      "details": {}
    },
    {
      "review_id": "SJekKHZ93Q",
      "rebuttal_id": "ByxbnbuB0Q",
      "sentence_index": 26,
      "text": "We observed that generally as \\lambda increases, the uncertainty is increasing, while the accuracy is decreasing.",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          29
        ]
      ],
      "details": {}
    },
    {
      "review_id": "SJekKHZ93Q",
      "rebuttal_id": "ByxbnbuB0Q",
      "sentence_index": 27,
      "text": "Therefore a simple and effective heuristic for choosing \\lambda is to look at the validation set of MNIST and choose the highest \\lambda that still results in high accuracy (e.g. >. 0.97).",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          29
        ]
      ],
      "details": {}
    },
    {
      "review_id": "SJekKHZ93Q",
      "rebuttal_id": "ByxbnbuB0Q",
      "sentence_index": 28,
      "text": "We have made this procedure clear in the revised manuscript.",
      "suffix": "\n\n\n",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_done",
      "alignment": [
        "context_sentences",
        [
          29
        ]
      ],
      "details": {
        "request_out_of_scope": true
      }
    },
    {
      "review_id": "SJekKHZ93Q",
      "rebuttal_id": "ByxbnbuB0Q",
      "sentence_index": 29,
      "text": "References:",
      "suffix": "\n",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_structuring",
      "alignment": [
        "context_in-rebuttal",
        null
      ],
      "details": {}
    },
    {
      "review_id": "SJekKHZ93Q",
      "rebuttal_id": "ByxbnbuB0Q",
      "sentence_index": 30,
      "text": "[1] Ha, David, Andrew Dai, and Quoc V. Le. \"Hypernetworks.\" arXiv preprint arXiv:1609.09106 (2016).",
      "suffix": "\n",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_other",
      "alignment": [
        "context_in-rebuttal",
        null
      ],
      "details": {}
    },
    {
      "review_id": "SJekKHZ93Q",
      "rebuttal_id": "ByxbnbuB0Q",
      "sentence_index": 31,
      "text": "[2] Jia, Xu, et al. \"Dynamic filter networks.\" Advances in Neural Information Processing Systems. 2016.",
      "suffix": "\n",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_other",
      "alignment": [
        "context_in-rebuttal",
        null
      ],
      "details": {}
    },
    {
      "review_id": "SJekKHZ93Q",
      "rebuttal_id": "ByxbnbuB0Q",
      "sentence_index": 32,
      "text": "[3] Louizos, Christos, and Max Welling. \"Structured and efficient variational deep learning with matrix gaussian posteriors.\" International Conference on Machine Learning. 2016.",
      "suffix": "\n",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_other",
      "alignment": [
        "context_in-rebuttal",
        null
      ],
      "details": {}
    },
    {
      "review_id": "SJekKHZ93Q",
      "rebuttal_id": "ByxbnbuB0Q",
      "sentence_index": 33,
      "text": "[4] Graves, A. (2011). Practical variational inference for neural networks. In Advances in neural information processing systems (pp. 2348-2356).",
      "suffix": "\n",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_other",
      "alignment": [
        "context_in-rebuttal",
        null
      ],
      "details": {}
    },
    {
      "review_id": "SJekKHZ93Q",
      "rebuttal_id": "ByxbnbuB0Q",
      "sentence_index": 34,
      "text": "[5] Blundell, C., Cornebise, J., Kavukcuoglu, K. & Wierstra, D.. (2015). Weight Uncertainty in Neural Network. Proceedings of the 32nd International Conference on Machine Learning, in PMLR 37:1613-1622",
      "suffix": "",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_other",
      "alignment": [
        "context_in-rebuttal",
        null
      ],
      "details": {}
    }
  ]
}