{
  "metadata": {
    "forum_id": "SyMDXnCcF7",
    "review_id": "BJxndjHwnm",
    "rebuttal_id": "SkxSjPODaQ",
    "title": "A Mean Field Theory of Batch Normalization",
    "reviewer": "AnonReviewer3",
    "rating": 7,
    "conference": "ICLR2019",
    "permalink": "https://openreview.net/forum?id=SyMDXnCcF7&noteId=SkxSjPODaQ",
    "annotator": "anno2"
  },
  "review_sentences": [
    {
      "review_id": "BJxndjHwnm",
      "sentence_index": 0,
      "text": "This paper provides a new dynamic perspective on deep neural network.",
      "suffix": "",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_summary",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "BJxndjHwnm",
      "sentence_index": 1,
      "text": "Based on Gaussian weights and biases, the paper investigates the evolution of the covariance matrix along with the layers.",
      "suffix": "",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_summary",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "BJxndjHwnm",
      "sentence_index": 2,
      "text": "Eventually the matrices achieve a stationary point, i.e., fixed point of the dynamic system.",
      "suffix": "",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_summary",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "BJxndjHwnm",
      "sentence_index": 3,
      "text": "Local performance around the fixed point is explored.",
      "suffix": "",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_summary",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "BJxndjHwnm",
      "sentence_index": 4,
      "text": "Extensions are provided to include the batch normalization.",
      "suffix": "",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_summary",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "BJxndjHwnm",
      "sentence_index": 5,
      "text": "I believe this paper may stimulate some interesting ideas for other researchers.",
      "suffix": "\n\n",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_motivation-impact",
      "polarity": "pol_positive"
    },
    {
      "review_id": "BJxndjHwnm",
      "sentence_index": 6,
      "text": "Two technical questions:",
      "suffix": "\n\n",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_heading",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "BJxndjHwnm",
      "sentence_index": 7,
      "text": "1. When the layers tends to infinity, the covariance matrix reaches stationary (fixed) point.",
      "suffix": "",
      "review_action": "arg_fact",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "BJxndjHwnm",
      "sentence_index": 8,
      "text": "How to understand this phenomenon? Does this mean that the distribution of the layer outputs will not change too much if the layer is deep enough?",
      "suffix": "",
      "review_action": "arg_request",
      "fine_review_action": "arg-request_explanation",
      "aspect": "asp_clarity",
      "polarity": "pol_negative"
    },
    {
      "review_id": "BJxndjHwnm",
      "sentence_index": 9,
      "text": "This somewhat conflicts the commonsense of \"the deeper the better?\"",
      "suffix": "\n\n",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_soundness-correctness",
      "polarity": "pol_negative"
    },
    {
      "review_id": "BJxndjHwnm",
      "sentence_index": 10,
      "text": "2. Typos: the weight matrix in the end of page 2 should be N_l times N_{l-1}. Also, the x_i's in the first line of page 3 should be bold.",
      "suffix": "",
      "review_action": "arg_request",
      "fine_review_action": "arg-request_edit",
      "aspect": "asp_soundness-correctness",
      "polarity": "pol_negative"
    }
  ],
  "rebuttal_sentences": [
    {
      "review_id": "BJxndjHwnm",
      "rebuttal_id": "SkxSjPODaQ",
      "sentence_index": 0,
      "text": "Thank you for the review and careful reading of our paper! We\u2019re glad that you found it of interest. On revision we will fix the typos that you identified.",
      "suffix": "\n\n",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_social",
      "alignment": [
        "context_global",
        null
      ],
      "details": {}
    },
    {
      "review_id": "BJxndjHwnm",
      "rebuttal_id": "SkxSjPODaQ",
      "sentence_index": 1,
      "text": "Regarding the first point, your intuition is exactly correct and a slightly simpler discussion of this phenomenon can be found in [1].",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          7,
          8,
          9
        ]
      ],
      "details": {}
    },
    {
      "review_id": "BJxndjHwnm",
      "rebuttal_id": "SkxSjPODaQ",
      "sentence_index": 2,
      "text": "When the network is deep enough that the covariance matrix has reached its fixed point, the distribution of the outputs of the network will be independent of the inputs.",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          7,
          8,
          9
        ]
      ],
      "details": {}
    },
    {
      "review_id": "BJxndjHwnm",
      "rebuttal_id": "SkxSjPODaQ",
      "sentence_index": 3,
      "text": "At this point the network becomes untrainable.",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          7,
          8,
          9
        ]
      ],
      "details": {}
    },
    {
      "review_id": "BJxndjHwnm",
      "rebuttal_id": "SkxSjPODaQ",
      "sentence_index": 4,
      "text": "To reconcile this with the commonsense intuition that \u201cdeeper is better\u201d, our answer is twofold.",
      "suffix": "\n\n",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          7,
          8,
          9
        ]
      ],
      "details": {}
    },
    {
      "review_id": "BJxndjHwnm",
      "rebuttal_id": "SkxSjPODaQ",
      "sentence_index": 5,
      "text": "1) As in [1] and [2] it is often possible to find configurations or architectural modifications where the covariance matrix doesn\u2019t approach its fixed point over depths often considered in machine learning.",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          7,
          8,
          9
        ]
      ],
      "details": {}
    },
    {
      "review_id": "BJxndjHwnm",
      "rebuttal_id": "SkxSjPODaQ",
      "sentence_index": 6,
      "text": "When this is the case one can safely increase the depth without sacrificing accuracy.",
      "suffix": "\n\n",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          7,
          8,
          9
        ]
      ],
      "details": {}
    },
    {
      "review_id": "BJxndjHwnm",
      "rebuttal_id": "SkxSjPODaQ",
      "sentence_index": 7,
      "text": "2) It seems that the role of depth in performance is more subtle than standard intuition would dictate.",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          7,
          8,
          9
        ]
      ],
      "details": {}
    },
    {
      "review_id": "BJxndjHwnm",
      "rebuttal_id": "SkxSjPODaQ",
      "sentence_index": 8,
      "text": "For example, in [3] note that although the authors were able to train a 10k hidden layer network, they did not observe any improvement in accuracy.",
      "suffix": "\n\n",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          7,
          8,
          9
        ]
      ],
      "details": {}
    },
    {
      "review_id": "BJxndjHwnm",
      "rebuttal_id": "SkxSjPODaQ",
      "sentence_index": 9,
      "text": "In the next version of the manuscript (both in response to your review and that of referee 1) we will add a more intuitive discussion of these results which we agree are somewhat technical.",
      "suffix": "\n\n",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_by-cr",
      "alignment": [
        "context_sentences",
        [
          7,
          8,
          9
        ]
      ],
      "details": {
        "manuscript_change": true
      }
    },
    {
      "review_id": "BJxndjHwnm",
      "rebuttal_id": "SkxSjPODaQ",
      "sentence_index": 10,
      "text": "[1] S. S. Schoenholz, J. Gilmer, S. Ganguli, J. Sohl-Dickstein.",
      "suffix": "",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_other",
      "alignment": [
        "context_in-rebuttal",
        null
      ],
      "details": {}
    },
    {
      "review_id": "BJxndjHwnm",
      "rebuttal_id": "SkxSjPODaQ",
      "sentence_index": 11,
      "text": "Deep Information Propagation (https://arxiv.org/abs/1611.01232)",
      "suffix": "\n",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_other",
      "alignment": [
        "context_in-rebuttal",
        null
      ],
      "details": {}
    },
    {
      "review_id": "BJxndjHwnm",
      "rebuttal_id": "SkxSjPODaQ",
      "sentence_index": 12,
      "text": "[2] G. Yang and S. S. Schoenholz. Mean Field Residual Networks (https://arxiv.org/abs/1712.08969)",
      "suffix": "\n",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_other",
      "alignment": [
        "context_in-rebuttal",
        null
      ],
      "details": {}
    },
    {
      "review_id": "BJxndjHwnm",
      "rebuttal_id": "SkxSjPODaQ",
      "sentence_index": 13,
      "text": "[3] L. Xiao, Y. Bahri, J. Sohl-Dickstein, S. S. Schoenholz, J. Pennington. Dynamical Isometry and a Mean Field Theory of CNNs: How to Train 10,000-Layer Vanilla Convolutional Neural Networks (https://arxiv.org/abs/1806.05393)",
      "suffix": "",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_other",
      "alignment": [
        "context_in-rebuttal",
        null
      ],
      "details": {}
    }
  ]
}