{
  "metadata": {
    "forum_id": "HJfQrs0qt7",
    "review_id": "rJl4x6wqnX",
    "rebuttal_id": "H1xjF_PVCX",
    "title": "Convergence Properties of Deep Neural Networks on Separable Data",
    "reviewer": "AnonReviewer2",
    "rating": 5,
    "conference": "ICLR2019",
    "permalink": "https://openreview.net/forum?id=HJfQrs0qt7&noteId=H1xjF_PVCX",
    "annotator": "anno3"
  },
  "review_sentences": [
    {
      "review_id": "rJl4x6wqnX",
      "sentence_index": 0,
      "text": "The authors study the learning dynamics of deep neural networks, a topic of fundamental importance that remains poorly understood.",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_motivation-impact",
      "polarity": "pol_positive"
    },
    {
      "review_id": "rJl4x6wqnX",
      "sentence_index": 1,
      "text": "The authors study several dynamics, such as activation independence and gradient starvation, which gives new insights.",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_motivation-impact",
      "polarity": "pol_negative"
    },
    {
      "review_id": "rJl4x6wqnX",
      "sentence_index": 2,
      "text": "However, the assumption is too strong.",
      "suffix": "\n\n",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_soundness-correctness",
      "polarity": "pol_negative"
    },
    {
      "review_id": "rJl4x6wqnX",
      "sentence_index": 3,
      "text": "There are two main results in the paper:",
      "suffix": "\n",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_heading",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "rJl4x6wqnX",
      "sentence_index": 4,
      "text": "1) Through learning, each neuron activates for only one class.",
      "suffix": "\n",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_summary",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "rJl4x6wqnX",
      "sentence_index": 5,
      "text": "2) The classification error, with respect to the number of iterations of gradient descent, exhibits a sigmoidal shape.",
      "suffix": "\n\n",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_summary",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "rJl4x6wqnX",
      "sentence_index": 6,
      "text": "However, there are two strong assumptions: 1. the two classes are perfectly separable by a linear classifier.",
      "suffix": "",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_summary",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "rJl4x6wqnX",
      "sentence_index": 7,
      "text": "2.  H2 assumes \"at the beginning of training data points from different classes do not activate the same neurons\".",
      "suffix": "",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_summary",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "rJl4x6wqnX",
      "sentence_index": 8,
      "text": "This is a very strong initial assumption; I am not sure how likely it is to be satisfied.",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_substance",
      "polarity": "pol_negative"
    },
    {
      "review_id": "rJl4x6wqnX",
      "sentence_index": 9,
      "text": "It seems to me that this assumption implicitly suggests that the algorithm is already ALMOST CONVERGENT.",
      "suffix": "\n\n",
      "review_action": "arg_fact",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "rJl4x6wqnX",
      "sentence_index": 10,
      "text": "If this assumption cannot be weakened, I don't think the paper can be accepted.",
      "suffix": "",
      "review_action": "arg_social",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    }
  ],
  "rebuttal_sentences": [
    {
      "review_id": "rJl4x6wqnX",
      "rebuttal_id": "H1xjF_PVCX",
      "sentence_index": 0,
      "text": "Thank you for your review.",
      "suffix": "\n\n",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_social",
      "alignment": [
        "context_global",
        null
      ],
      "details": {}
    },
    {
      "review_id": "rJl4x6wqnX",
      "rebuttal_id": "H1xjF_PVCX",
      "sentence_index": 1,
      "text": "We agree that assumption (H2) is very restrictive and have added some results relaxing it in Section 3.4 in the latest version of the paper.",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_concede-criticism",
      "alignment": [
        "context_sentences",
        [
          9,
          10
        ]
      ],
      "details": {}
    },
    {
      "review_id": "rJl4x6wqnX",
      "rebuttal_id": "H1xjF_PVCX",
      "sentence_index": 2,
      "text": "Please see the comment above entitled: \u201cRelaxing Assumption (H2)\u201d for more details.",
      "suffix": "\n",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_done",
      "alignment": [
        "context_sentences",
        [
          9,
          10
        ]
      ],
      "details": {
        "request_out_of_scope": false
      }
    },
    {
      "review_id": "rJl4x6wqnX",
      "rebuttal_id": "H1xjF_PVCX",
      "sentence_index": 3,
      "text": "However, it is worth pointing out that even under Assumption (H2), learning does not necessarily converge.",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          9,
          10
        ]
      ],
      "details": {}
    },
    {
      "review_id": "rJl4x6wqnX",
      "rebuttal_id": "H1xjF_PVCX",
      "sentence_index": 4,
      "text": "As shown in Fig. 2 (left) and Section 3.3, any initialization in the top left red region will fail to solve the problem.",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          9,
          10
        ]
      ],
      "details": {}
    },
    {
      "review_id": "rJl4x6wqnX",
      "rebuttal_id": "H1xjF_PVCX",
      "sentence_index": 5,
      "text": "In that case, the confidence on the corresponding class will be 0.5 after a finite number of updates.",
      "suffix": "\n\n",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          9,
          10
        ]
      ],
      "details": {}
    },
    {
      "review_id": "rJl4x6wqnX",
      "rebuttal_id": "H1xjF_PVCX",
      "sentence_index": 6,
      "text": "As far as assumption (H1) is concerned, it is standard in deep learning theory (see for instance [1,2,3]) and we have not been able to relax it.",
      "suffix": "\n",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          9,
          10
        ]
      ],
      "details": {}
    },
    {
      "review_id": "rJl4x6wqnX",
      "rebuttal_id": "H1xjF_PVCX",
      "sentence_index": 7,
      "text": "[1] T. Laurent and J. von Brecht. Deep linear networks with arbitrary loss: All local minima are global.",
      "suffix": "",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_other",
      "alignment": [
        "context_in-rebuttal",
        null
      ],
      "details": {}
    },
    {
      "review_id": "rJl4x6wqnX",
      "rebuttal_id": "H1xjF_PVCX",
      "sentence_index": 8,
      "text": "ICML 2018",
      "suffix": "\n",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_other",
      "alignment": [
        "context_in-rebuttal",
        null
      ],
      "details": {}
    },
    {
      "review_id": "rJl4x6wqnX",
      "rebuttal_id": "H1xjF_PVCX",
      "sentence_index": 9,
      "text": "[2] Z. Liao and R. Couillet. The dynamics of learning: A random matrix approach. ICML 2018.",
      "suffix": "\n",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_other",
      "alignment": [
        "context_in-rebuttal",
        null
      ],
      "details": {}
    },
    {
      "review_id": "rJl4x6wqnX",
      "rebuttal_id": "H1xjF_PVCX",
      "sentence_index": 10,
      "text": "[3] S. Arora et al. On the Optimization of Deep Networks: Implicit Acceleration by Overparameterization. ICML 2018.",
      "suffix": "",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_other",
      "alignment": [
        "context_in-rebuttal",
        null
      ],
      "details": {}
    }
  ]
}