{
  "metadata": {
    "forum_id": "HkMlGnC9KQ",
    "review_id": "S1x3WD8ChQ",
    "rebuttal_id": "Skx8HkyGT7",
    "title": "On Regularization and Robustness of Deep Neural Networks",
    "reviewer": "AnonReviewer1",
    "rating": 5,
    "conference": "ICLR2019",
    "permalink": "https://openreview.net/forum?id=HkMlGnC9KQ&noteId=Skx8HkyGT7",
    "annotator": "anno10"
  },
  "review_sentences": [
    {
      "review_id": "S1x3WD8ChQ",
      "sentence_index": 0,
      "text": "Regularizing RKHS norm is a classic way to prevent overfitting.",
      "suffix": "",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_summary",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "S1x3WD8ChQ",
      "sentence_index": 1,
      "text": "The authors",
      "suffix": "\n",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_summary",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "S1x3WD8ChQ",
      "sentence_index": 2,
      "text": "note the connections between RKHS norm and several common regularization and",
      "suffix": "\n",
      "review_action": "none",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "S1x3WD8ChQ",
      "sentence_index": 3,
      "text": "robustness enhancement techniques, including gradient penalty, robust",
      "suffix": "\n",
      "review_action": "none",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "S1x3WD8ChQ",
      "sentence_index": 4,
      "text": "optimization via PGD and spectral norm normalization.",
      "suffix": "",
      "review_action": "none",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "S1x3WD8ChQ",
      "sentence_index": 5,
      "text": "They can be seen as upper",
      "suffix": "\n",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_summary",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "S1x3WD8ChQ",
      "sentence_index": 6,
      "text": "or lower bounds of the RKHS norm.",
      "suffix": "\n\n",
      "review_action": "none",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "S1x3WD8ChQ",
      "sentence_index": 7,
      "text": "There are some interesting findings in the experiments. For example, for",
      "suffix": "\n",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_substance",
      "polarity": "pol_positive"
    },
    {
      "review_id": "S1x3WD8ChQ",
      "sentence_index": 8,
      "text": "improving generalization, using the gradient penalty based method seems to work",
      "suffix": "\n",
      "review_action": "none",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "S1x3WD8ChQ",
      "sentence_index": 9,
      "text": "best.",
      "suffix": "",
      "review_action": "none",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "S1x3WD8ChQ",
      "sentence_index": 10,
      "text": "For improving robustness, adversarial training with PGD has the best",
      "suffix": "\n",
      "review_action": "arg_fact",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "S1x3WD8ChQ",
      "sentence_index": 11,
      "text": "results (which matches the conclusions by Madry et al.); but as shown in Figure",
      "suffix": "\n",
      "review_action": "none",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "S1x3WD8ChQ",
      "sentence_index": 12,
      "text": "2,",
      "suffix": "",
      "review_action": "none",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "S1x3WD8ChQ",
      "sentence_index": 13,
      "text": "because adversarial training only decreases a lower bound of RKHS norm, it",
      "suffix": "\n",
      "review_action": "none",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "S1x3WD8ChQ",
      "sentence_index": 14,
      "text": "does not necessarily decrease the upper bound (the product of spectral norms).",
      "suffix": "\n",
      "review_action": "none",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "S1x3WD8ChQ",
      "sentence_index": 15,
      "text": "This can be shown as a weakness of adversarial training if the authors explore",
      "suffix": "\n",
      "review_action": "arg_fact",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "S1x3WD8ChQ",
      "sentence_index": 16,
      "text": "further and deeper in this direction.",
      "suffix": "\n\n",
      "review_action": "none",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "S1x3WD8ChQ",
      "sentence_index": 17,
      "text": "Overall, this paper has many interesting results, but its contribution is",
      "suffix": "\n",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_originality",
      "polarity": "pol_negative"
    },
    {
      "review_id": "S1x3WD8ChQ",
      "sentence_index": 18,
      "text": "limited because:",
      "suffix": "\n\n",
      "review_action": "none",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "S1x3WD8ChQ",
      "sentence_index": 19,
      "text": "1. The regularization techniques in reproducing kernel Hilbert space (RKHS) have",
      "suffix": "\n",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_originality",
      "polarity": "pol_negative"
    },
    {
      "review_id": "S1x3WD8ChQ",
      "sentence_index": 20,
      "text": "been well studied by previous literature.",
      "suffix": "",
      "review_action": "none",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "S1x3WD8ChQ",
      "sentence_index": 21,
      "text": "This paper simply applies these",
      "suffix": "\n",
      "review_action": "none",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "S1x3WD8ChQ",
      "sentence_index": 22,
      "text": "results to deep neural networks, by treating the neural network as a big",
      "suffix": "\n",
      "review_action": "none",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "S1x3WD8ChQ",
      "sentence_index": 23,
      "text": "black-box function f(x)",
      "suffix": "",
      "review_action": "none",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "S1x3WD8ChQ",
      "sentence_index": 24,
      "text": ".",
      "suffix": "",
      "review_action": "none",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "S1x3WD8ChQ",
      "sentence_index": 25,
      "text": "Many of the results have been already presented in",
      "suffix": "\n",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_originality",
      "polarity": "pol_negative"
    },
    {
      "review_id": "S1x3WD8ChQ",
      "sentence_index": 26,
      "text": "previous works like Bietti & Mairal (2018).",
      "suffix": "\n\n",
      "review_action": "none",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "S1x3WD8ChQ",
      "sentence_index": 27,
      "text": "2. In experiments, the authors explored many existing methods for improving",
      "suffix": "\n",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_originality",
      "polarity": "pol_negative"
    },
    {
      "review_id": "S1x3WD8ChQ",
      "sentence_index": 28,
      "text": "generalization and robustness. However, all these methods are known and not new.",
      "suffix": "\n",
      "review_action": "none",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "S1x3WD8ChQ",
      "sentence_index": 29,
      "text": "Ideally, the authors can go further and propose a new regularization method",
      "suffix": "\n",
      "review_action": "arg_other",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "S1x3WD8ChQ",
      "sentence_index": 30,
      "text": "based on the connection between neural networks and RKHS, and conduct",
      "suffix": "\n",
      "review_action": "none",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "S1x3WD8ChQ",
      "sentence_index": 31,
      "text": "experiments to show its effectiveness.",
      "suffix": "\n\n",
      "review_action": "none",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "S1x3WD8ChQ",
      "sentence_index": 32,
      "text": "The paper is overall well written, and the introductions to RKHS and each",
      "suffix": "\n",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_clarity",
      "polarity": "pol_positive"
    },
    {
      "review_id": "S1x3WD8ChQ",
      "sentence_index": 33,
      "text": "regularization technique are very clear.",
      "suffix": "",
      "review_action": "none",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "S1x3WD8ChQ",
      "sentence_index": 34,
      "text": "The provided experiments also include",
      "suffix": "\n",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_substance",
      "polarity": "pol_positive"
    },
    {
      "review_id": "S1x3WD8ChQ",
      "sentence_index": 35,
      "text": "some interesting findings.",
      "suffix": "",
      "review_action": "none",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "S1x3WD8ChQ",
      "sentence_index": 36,
      "text": "My major concern is the lack of novel contributions",
      "suffix": "\n",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_originality",
      "polarity": "pol_negative"
    },
    {
      "review_id": "S1x3WD8ChQ",
      "sentence_index": 37,
      "text": "in this paper.",
      "suffix": "",
      "review_action": "none",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    }
  ],
  "rebuttal_sentences": [
    {
      "review_id": "S1x3WD8ChQ",
      "rebuttal_id": "Skx8HkyGT7",
      "sentence_index": 0,
      "text": "We thank the reviewer for their comments.",
      "suffix": "",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_social",
      "alignment": [
        "context_global",
        null
      ],
      "details": {}
    },
    {
      "review_id": "S1x3WD8ChQ",
      "rebuttal_id": "Skx8HkyGT7",
      "sentence_index": 1,
      "text": "We address the comments about novelty in our general response ( https://openreview.net/forum?id=HkMlGnC9KQ&noteId=S1eid00WaQ ), for instance concerning the relationship to previous work, and the regularization penalty ||f||_M we propose.",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          17,
          18,
          19,
          20,
          21,
          22,
          23,
          25,
          26,
          27,
          28,
          29,
          30,
          31,
          36
        ]
      ],
      "details": {}
    },
    {
      "review_id": "S1x3WD8ChQ",
      "rebuttal_id": "Skx8HkyGT7",
      "sentence_index": 2,
      "text": "More detailed comments are addressed below.",
      "suffix": "\n\n",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_structuring",
      "alignment": [
        "context_sentences",
        [
          17,
          18,
          19,
          20,
          21,
          22,
          23,
          25,
          26,
          27,
          28,
          29,
          30,
          31,
          36
        ]
      ],
      "details": {}
    },
    {
      "review_id": "S1x3WD8ChQ",
      "rebuttal_id": "Skx8HkyGT7",
      "sentence_index": 3,
      "text": "**",
      "suffix": "",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_structuring",
      "alignment": [
        "context_in-rebuttal",
        null
      ],
      "details": {}
    },
    {
      "review_id": "S1x3WD8ChQ",
      "rebuttal_id": "Skx8HkyGT7",
      "sentence_index": 4,
      "text": "weakness of adversarial training",
      "suffix": "\n\n",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_structuring",
      "alignment": [
        "context_sentences",
        [
          10,
          11,
          13,
          14,
          15,
          16
        ]
      ],
      "details": {}
    },
    {
      "review_id": "S1x3WD8ChQ",
      "rebuttal_id": "Skx8HkyGT7",
      "sentence_index": 5,
      "text": "As noted in our general response, our ||f||_M regularization approach empirically yields models with a more useful certified generalization guarantee in the presence of adversaries on CIFAR-10, while PGD adversarial training would likely require local verification of robustness around each test example, and we are not aware of useful guarantees on adversarial generalization for such models.",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          10,
          11,
          13,
          14,
          15,
          16
        ]
      ],
      "details": {}
    },
    {
      "review_id": "S1x3WD8ChQ",
      "rebuttal_id": "Skx8HkyGT7",
      "sentence_index": 6,
      "text": "We agree that this aspect is not clear in the current submission, and we will improve it in the next version.",
      "suffix": "\n\n",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_future",
      "alignment": [
        "context_sentences",
        [
          10,
          11,
          13,
          14,
          15,
          16
        ]
      ],
      "details": {}
    },
    {
      "review_id": "S1x3WD8ChQ",
      "rebuttal_id": "Skx8HkyGT7",
      "sentence_index": 7,
      "text": "*",
      "suffix": "",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_structuring",
      "alignment": [
        "context_in-rebuttal",
        null
      ],
      "details": {}
    },
    {
      "review_id": "S1x3WD8ChQ",
      "rebuttal_id": "Skx8HkyGT7",
      "sentence_index": 8,
      "text": "* relationship with traditional RKHS regularization",
      "suffix": "\n\n",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_structuring",
      "alignment": [
        "context_sentences",
        [
          29,
          30,
          31
        ]
      ],
      "details": {}
    },
    {
      "review_id": "S1x3WD8ChQ",
      "rebuttal_id": "Skx8HkyGT7",
      "sentence_index": 9,
      "text": "There is indeed no question that kernel methods/RKHSs have been widely used for regularization of non-linear functions for over 20 years now; however, these methods typically rely on solving convex optimization problems using the kernel trick or various kernel approximations (such as random Fourier features).",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          29,
          30,
          31
        ]
      ],
      "details": {}
    },
    {
      "review_id": "S1x3WD8ChQ",
      "rebuttal_id": "Skx8HkyGT7",
      "sentence_index": 10,
      "text": "Separately, defining RKHSs that contain neural networks has indeed been the subject of previous work, such as Bietti and Mairal (2018) or Zhang et al. (2016; 2017); however, these only study theoretical properties of the kernel mapping and the RKHS norm, or derive convex learning procedures to replace training neural networks.",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          29,
          30,
          31
        ]
      ],
      "details": {}
    },
    {
      "review_id": "S1x3WD8ChQ",
      "rebuttal_id": "Skx8HkyGT7",
      "sentence_index": 11,
      "text": "Our approach is quite different, in that we leverage these insights to obtain practical regularization strategies for generic neural networks.",
      "suffix": "\n\n",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          29,
          30,
          31
        ]
      ],
      "details": {}
    },
    {
      "review_id": "S1x3WD8ChQ",
      "rebuttal_id": "Skx8HkyGT7",
      "sentence_index": 12,
      "text": "** new regularization methods",
      "suffix": "\n\n",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_structuring",
      "alignment": [
        "context_sentences",
        [
          29,
          30,
          31
        ]
      ],
      "details": {}
    },
    {
      "review_id": "S1x3WD8ChQ",
      "rebuttal_id": "Skx8HkyGT7",
      "sentence_index": 13,
      "text": "In addition to the ||f||_M lower bound penalty discussed in our general response, we note that combined approaches based on lower bound + upper bound methods are also novel to the best of our knowledge, and in particular we found combining robust optimization techniques with spectral norm constraints to be quite successful in many of the small data scenarios considered (see Table 1).",
      "suffix": "\n\n",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          29,
          30,
          31
        ]
      ],
      "details": {}
    },
    {
      "review_id": "S1x3WD8ChQ",
      "rebuttal_id": "Skx8HkyGT7",
      "sentence_index": 14,
      "text": "We will happily clarify some of these points in an updated version of the paper.",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_by-cr",
      "alignment": [
        "context_sentences",
        [
          29,
          30,
          31
        ]
      ],
      "details": {
        "manuscript_change": true
      }
    }
  ]
}