{
  "metadata": {
    "forum_id": "B1lwSsC5KX",
    "review_id": "r1l7E6X22m",
    "rebuttal_id": "Hyls7CabCQ",
    "title": "D\u00e9j\u00e0 Vu: An Empirical Evaluation of the Memorization Properties of Convnets",
    "reviewer": "AnonReviewer3",
    "rating": 4,
    "conference": "ICLR2019",
    "permalink": "https://openreview.net/forum?id=B1lwSsC5KX&noteId=Hyls7CabCQ",
    "annotator": "anno3"
  },
  "review_sentences": [
    {
      "review_id": "r1l7E6X22m",
      "sentence_index": 0,
      "text": "==============Final Evaluation================",
      "suffix": "\n",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_heading",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "r1l7E6X22m",
      "sentence_index": 1,
      "text": "I have gone through the other reviews as well as the author response.",
      "suffix": "\n",
      "review_action": "arg_social",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "r1l7E6X22m",
      "sentence_index": 2,
      "text": "Firstly, I would like to thank the authors for providing detailed responses to my questions.",
      "suffix": "\n\n",
      "review_action": "arg_social",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "r1l7E6X22m",
      "sentence_index": 3,
      "text": "In general, I agree with R2 that the paper generally has some potentially interesting ideas and results but the manner in which the current draft is organized and presented makes it hard to grasp them and there is a lack of coherent message about what the paper is about.",
      "suffix": "\n\n",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_motivation-impact",
      "polarity": "pol_positive"
    },
    {
      "review_id": "r1l7E6X22m",
      "sentence_index": 4,
      "text": "Moreover, from my understanding the analysis in David MacKay\u2019s book (Chapter 41) concerns a single neuron (and the number of parameters for a single neuron)",
      "suffix": "",
      "review_action": "arg_fact",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "r1l7E6X22m",
      "sentence_index": 5,
      "text": ".",
      "suffix": "",
      "review_action": "none",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "r1l7E6X22m",
      "sentence_index": 6,
      "text": "As pointed out by R2, with depth there are many more possible ways in which one could carve out decision boundaries to separate data points; thus, it is not clear that the loose linear upper bound holds. Specifically, as one might expect, with depth it could be possible that a linear capacity increase is a lower bound (I am not suggesting that it is, but that possibility should be considered and explained in the paper).",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_clarity",
      "polarity": "pol_negative"
    },
    {
      "review_id": "r1l7E6X22m",
      "sentence_index": 7,
      "text": "Similarly, it would be good to formally connect the capacity to the rate of memorization before making a statement about them being related (as suggested in the initial review).",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_meaningful-comparison",
      "polarity": "pol_negative"
    },
    {
      "review_id": "r1l7E6X22m",
      "sentence_index": 8,
      "text": "In general, I feel this section could use some tighter formalism and justifications.",
      "suffix": "\n\n",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_substance",
      "polarity": "pol_negative"
    },
    {
      "review_id": "r1l7E6X22m",
      "sentence_index": 9,
      "text": "I also remain unconvinced by the response to my issue with the claim \u201cOur experiments show that our networks can remember a large number of images and distinguish them from unseen images\u201d, where the negative images are also seen by the memorization model, so they are not unseen.",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_substance",
      "polarity": "pol_negative"
    },
    {
      "review_id": "r1l7E6X22m",
      "sentence_index": 10,
      "text": "The authors address this by saying 3M of the 15 M negatives have been seen.",
      "suffix": "",
      "review_action": "arg_fact",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "r1l7E6X22m",
      "sentence_index": 11,
      "text": "That does not seem like a small enough percentage to claim that these are \u201cunseen\u201d images.",
      "suffix": "\n\n",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_substance",
      "polarity": "pol_negative"
    },
    {
      "review_id": "r1l7E6X22m",
      "sentence_index": 12,
      "text": "In general, I feel the paper is interesting but would benefit from a major revision which makes the message of the paper more clear, and addresses these and other issues raised in the review phase. Thus I am holding my current rating.",
      "suffix": "\n",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_clarity",
      "polarity": "pol_negative"
    },
    {
      "review_id": "r1l7E6X22m",
      "sentence_index": 13,
      "text": "==================",
      "suffix": "\n\n",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_heading",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "r1l7E6X22m",
      "sentence_index": 14,
      "text": "Summary",
      "suffix": "\n",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_heading",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "r1l7E6X22m",
      "sentence_index": 15,
      "text": "The paper trains classification models to classify a labeling of a subset of images (assigned with label 1) from the rest of the images (assigned with a label 0).",
      "suffix": "",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_summary",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "r1l7E6X22m",
      "sentence_index": 16,
      "text": "Firstly, the paper shows that deep learning models are able to learn such classifiers and get low training loss.",
      "suffix": "",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_summary",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "r1l7E6X22m",
      "sentence_index": 17,
      "text": "It then proposes to use this model to ``attack\u2019\u2019 task-specific models to perform membership inference, i.e. figuring out if an image provided in a set was used in training or not.",
      "suffix": "\n\n",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_summary",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "r1l7E6X22m",
      "sentence_index": 18,
      "text": "Strengths",
      "suffix": "\n",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_heading",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "r1l7E6X22m",
      "sentence_index": 19,
      "text": "+ The paper thoroughly covers related work and provides context.",
      "suffix": "\n",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_substance",
      "polarity": "pol_positive"
    },
    {
      "review_id": "r1l7E6X22m",
      "sentence_index": 20,
      "text": "+ Results on confidence as a signature of a dataset are interesting.",
      "suffix": "\n\n",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_motivation-impact",
      "polarity": "pol_positive"
    },
    {
      "review_id": "r1l7E6X22m",
      "sentence_index": 21,
      "text": "Weaknesses",
      "suffix": "\n\n",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_heading",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "r1l7E6X22m",
      "sentence_index": 22,
      "text": "[Motivation]",
      "suffix": "\n",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_heading",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "r1l7E6X22m",
      "sentence_index": 23,
      "text": "1.",
      "suffix": "",
      "review_action": "none",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "r1l7E6X22m",
      "sentence_index": 24,
      "text": "In general, recent work has found that the raw number of parameters has little to do with the size of the model class or the capacity of a model for deep models, and thus work like [A] has been trying to come up with better complexity measures for models to explain generalization.",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_motivation-impact",
      "polarity": "pol_negative"
    },
    {
      "review_id": "r1l7E6X22m",
      "sentence_index": 25,
      "text": "Thus, without sufficient justification the assertion in the paper that the capacity of the network is well approximated by the number of parameters does not seem correct.",
      "suffix": "",
      "review_action": "none",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "r1l7E6X22m",
      "sentence_index": 26,
      "text": "Also, the claim in Fig. 1 that the transition from \u2018\u2019high capacity\u2019\u2019 to low capacity happens at the number of parameters in the network seems a bit loose and hard to substantiate from what I understand, and should be toned down. (*)",
      "suffix": "\n\n",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_substance",
      "polarity": "pol_negative"
    },
    {
      "review_id": "r1l7E6X22m",
      "sentence_index": 27,
      "text": "[Capacity]",
      "suffix": "\n",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_heading",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "r1l7E6X22m",
      "sentence_index": 28,
      "text": "2. Sec. 3.3, Fig. 3: The capacity (in terms of parameters) of both Resnet-18 and VGG-16 is higher than the capacity for the YFCC100M dataset for n=10K images (which comes to 161K bits), while the capacity of Resnet-18, with 14.7 million parameters (assuming float32 encoding), is 14.7 * 32 bits = 470.4 million bits. Thus capacity alone cannot explain why VGG converges faster than Resnet-18, since both networks exceed the required capacity, and capacity does not seem to have an established formal connection to the rate of memorization.",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_substance",
      "polarity": "pol_negative"
    },
    {
      "review_id": "r1l7E6X22m",
      "sentence_index": 29,
      "text": "This is something which would need to be explained/ substantiated separately.",
      "suffix": "",
      "review_action": "none",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "r1l7E6X22m",
      "sentence_index": 30,
      "text": "(*)",
      "suffix": "\n\n",
      "review_action": "none",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "r1l7E6X22m",
      "sentence_index": 31,
      "text": "3. Scenario discussed in Sec. 4 seems somewhat impractical.",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_replicability",
      "polarity": "pol_negative"
    },
    {
      "review_id": "r1l7E6X22m",
      "sentence_index": 32,
      "text": "Given a set of m images, it is not clear that a classifier that is trained to detect between train and validation is sufficient, as one might also need to figure out if it is neither train nor val, which is a very practical scenario.",
      "suffix": "\n\n",
      "review_action": "none",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "r1l7E6X22m",
      "sentence_index": 33,
      "text": "4. Fig. 3 (right): It is not clear",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_clarity",
      "polarity": "pol_negative"
    },
    {
      "review_id": "r1l7E6X22m",
      "sentence_index": 34,
      "text": "why the fact that the classifier is able to predict which dataset the image \u2018m\u2019 corresponds to is useful or practical, as this seems to be a property of the set \u2018m\u2019 rather than the property of the trained classification model (f_\\theta)",
      "suffix": "",
      "review_action": "none",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "r1l7E6X22m",
      "sentence_index": 35,
      "text": ".",
      "suffix": "",
      "review_action": "none",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "r1l7E6X22m",
      "sentence_index": 36,
      "text": "Please clarify.",
      "suffix": "",
      "review_action": "arg_request",
      "fine_review_action": "arg-request_clarification",
      "aspect": "asp_substance",
      "polarity": "none"
    },
    {
      "review_id": "r1l7E6X22m",
      "sentence_index": 37,
      "text": "On the other hand it is clear that using the confidence of the model to predict the dataset is a useful property, but the right side of the Fig. is very confusing.",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_clarity",
      "polarity": "pol_negative"
    },
    {
      "review_id": "r1l7E6X22m",
      "sentence_index": 38,
      "text": "(*)",
      "suffix": "\n\n",
      "review_action": "none",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "r1l7E6X22m",
      "sentence_index": 39,
      "text": "6.",
      "suffix": "",
      "review_action": "none",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "r1l7E6X22m",
      "sentence_index": 40,
      "text": "It is not clear to me what the point of Sec. 5 is, given a trained model, one wants to figure out if an image was present in the training of the model.",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_clarity",
      "polarity": "pol_negative"
    },
    {
      "review_id": "r1l7E6X22m",
      "sentence_index": 41,
      "text": "While the baseline approaches seem to make use of the model confidence, I cannot see how the proposed approach (which uses a classifier) makes use of the original model.",
      "suffix": "",
      "review_action": "none",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "r1l7E6X22m",
      "sentence_index": 42,
      "text": "It is also not clear why Table 3 does not report the Bayes baseline results.",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_clarity",
      "polarity": "pol_negative"
    },
    {
      "review_id": "r1l7E6X22m",
      "sentence_index": 43,
      "text": "Also, does this section use the classifier for predicting the dataset, or is the approach reported in the section, the MAT approach?",
      "suffix": "\n\n",
      "review_action": "none",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "r1l7E6X22m",
      "sentence_index": 44,
      "text": "7. ``Our experiments show that our networks can remember a large number of images and distinguish them from unseen images\u2019\u2019 -- this does not seem to be true, since the model is trained on both n as well as N -n ``unseen\u2019\u2019 images which it labels as the negative class, thus the negative class is also seen by the memorization model.",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_soundness-correctness",
      "polarity": "pol_negative"
    },
    {
      "review_id": "r1l7E6X22m",
      "sentence_index": 45,
      "text": "(*)",
      "suffix": "\n\n",
      "review_action": "none",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "r1l7E6X22m",
      "sentence_index": 46,
      "text": "Minor Points",
      "suffix": "\n",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_heading",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "r1l7E6X22m",
      "sentence_index": 47,
      "text": "1. It is not clear that training a network to classify a set from another set is necessarily equivalent to ``memorization\u2019\u2019.",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_clarity",
      "polarity": "pol_negative"
    },
    {
      "review_id": "r1l7E6X22m",
      "sentence_index": 48,
      "text": "In addition, the paper would also need to show that such a model does not generalize to a validation set of images.",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_replicability",
      "polarity": "pol_negative"
    },
    {
      "review_id": "r1l7E6X22m",
      "sentence_index": 49,
      "text": "This is probably obvious given the results from Zhang et al. but should be included as a sanity check.",
      "suffix": "\n",
      "review_action": "none",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "r1l7E6X22m",
      "sentence_index": 50,
      "text": "2. Figure 3: it is confusing to call the cumulative distribution of the maximum classification score as the CDF of the model (y-axis fig. 3 left) as CDF means something else generally in such contexts, as the CDF of a predictor.",
      "suffix": "\n\n\n",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_clarity",
      "polarity": "pol_negative"
    },
    {
      "review_id": "r1l7E6X22m",
      "sentence_index": 51,
      "text": "References:",
      "suffix": "\n",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_heading",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "r1l7E6X22m",
      "sentence_index": 52,
      "text": "[A]: Blier, L\u00e9onard, and Yann Ollivier. 2018. ``The Description Length of Deep Learning Models.\u2019\u2019 arXiv [cs.LG].",
      "suffix": "",
      "review_action": "arg_other",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "r1l7E6X22m",
      "sentence_index": 53,
      "text": "arXiv.",
      "suffix": "",
      "review_action": "none",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "r1l7E6X22m",
      "sentence_index": 54,
      "text": "http://arxiv.org/abs/1802.07044",
      "suffix": "",
      "review_action": "arg_other",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "r1l7E6X22m",
      "sentence_index": 55,
      "text": ".",
      "suffix": "\n\n",
      "review_action": "none",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "r1l7E6X22m",
      "sentence_index": 56,
      "text": "Preliminary Evaluation",
      "suffix": "\n",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_heading",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "r1l7E6X22m",
      "sentence_index": 57,
      "text": "There are numerous issues with the writing and clarity of the paper, while it seems like some of the observations around the confidence of classifiers are interesting, in general the connection between those set of results and the ``memorization\u2019\u2019 capabilities of the classifier trained to remember train vs val images is not clear in general.",
      "suffix": "",
      "review_action": "arg_social",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "r1l7E6X22m",
      "sentence_index": 58,
      "text": "Important points for the rebuttal are marked with (*).",
      "suffix": "",
      "review_action": "arg_social",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    }
  ],
  "rebuttal_sentences": [
    {
      "review_id": "r1l7E6X22m",
      "rebuttal_id": "Hyls7CabCQ",
      "sentence_index": 0,
      "text": "We thank the reviewer for the detailed comments.",
      "suffix": "\n\n",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_social",
      "alignment": [
        "context_global",
        null
      ],
      "details": {}
    },
    {
      "review_id": "r1l7E6X22m",
      "rebuttal_id": "Hyls7CabCQ",
      "sentence_index": 1,
      "text": "\u201c1.[...] without sufficient justification the assertion in the paper that the capacity of the network is well approximated by the number of parameters does not seem correct.",
      "suffix": "",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_structuring",
      "alignment": [
        "context_sentences",
        [
          24,
          25,
          26
        ]
      ],
      "details": {}
    },
    {
      "review_id": "r1l7E6X22m",
      "rebuttal_id": "Hyls7CabCQ",
      "sentence_index": 2,
      "text": "Also, the claim in Fig. 1 that the transition from \u2018\u2019high capacity\u2019\u2019 to low capacity happens at the number of parameters in the network seems a bit loose and hard to substantiate from what I understand (*)\u201d",
      "suffix": "\n",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_structuring",
      "alignment": [
        "context_sentences",
        [
          24,
          25,
          26
        ]
      ],
      "details": {}
    },
    {
      "review_id": "r1l7E6X22m",
      "rebuttal_id": "Hyls7CabCQ",
      "sentence_index": 3,
      "text": "We agree raw parameter count is not a fine estimate of the capacity of the network.",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_concede-criticism",
      "alignment": [
        "context_sentences",
        [
          24,
          25,
          26
        ]
      ],
      "details": {}
    },
    {
      "review_id": "r1l7E6X22m",
      "rebuttal_id": "Hyls7CabCQ",
      "sentence_index": 4,
      "text": "However, an information-theoretic argument shows that an upper-bound of the capacity is the raw parameter count times the size of the representation (i.e. 32 bits for float32, this argument is close to that of [A]).",
      "suffix": "",
      "rebuttal_stance": "dispute",
      "rebuttal_action": "rebuttal_mitigate-criticism",
      "alignment": [
        "context_sentences",
        [
          24,
          25,
          26
        ]
      ],
      "details": {}
    },
    {
      "review_id": "r1l7E6X22m",
      "rebuttal_id": "Hyls7CabCQ",
      "sentence_index": 5,
      "text": "Experimentally, we show that networks with no data-augmentation (figure 1 - purple curve) stop fitting perfectly when the parameters get within 1/10 of the quantity of information in the learning set, thus we think that raw parameter count is a good first-order approximation up to that factor.",
      "suffix": "\n\n",
      "rebuttal_stance": "dispute",
      "rebuttal_action": "rebuttal_mitigate-criticism",
      "alignment": [
        "context_sentences",
        [
          24,
          25,
          26
        ]
      ],
      "details": {}
    },
    {
      "review_id": "r1l7E6X22m",
      "rebuttal_id": "Hyls7CabCQ",
      "sentence_index": 6,
      "text": "\u201c2. Sec. 3.3, [...] capacity alone cannot explain why VGG converges faster than Resnet-18 [...]\u201d",
      "suffix": "\n",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_structuring",
      "alignment": [
        "context_sentences",
        [
          28,
          29
        ]
      ],
      "details": {}
    },
    {
      "review_id": "r1l7E6X22m",
      "rebuttal_id": "Hyls7CabCQ",
      "sentence_index": 7,
      "text": "We observe that the rate of memorization depends on the architecture and the optimization algorithm, but predicting or explaining this rate is beyond the scope of this paper.",
      "suffix": "\n\n",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_future",
      "alignment": [
        "context_sentences",
        [
          28,
          29
        ]
      ],
      "details": {}
    },
    {
      "review_id": "r1l7E6X22m",
      "rebuttal_id": "Hyls7CabCQ",
      "sentence_index": 8,
      "text": "\u201c(*) 3. Scenario discussed in Sec. 4 seems somewhat impractical. [...] one might also need to figure out if it is neither train nor val\u201d",
      "suffix": "\n",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_structuring",
      "alignment": [
        "context_sentences",
        [
          31,
          32
        ]
      ],
      "details": {}
    },
    {
      "review_id": "r1l7E6X22m",
      "rebuttal_id": "Hyls7CabCQ",
      "sentence_index": 9,
      "text": "In section 4, we do not train a classifier to distinguish between a training and a validation set.",
      "suffix": "",
      "rebuttal_stance": "dispute",
      "rebuttal_action": "rebuttal_reject-criticism",
      "alignment": [
        "context_sentences",
        [
          31,
          32
        ]
      ],
      "details": {}
    },
    {
      "review_id": "r1l7E6X22m",
      "rebuttal_id": "Hyls7CabCQ",
      "sentence_index": 10,
      "text": "Rather, we use a readily-available classifier (trained for e.g. image recognition) for a completely different purpose than what it was trained for, i.e. to distinguish datasets of images (section 4.1) or detect if a set of images comes from a given set (section 4.2).",
      "suffix": "",
      "rebuttal_stance": "dispute",
      "rebuttal_action": "rebuttal_reject-criticism",
      "alignment": [
        "context_sentences",
        [
          31,
          32
        ]
      ],
      "details": {}
    },
    {
      "review_id": "r1l7E6X22m",
      "rebuttal_id": "Hyls7CabCQ",
      "sentence_index": 11,
      "text": "Section 4.2 shows how to use the K-S test to detect leakage, but the same test could tell if the m-set comes from neither the train nor the validation sets.",
      "suffix": "\n\n",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          31,
          32
        ]
      ],
      "details": {}
    },
    {
      "review_id": "r1l7E6X22m",
      "rebuttal_id": "Hyls7CabCQ",
      "sentence_index": 12,
      "text": "\u201c4. Fig. 3 (right): It is not clear why the fact that the classifier is able to predict which dataset the image \u2018m\u2019 corresponds to is useful or practical, as this seems to be a property of the set \u2018m\u2019 rather than the property of the trained classification model (f_\\theta). Please clarify. [...]\u201d",
      "suffix": "\n",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_structuring",
      "alignment": [
        "context_sentences",
        [
          33,
          34,
          35,
          36,
          37,
          38
        ]
      ],
      "details": {}
    },
    {
      "review_id": "r1l7E6X22m",
      "rebuttal_id": "Hyls7CabCQ",
      "sentence_index": 13,
      "text": "Being able to tell from the classifier output (using e.g. the confidence) if a set of images comes from the training or the validation set is a good indicator of how much the network has memorized these images.",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          33,
          34,
          35,
          36,
          37,
          38
        ]
      ],
      "details": {}
    },
    {
      "review_id": "r1l7E6X22m",
      "rebuttal_id": "Hyls7CabCQ",
      "sentence_index": 14,
      "text": "In our opinion, the most important outcome of the experiment of section 4.1 (figure 3, right) is to determine how many samples are needed to reliably discriminate the training set from the validation set (this corresponds to the solid curves), which is related to how much the model has memorized images from the training set.",
      "suffix": "\n\n",
      "rebuttal_stance": "dispute",
      "rebuttal_action": "rebuttal_mitigate-criticism",
      "alignment": [
        "context_sentences",
        [
          33,
          34,
          35,
          36,
          37,
          38
        ]
      ],
      "details": {}
    },
    {
      "review_id": "r1l7E6X22m",
      "rebuttal_id": "Hyls7CabCQ",
      "sentence_index": 15,
      "text": "\u201c6. Does [section 5] use the classifier for predicting the dataset, or is the approach reported in the section, the MAT approach?\u201d",
      "suffix": "\n",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_structuring",
      "alignment": [
        "context_sentences",
        [
          43
        ]
      ],
      "details": {}
    },
    {
      "review_id": "r1l7E6X22m",
      "rebuttal_id": "Hyls7CabCQ",
      "sentence_index": 16,
      "text": "The baseline approaches make use of the loss of the model (which is not the same as the confidence).",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          43
        ]
      ],
      "details": {}
    },
    {
      "review_id": "r1l7E6X22m",
      "rebuttal_id": "Hyls7CabCQ",
      "sentence_index": 17,
      "text": "The proposed approach uses the lower layers of the original model, and upper layers learnt on a separate, public set (this is the \u201cpartial-layers\u201d setting).",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          43
        ]
      ],
      "details": {}
    },
    {
      "review_id": "r1l7E6X22m",
      "rebuttal_id": "Hyls7CabCQ",
      "sentence_index": 18,
      "text": "Table 3 reports the results of the Bayes method on top of this network with upper layers retrained, as the MAT usually gives similar results on this task (for instance, Table 4 reports 60.8% performance with Softmax + Flip, Crop on Resnet101 for the Bayes method, and the MAT gets 61.14%).",
      "suffix": "\n\n",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          43
        ]
      ],
      "details": {}
    },
    {
      "review_id": "r1l7E6X22m",
      "rebuttal_id": "Hyls7CabCQ",
      "sentence_index": 19,
      "text": "\u201c7. ``Our experiments show that our networks can remember a large number of images and distinguish them from unseen images\u2019\u2019 -- this does not seem to be true, since the model is trained on both n as well as N -n ``unseen\u2019\u2019 images which it labels as the negative class, thus the negative class is also seen by the memorization model.",
      "suffix": "",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_structuring",
      "alignment": [
        "context_sentences",
        [
          44,
          45
        ]
      ],
      "details": {}
    },
    {
      "review_id": "r1l7E6X22m",
      "rebuttal_id": "Hyls7CabCQ",
      "sentence_index": 20,
      "text": "(*)",
      "suffix": "",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_structuring",
      "alignment": [
        "context_sentences",
        [
          44,
          45
        ]
      ],
      "details": {}
    },
    {
      "review_id": "r1l7E6X22m",
      "rebuttal_id": "Hyls7CabCQ",
      "sentence_index": 21,
      "text": "\u201d",
      "suffix": "\n",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_structuring",
      "alignment": [
        "context_sentences",
        [
          44,
          45
        ]
      ],
      "details": {}
    },
    {
      "review_id": "r1l7E6X22m",
      "rebuttal_id": "Hyls7CabCQ",
      "sentence_index": 22,
      "text": "We feed our model an equal number of positives and negatives (chosen randomly) at each epoch.",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          44,
          45
        ]
      ],
      "details": {}
    },
    {
      "review_id": "r1l7E6X22m",
      "rebuttal_id": "Hyls7CabCQ",
      "sentence_index": 23,
      "text": "For n < 10K, after 300 epochs the model has seen at most 3M negatives out of 15M, and yet still generalizes to the unseen negatives.",
      "suffix": "\n\n",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          44,
          45
        ]
      ],
      "details": {}
    },
    {
      "review_id": "r1l7E6X22m",
      "rebuttal_id": "Hyls7CabCQ",
      "sentence_index": 24,
      "text": "\u201c1. It is not clear that training a network to classify a set from another set is necessarily equivalent to ``memorization\u2019\u2019.",
      "suffix": "",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_structuring",
      "alignment": [
        "context_sentences",
        [
          47
        ]
      ],
      "details": {}
    },
    {
      "review_id": "r1l7E6X22m",
      "rebuttal_id": "Hyls7CabCQ",
      "sentence_index": 25,
      "text": "In addition, the paper would also need to show that such a model does not generalize to a validation set of images. [...]\u201d",
      "suffix": "\n",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_structuring",
      "alignment": [
        "context_sentences",
        [
          47,
          48
        ]
      ],
      "details": {}
    },
    {
      "review_id": "r1l7E6X22m",
      "rebuttal_id": "Hyls7CabCQ",
      "sentence_index": 26,
      "text": "With the downstream application of sections 4 and 5, we are interested in \u201cmemorization\u201d in the sense of any classifier that can tell apart images marked as \u201cpositives\u201d from images marked as \u201cnegatives\u201d.",
      "suffix": "",
      "rebuttal_stance": "dispute",
      "rebuttal_action": "rebuttal_mitigate-criticism",
      "alignment": [
        "context_sentences",
        [
          47,
          48
        ]
      ],
      "details": {}
    },
    {
      "review_id": "r1l7E6X22m",
      "rebuttal_id": "Hyls7CabCQ",
      "sentence_index": 27,
      "text": "This notion is somewhat different from",
      "suffix": "\n",
      "rebuttal_stance": "dispute",
      "rebuttal_action": "rebuttal_mitigate-criticism",
      "alignment": [
        "context_sentences",
        [
          47,
          48
        ]
      ],
      "details": {}
    },
    {
      "review_id": "r1l7E6X22m",
      "rebuttal_id": "Hyls7CabCQ",
      "sentence_index": 28,
      "text": "memorization as defined in other papers, where it is related to having a good training accuracy and a validation accuracy close to random guessing.",
      "suffix": "",
      "rebuttal_stance": "dispute",
      "rebuttal_action": "rebuttal_mitigate-criticism",
      "alignment": [
        "context_sentences",
        [
          47,
          48
        ]
      ],
      "details": {}
    },
    {
      "review_id": "r1l7E6X22m",
      "rebuttal_id": "Hyls7CabCQ",
      "sentence_index": 29,
      "text": "With the setup used in section 3, there is no good notion of validation: our model is expected to predict \u201c0\u201d on held-out data.",
      "suffix": "\n\n",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          47,
          48
        ]
      ],
      "details": {}
    },
    {
      "review_id": "r1l7E6X22m",
      "rebuttal_id": "Hyls7CabCQ",
      "sentence_index": 30,
      "text": "\u201c2. Figure 3: [the term CDF] is confusing\u201d",
      "suffix": "\n",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_structuring",
      "alignment": [
        "context_sentences",
        [
          50
        ]
      ],
      "details": {}
    },
    {
      "review_id": "r1l7E6X22m",
      "rebuttal_id": "Hyls7CabCQ",
      "sentence_index": 31,
      "text": "We will update the caption to make it less ambiguous.",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_by-cr",
      "alignment": [
        "context_sentences",
        [
          50
        ]
      ],
      "details": {
        "manuscript_change": true
      }
    }
  ]
}