{
  "metadata": {
    "forum_id": "rJVoEiCqKQ",
    "review_id": "rJegAZu6nQ",
    "rebuttal_id": "ryx3iYPDam",
    "title": "Deep Perm-Set Net: Learn to predict sets with unknown permutation and cardinality using deep neural networks",
    "reviewer": "AnonReviewer3",
    "rating": 7,
    "conference": "ICLR2019",
    "permalink": "https://openreview.net/forum?id=rJVoEiCqKQ&noteId=ryx3iYPDam",
    "annotator": "anno13"
  },
  "review_sentences": [
    {
      "review_id": "rJegAZu6nQ",
      "sentence_index": 0,
      "text": "The paper is really interesting.",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_substance",
      "polarity": "pol_positive"
    },
    {
      "review_id": "rJegAZu6nQ",
      "sentence_index": 1,
      "text": "Set prediction problem has lots of applications in AI applications and the problem has not been conquered by deep networks.",
      "suffix": "\n\n",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_motivation-impact",
      "polarity": "pol_positive"
    },
    {
      "review_id": "rJegAZu6nQ",
      "sentence_index": 2,
      "text": "The paper proposes a formulation to learn the distribution over unobservable permutation variables based on deep networks and uses a MAP  estimator for inference.",
      "suffix": "",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_summary",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "rJegAZu6nQ",
      "sentence_index": 3,
      "text": "It has object detection applications.",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_motivation-impact",
      "polarity": "pol_positive"
    },
    {
      "review_id": "rJegAZu6nQ",
      "sentence_index": 4,
      "text": "The results show that it can outperform YOLOv2 and Faster R-CNN in a small pedestrian detection dataset which contains heavy occlusions.",
      "suffix": "\n\n",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_summary",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "rJegAZu6nQ",
      "sentence_index": 5,
      "text": "The limitation is clearly stated in the last part of the paper that the number of possible permutations exponentially grows with the maximum set size (cardinality).",
      "suffix": "\n\n",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_clarity",
      "polarity": "pol_positive"
    },
    {
      "review_id": "rJegAZu6nQ",
      "sentence_index": 6,
      "text": "In the author response period, I would like the author give more details about the pedestrian detection experiments, such as how many dense layers are used after ResNet-101, what are the training and inference time, is it possible to report results on PASCAL VOC (only the person class).",
      "suffix": "\n\n",
      "review_action": "arg_request",
      "fine_review_action": "arg-request_clarification",
      "aspect": "asp_substance",
      "polarity": "none"
    },
    {
      "review_id": "rJegAZu6nQ",
      "sentence_index": 7,
      "text": "The method is exciting for object detection funs.",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_originality",
      "polarity": "pol_positive"
    },
    {
      "review_id": "rJegAZu6nQ",
      "sentence_index": 8,
      "text": "I would like to encourage the authors to release the code and let the whole object detection community overcome the limitation in the paper.",
      "suffix": "",
      "review_action": "arg_request",
      "fine_review_action": "arg-request_experiment",
      "aspect": "asp_replicability",
      "polarity": "none"
    }
  ],
  "rebuttal_sentences": [
    {
      "review_id": "rJegAZu6nQ",
      "rebuttal_id": "ryx3iYPDam",
      "sentence_index": 0,
      "text": "We appreciate AnonReviewer3\u2019s recognition of our work.",
      "suffix": "\n\n",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_social",
      "alignment": [
        "context_global",
        null
      ],
      "details": {}
    },
    {
      "review_id": "rJegAZu6nQ",
      "rebuttal_id": "ryx3iYPDam",
      "sentence_index": 1,
      "text": "- Network details",
      "suffix": "\n",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_structuring",
      "alignment": [
        "context_sentences",
        [
          6
        ]
      ],
      "details": {}
    },
    {
      "review_id": "rJegAZu6nQ",
      "rebuttal_id": "ryx3iYPDam",
      "sentence_index": 2,
      "text": "We only replace the last fc layer of ResNet-101 with a new fc layer mapping to 49 (5+20+24 = 49) outputs for calculating cardinality, states and permutation (the choice of these numbers explained in our response to R2).",
      "suffix": "\n\n",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          6
        ]
      ],
      "details": {}
    },
    {
      "review_id": "rJegAZu6nQ",
      "rebuttal_id": "ryx3iYPDam",
      "sentence_index": 3,
      "text": "- inference time",
      "suffix": "\n",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_structuring",
      "alignment": [
        "context_sentences",
        [
          6
        ]
      ],
      "details": {}
    },
    {
      "review_id": "rJegAZu6nQ",
      "rebuttal_id": "ryx3iYPDam",
      "sentence_index": 4,
      "text": "We also performed extra experiment on accuracy and inference time between different detectors (on the same machine and GPU) reported here:",
      "suffix": "\n\n",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          6
        ]
      ],
      "details": {}
    },
    {
      "review_id": "rJegAZu6nQ",
      "rebuttal_id": "ryx3iYPDam",
      "sentence_index": 5,
      "text": "Faster R-CNN: AP=0.68, Inference time=101 ms",
      "suffix": "\n",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          6
        ]
      ],
      "details": {}
    },
    {
      "review_id": "rJegAZu6nQ",
      "rebuttal_id": "ryx3iYPDam",
      "sentence_index": 6,
      "text": "YOLO\u00a0v2: AP=0.68, Inference time=12.3 ms",
      "suffix": "\n",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          6
        ]
      ],
      "details": {}
    },
    {
      "review_id": "rJegAZu6nQ",
      "rebuttal_id": "ryx3iYPDam",
      "sentence_index": 7,
      "text": "YOLO\u00a0v3: AP=0.70, Inference time=18.2 ms",
      "suffix": "\n",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          6
        ]
      ],
      "details": {}
    },
    {
      "review_id": "rJegAZu6nQ",
      "rebuttal_id": "ryx3iYPDam",
      "sentence_index": 8,
      "text": "Our network: AP=0.75, Inference time=15.1 ms",
      "suffix": "\n\n",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          6
        ]
      ],
      "details": {}
    },
    {
      "review_id": "rJegAZu6nQ",
      "rebuttal_id": "ryx3iYPDam",
      "sentence_index": 9,
      "text": "- test on PASCAL VOC",
      "suffix": "\n",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          6
        ]
      ],
      "details": {}
    },
    {
      "review_id": "rJegAZu6nQ",
      "rebuttal_id": "ryx3iYPDam",
      "sentence_index": 10,
      "text": "We observed PASCAL VOC dataset include many images with more than 4 persons.",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          6
        ]
      ],
      "details": {}
    },
    {
      "review_id": "rJegAZu6nQ",
      "rebuttal_id": "ryx3iYPDam",
      "sentence_index": 11,
      "text": "Considering the images include up to 4 persons only, we might not have enough training data to train ResNet-101 network.",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          6
        ]
      ],
      "details": {}
    }
  ]
}