{
  "metadata": {
    "forum_id": "HJgfDREKDB",
    "review_id": "SklBkB-f9r",
    "rebuttal_id": "rkl6-fY9jr",
    "title": "Higher-Order Function Networks for Learning Composable 3D Object Representations",
    "reviewer": "AnonReviewer1",
    "rating": 6,
    "conference": "ICLR2020",
    "permalink": "https://openreview.net/forum?id=HJgfDREKDB&noteId=rkl6-fY9jr",
    "annotator": "anno2"
  },
  "review_sentences": [
    {
      "review_id": "SklBkB-f9r",
      "sentence_index": 0,
      "text": "This work is focused on learning 3D object representations (decoders) that can be computed more efficiently than existing methods.",
      "suffix": "",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_summary",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "SklBkB-f9r",
      "sentence_index": 1,
      "text": "The computational inefficiency of these methods is that you learn a (big) fixed decoder for all objects (all z latents), and then need to apply it individually on either each point cloud point you want to produce, or each voxel in the output (this problem exists for both the class of methods that deform a uniform distribution R^3 -> R^3 a la FoldingNet, or directly predict the 3D function R^3 -> R e.g. DeepSDF).",
      "suffix": "",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_summary",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "SklBkB-f9r",
      "sentence_index": 2,
      "text": "The authors propose that the encoder directly predict the weights and biases of a decoder network that, since it is specific to the particular object being reconstructed, can be much smaller and thus much cheaper to compute.",
      "suffix": "\n\n",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_summary",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "SklBkB-f9r",
      "sentence_index": 3,
      "text": "The authors then note the fact that their method lacks a continuous latent space that allows for interpolation, as provided by existing (VAE-like) methods.",
      "suffix": "",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_summary",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "SklBkB-f9r",
      "sentence_index": 4,
      "text": "They propose to solve this by learning an MLP that produces the output by recurrent application, and then composing subapplications of different networks as a type of interpolation.",
      "suffix": "\n\n",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_summary",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "SklBkB-f9r",
      "sentence_index": 5,
      "text": "-------------------",
      "suffix": "\n\n",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_heading",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "SklBkB-f9r",
      "sentence_index": 6,
      "text": "I like this work, it addresses a real problem in a number of models for 3D representation learning (similar models are also used for e.g. cryo-EM reconstruction).",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_motivation-impact",
      "polarity": "pol_positive"
    },
    {
      "review_id": "SklBkB-f9r",
      "sentence_index": 7,
      "text": "While the fast weights approach is not totally original, its application to this problem is novel and very well-suited to it.",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_originality",
      "polarity": "pol_positive"
    },
    {
      "review_id": "SklBkB-f9r",
      "sentence_index": 8,
      "text": "I was a bit surprised by just how much the decoder network could be shrunk by using fast weights.",
      "suffix": "\n\n",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_motivation-impact",
      "polarity": "pol_positive"
    },
    {
      "review_id": "SklBkB-f9r",
      "sentence_index": 9,
      "text": "The paper is also quite well written.",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_clarity",
      "polarity": "pol_positive"
    },
    {
      "review_id": "SklBkB-f9r",
      "sentence_index": 10,
      "text": "I especially like how Section 2 synthesizes existing work into model categories which make it easier to think about their relationships.",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_clarity",
      "polarity": "pol_positive"
    },
    {
      "review_id": "SklBkB-f9r",
      "sentence_index": 11,
      "text": "I also think the explanation in Sec. 3.2, while kind of obvious, is a nice way think about decoder vs. fast weights.",
      "suffix": "\n\n",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_soundness-correctness",
      "polarity": "pol_positive"
    },
    {
      "review_id": "SklBkB-f9r",
      "sentence_index": 12,
      "text": "I like that the authors are straightforward about the deficiency of the method (i.e. that you can't interpolate in latent space).",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_clarity",
      "polarity": "pol_positive"
    },
    {
      "review_id": "SklBkB-f9r",
      "sentence_index": 13,
      "text": "Their proposed solution of functional composition is exceedingly clever but in my opinion too impractical to really be useful.",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_soundness-correctness",
      "polarity": "pol_negative"
    },
    {
      "review_id": "SklBkB-f9r",
      "sentence_index": 14,
      "text": "It adds extra complexity, requires you to do function composition which may be less expressive and takes more coomputation, etc. And to what end?",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_soundness-correctness",
      "polarity": "pol_negative"
    },
    {
      "review_id": "SklBkB-f9r",
      "sentence_index": 15,
      "text": "The purpose of generative models is not to interpolate per se; the interpolation is really a sanity check that the model is capturing the underlying distribution rather than just memorizing training examples.",
      "suffix": "",
      "review_action": "arg_fact",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "SklBkB-f9r",
      "sentence_index": 16,
      "text": "The function composition doesn't capture that.",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_soundness-correctness",
      "polarity": "pol_negative"
    },
    {
      "review_id": "SklBkB-f9r",
      "sentence_index": 17,
      "text": "I think the authors should just acknowledge that you can't soundly *sample* from their generative model the way e.g. VAE or GAN allows (their function composition is not a sampling method).",
      "suffix": "",
      "review_action": "arg_request",
      "fine_review_action": "arg-request_edit",
      "aspect": "asp_soundness-correctness",
      "polarity": "none"
    },
    {
      "review_id": "SklBkB-f9r",
      "sentence_index": 18,
      "text": "But I think there are lots of useful things you can do without that capability, e.g. do 3D point cloud completion, go image -> structure, etc.",
      "suffix": "",
      "review_action": "arg_fact",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "SklBkB-f9r",
      "sentence_index": 19,
      "text": "I think this function composition angle should be deemphasized in the title/abstract, but I think the paper stands",
      "suffix": "",
      "review_action": "arg_request",
      "fine_review_action": "arg-request_edit",
      "aspect": "asp_soundness-correctness",
      "polarity": "none"
    },
    {
      "review_id": "SklBkB-f9r",
      "sentence_index": 20,
      "text": "reasonably on its own",
      "suffix": "",
      "review_action": "none",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "SklBkB-f9r",
      "sentence_index": 21,
      "text": "without that",
      "suffix": "",
      "review_action": "none",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "SklBkB-f9r",
      "sentence_index": 22,
      "text": ".",
      "suffix": "\n\n",
      "review_action": "none",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "SklBkB-f9r",
      "sentence_index": 23,
      "text": "Nits:",
      "suffix": "\n",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_heading",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "SklBkB-f9r",
      "sentence_index": 24,
      "text": "- In Figure 2 it's pretty hard to see the differences between the methods. What exactly is being visualized here? DeepSDF shold be visualizing surface normals vs. HOF which is point clouds, right?",
      "suffix": "\n",
      "review_action": "arg_request",
      "fine_review_action": "arg-request_clarification",
      "aspect": "asp_clarity",
      "polarity": "none"
    },
    {
      "review_id": "SklBkB-f9r",
      "sentence_index": 25,
      "text": "- For predicting a deformation R^3 -> R^3 function composition sort of makes sense, but how generalizable",
      "suffix": "",
      "review_action": "arg_request",
      "fine_review_action": "arg-request_explanation",
      "aspect": "asp_clarity",
      "polarity": "none"
    },
    {
      "review_id": "SklBkB-f9r",
      "sentence_index": 26,
      "text": "is this approach e.g. to directly predicting a function R^3 -> R (a la DeepSDF)?",
      "suffix": "",
      "review_action": "none",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "SklBkB-f9r",
      "sentence_index": 27,
      "text": "I think there are ways this function composition approach could generalize, e.g. using skip connections and layer dropout (which encourages layers to be composable).",
      "suffix": "",
      "review_action": "arg_fact",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    }
  ],
  "rebuttal_sentences": [
    {
      "review_id": "SklBkB-f9r",
      "rebuttal_id": "rkl6-fY9jr",
      "sentence_index": 0,
      "text": "Thank you for your review and comments.",
      "suffix": "",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_social",
      "alignment": [
        "context_global",
        null
      ],
      "details": {}
    },
    {
      "review_id": "SklBkB-f9r",
      "rebuttal_id": "rkl6-fY9jr",
      "sentence_index": 1,
      "text": "We\u2019ve made a number of additions and improvements to address them in the updated version of the paper, which we will submit before the end of the discussion period.",
      "suffix": "\n\n",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_summary",
      "alignment": [
        "context_in-rebuttal",
        null
      ],
      "details": {}
    },
    {
      "review_id": "SklBkB-f9r",
      "rebuttal_id": "rkl6-fY9jr",
      "sentence_index": 2,
      "text": "First, we have performed a new set of experiments on the larger dataset in [1].",
      "suffix": "",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_summary",
      "alignment": [
        "context_in-rebuttal",
        null
      ],
      "details": {}
    },
    {
      "review_id": "SklBkB-f9r",
      "rebuttal_id": "rkl6-fY9jr",
      "sentence_index": 3,
      "text": "HOF shows greater average reconstruction accuracy than the methods compared in [1].",
      "suffix": "",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_summary",
      "alignment": [
        "context_in-rebuttal",
        null
      ],
      "details": {}
    },
    {
      "review_id": "SklBkB-f9r",
      "rebuttal_id": "rkl6-fY9jr",
      "sentence_index": 4,
      "text": "Second, we also perform ablation experiments to demonstrate that HOF performs competitively even when we vary the encoder architecture, decoder depth, decoder activation function, or input sampling for the decoder network.",
      "suffix": "",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_summary",
      "alignment": [
        "context_in-rebuttal",
        null
      ],
      "details": {}
    },
    {
      "review_id": "SklBkB-f9r",
      "rebuttal_id": "rkl6-fY9jr",
      "sentence_index": 5,
      "text": "For example, using Resnet18 as the encoder architecture or using a decoder network with twice as many hidden layers showed nearly identical performance in terms of average Chamfer distance on the test set.",
      "suffix": "",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_summary",
      "alignment": [
        "context_in-rebuttal",
        null
      ],
      "details": {}
    },
    {
      "review_id": "SklBkB-f9r",
      "rebuttal_id": "rkl6-fY9jr",
      "sentence_index": 6,
      "text": "The complete quantitative results will be included in an updated PDF before the end of the discussion period.",
      "suffix": "\n\n",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_by-cr",
      "alignment": [
        "context_in-rebuttal",
        null
      ],
      "details": {
        "manuscript_change": true
      }
    },
    {
      "review_id": "SklBkB-f9r",
      "rebuttal_id": "rkl6-fY9jr",
      "sentence_index": 7,
      "text": "\"The purpose of generative models is not to interpolate per se; the interpolation is really a sanity check that the model is capturing the underlying distribution rather than just memorizing training examples.",
      "suffix": "",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_structuring",
      "alignment": [
        "context_sentences",
        [
          15,
          16,
          17,
          18,
          19,
          20,
          21,
          22
        ]
      ],
      "details": {}
    },
    {
      "review_id": "SklBkB-f9r",
      "rebuttal_id": "rkl6-fY9jr",
      "sentence_index": 8,
      "text": "The function composition doesn't capture that.",
      "suffix": "",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_structuring",
      "alignment": [
        "context_sentences",
        [
          15,
          16,
          17,
          18,
          19,
          20,
          21,
          22
        ]
      ],
      "details": {}
    },
    {
      "review_id": "SklBkB-f9r",
      "rebuttal_id": "rkl6-fY9jr",
      "sentence_index": 9,
      "text": "I think the authors should just acknowledge that you can't soundly *sample* from their generative model the way e.g. VAE or GAN allows (their function composition is not a sampling method).",
      "suffix": "",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_structuring",
      "alignment": [
        "context_sentences",
        [
          15,
          16,
          17,
          18,
          19,
          20,
          21,
          22
        ]
      ],
      "details": {}
    },
    {
      "review_id": "SklBkB-f9r",
      "rebuttal_id": "rkl6-fY9jr",
      "sentence_index": 10,
      "text": "But I think there are lots of useful things you can do without that capability, e.g. do 3D point cloud completion, go image -> structure, etc. I think this function composition angle should be deemphasized in the title/abstract, but I think the paper stands  reasonably on its own without that.\"",
      "suffix": "\n\n",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_structuring",
      "alignment": [
        "context_sentences",
        [
          15,
          16,
          17,
          18,
          19,
          20,
          21,
          22
        ]
      ],
      "details": {}
    },
    {
      "review_id": "SklBkB-f9r",
      "rebuttal_id": "rkl6-fY9jr",
      "sentence_index": 11,
      "text": "We agree that the current formulation of composition is not equivalent to a generative model.",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_concede-criticism",
      "alignment": [
        "context_sentences",
        [
          15,
          16,
          17,
          18,
          19,
          20,
          21,
          22
        ]
      ],
      "details": {}
    },
    {
      "review_id": "SklBkB-f9r",
      "rebuttal_id": "rkl6-fY9jr",
      "sentence_index": 12,
      "text": "In our work, function composition primarily serves the purpose of demonstrating that the model learns a meaningful subspace of objects (rather than memorizing the training set, as you mentioned).",
      "suffix": "",
      "rebuttal_stance": "dispute",
      "rebuttal_action": "rebuttal_reject-criticism",
      "alignment": [
        "context_sentences",
        [
          15,
          16,
          17,
          18,
          19,
          20,
          21,
          22
        ]
      ],
      "details": {}
    },
    {
      "review_id": "SklBkB-f9r",
      "rebuttal_id": "rkl6-fY9jr",
      "sentence_index": 13,
      "text": "We have revised the abstract to clarify this point.",
      "suffix": "\n\n",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_done",
      "alignment": [
        "context_sentences",
        [
          15,
          16,
          17,
          18,
          19,
          20,
          21,
          22
        ]
      ],
      "details": {
        "request_out_of_scope": true
      }
    },
    {
      "review_id": "SklBkB-f9r",
      "rebuttal_id": "rkl6-fY9jr",
      "sentence_index": 14,
      "text": "\"In Figure 2 it's pretty hard to see the differences between the methods. What exactly is being visualized here? DeepSDF shold be visualizing surface normals vs. HOF which is point clouds, right?\"",
      "suffix": "\n\n",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_structuring",
      "alignment": [
        "context_sentences",
        [
          24
        ]
      ],
      "details": {}
    },
    {
      "review_id": "SklBkB-f9r",
      "rebuttal_id": "rkl6-fY9jr",
      "sentence_index": 15,
      "text": "We have clarified in the manuscript that our comparisons are between architectures, rather than training objectives/output representations.",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_done",
      "alignment": [
        "context_sentences",
        [
          24
        ]
      ],
      "details": {
        "request_out_of_scope": true
      }
    },
    {
      "review_id": "SklBkB-f9r",
      "rebuttal_id": "rkl6-fY9jr",
      "sentence_index": 16,
      "text": "Thus our DeepSDF, FoldingNet, and HOF architectures all output point clouds, which can be compared directly.",
      "suffix": "\n\n",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          24
        ]
      ],
      "details": {}
    },
    {
      "review_id": "SklBkB-f9r",
      "rebuttal_id": "rkl6-fY9jr",
      "sentence_index": 17,
      "text": "\"For predicting a deformation R^3 -> R^3 function composition sort of makes sense, but how generalizable is this approach e.g. to directly predicting a function R^3 -> R (a la DeepSDF)? I think there are ways this function composition approach could generalize, e.g. using skip connections and layer dropout (which encourages layers to be composable).\"",
      "suffix": "\n\n",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_structuring",
      "alignment": [
        "context_sentences",
        [
          25,
          26,
          27
        ]
      ],
      "details": {}
    },
    {
      "review_id": "SklBkB-f9r",
      "rebuttal_id": "rkl6-fY9jr",
      "sentence_index": 18,
      "text": "Additional techniques for promoting learning of composable representations such as skip connections and layer dropout are an exciting direction for future research.",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_future",
      "alignment": [
        "context_sentences",
        [
          25,
          26,
          27
        ]
      ],
      "details": {}
    },
    {
      "review_id": "SklBkB-f9r",
      "rebuttal_id": "rkl6-fY9jr",
      "sentence_index": 19,
      "text": "One way function composition might allow for R^3 -> R mappings by composing a mapping from R^3 -> R^3 and taking the only first dimension of each element in the final output.",
      "suffix": "\n\n",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          25,
          26,
          27
        ]
      ],
      "details": {}
    },
    {
      "review_id": "SklBkB-f9r",
      "rebuttal_id": "rkl6-fY9jr",
      "sentence_index": 20,
      "text": "Thank you again for your feedback.",
      "suffix": "\n\n",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_social",
      "alignment": [
        "context_global",
        null
      ],
      "details": {}
    },
    {
      "review_id": "SklBkB-f9r",
      "rebuttal_id": "rkl6-fY9jr",
      "sentence_index": 21,
      "text": "[1] M. Tatarchenko, S. R. Richter, R. Ranftl, Z. Li, V. Koltun, and T. Brox, \u201cWhat do single-view 3d reconstruction networks learn?,\u201d in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3405\u2013 3414, 2019.",
      "suffix": "",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_other",
      "alignment": [
        "context_in-rebuttal",
        null
      ],
      "details": {}
    }
  ]
}