{
  "metadata": {
    "forum_id": "rJl2E3AcF7",
    "review_id": "SklHkeMohX",
    "rebuttal_id": "HJgvmMlBTQ",
    "title": "Doubly Sparse: Sparse Mixture of Sparse Experts for Efficient Softmax Inference",
    "reviewer": "AnonReviewer3",
    "rating": 4,
    "conference": "ICLR2019",
    "permalink": "https://openreview.net/forum?id=rJl2E3AcF7&noteId=HJgvmMlBTQ",
    "annotator": "anno10"
  },
  "review_sentences": [
    {
      "review_id": "SklHkeMohX",
      "sentence_index": 0,
      "text": "The paper proposes doubly sparse, which is a sparse mixture of sparse experts and learns a two-level class hierarchy, for efficient softmax inference.",
      "suffix": "\n\n",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_summary",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "SklHkeMohX",
      "sentence_index": 1,
      "text": "[+] It reduces computational cost compared to full softmax.",
      "suffix": "\n",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_substance",
      "polarity": "pol_positive"
    },
    {
      "review_id": "SklHkeMohX",
      "sentence_index": 2,
      "text": "[+] Ablation study is done for group lasso, expert lasso and load balancing, which help understand the effect of different components of the proposed",
      "suffix": "\n",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_substance",
      "polarity": "pol_positive"
    },
    {
      "review_id": "SklHkeMohX",
      "sentence_index": 3,
      "text": "[-] It seems to me the motivation is similar to that of Sparsely-Gated MoE (Shazeer et al. 2017), but it is not clear how the proposed two-hierarchy method is superior to the Sparsely-Gated MoE. It would be helpful the paper discuss more about this.",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_originality",
      "polarity": "pol_negative"
    },
    {
      "review_id": "SklHkeMohX",
      "sentence_index": 4,
      "text": "Besides, in evaluation, the paper only compares Doubly Sparse with full softmax.",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_substance",
      "polarity": "pol_negative"
    },
    {
      "review_id": "SklHkeMohX",
      "sentence_index": 5,
      "text": "Why not compare with Sparsely-Gated MoE?",
      "suffix": "\n\n",
      "review_action": "arg_request",
      "fine_review_action": "arg-request_experiment",
      "aspect": "asp_substance",
      "polarity": "pol_negative"
    },
    {
      "review_id": "SklHkeMohX",
      "sentence_index": 6,
      "text": "Overall, I think this paper is below the borderline of acceptance due to insufficient comparison with Sparsely-Gated MoE.",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_substance",
      "polarity": "pol_negative"
    }
  ],
  "rebuttal_sentences": [
    {
      "review_id": "SklHkeMohX",
      "rebuttal_id": "HJgvmMlBTQ",
      "sentence_index": 0,
      "text": "Dear reviewer:",
      "suffix": "\n\n",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_social",
      "alignment": [
        "context_global",
        null
      ],
      "details": {}
    },
    {
      "review_id": "SklHkeMohX",
      "rebuttal_id": "HJgvmMlBTQ",
      "sentence_index": 1,
      "text": "We appreciate your comments but it appears that there is some misunderstanding regarding our contribution in this work.",
      "suffix": "\n\n",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_social",
      "alignment": [
        "context_global",
        null
      ],
      "details": {}
    },
    {
      "review_id": "SklHkeMohX",
      "rebuttal_id": "HJgvmMlBTQ",
      "sentence_index": 2,
      "text": "Our work is for softmax inference speedup",
      "suffix": "",
      "rebuttal_stance": "dispute",
      "rebuttal_action": "rebuttal_reject-criticism",
      "alignment": [
        "context_sentences",
        [
          3,
          4,
          5
        ]
      ],
      "details": {}
    },
    {
      "review_id": "SklHkeMohX",
      "rebuttal_id": "HJgvmMlBTQ",
      "sentence_index": 3,
      "text": "while Sparse-Gated MoE (MoE) was not designed to do so.",
      "suffix": "",
      "rebuttal_stance": "dispute",
      "rebuttal_action": "rebuttal_reject-criticism",
      "alignment": [
        "context_sentences",
        [
          3,
          4,
          5
        ]
      ],
      "details": {}
    },
    {
      "review_id": "SklHkeMohX",
      "rebuttal_id": "HJgvmMlBTQ",
      "sentence_index": 4,
      "text": "It was designed to increase the model expressiveness.",
      "suffix": "",
      "rebuttal_stance": "dispute",
      "rebuttal_action": "rebuttal_reject-criticism",
      "alignment": [
        "context_sentences",
        [
          3,
          4,
          5
        ]
      ],
      "details": {}
    },
    {
      "review_id": "SklHkeMohX",
      "rebuttal_id": "HJgvmMlBTQ",
      "sentence_index": 5,
      "text": "It cannot achieve speedup because each expert still contains full softmax space as we mentioned in the background section (page 2 line 21st) and method section (page 2 last 4th line).",
      "suffix": "",
      "rebuttal_stance": "dispute",
      "rebuttal_action": "rebuttal_reject-criticism",
      "alignment": [
        "context_sentences",
        [
          3,
          4,
          5
        ]
      ],
      "details": {}
    },
    {
      "review_id": "SklHkeMohX",
      "rebuttal_id": "HJgvmMlBTQ",
      "sentence_index": 6,
      "text": "And since it is slower than the standard softmax by definition, we chose not to compare with it in the paper.",
      "suffix": "\n\n",
      "rebuttal_stance": "dispute",
      "rebuttal_action": "rebuttal_reject-criticism",
      "alignment": [
        "context_sentences",
        [
          3,
          4,
          5
        ]
      ],
      "details": {}
    },
    {
      "review_id": "SklHkeMohX",
      "rebuttal_id": "HJgvmMlBTQ",
      "sentence_index": 7,
      "text": "Our algorithm addresses speed up in softmax inference.",
      "suffix": "",
      "rebuttal_stance": "dispute",
      "rebuttal_action": "rebuttal_reject-criticism",
      "alignment": [
        "context_sentences",
        [
          3,
          4,
          5
        ]
      ],
      "details": {}
    },
    {
      "review_id": "SklHkeMohX",
      "rebuttal_id": "HJgvmMlBTQ",
      "sentence_index": 8,
      "text": "This is fundamentally different from Sparse-gated MoE. We divide the output space into multiple overlapped subsets.",
      "suffix": "",
      "rebuttal_stance": "dispute",
      "rebuttal_action": "rebuttal_reject-criticism",
      "alignment": [
        "context_sentences",
        [
          3,
          4,
          5
        ]
      ],
      "details": {}
    },
    {
      "review_id": "SklHkeMohX",
      "rebuttal_id": "HJgvmMlBTQ",
      "sentence_index": 9,
      "text": "To find top-k predictions, we only search a few subsets.",
      "suffix": "",
      "rebuttal_stance": "dispute",
      "rebuttal_action": "rebuttal_reject-criticism",
      "alignment": [
        "context_sentences",
        [
          3,
          4,
          5
        ]
      ],
      "details": {}
    },
    {
      "review_id": "SklHkeMohX",
      "rebuttal_id": "HJgvmMlBTQ",
      "sentence_index": 10,
      "text": "While in full softmax or MoE, the complexity is linear with output dimension.",
      "suffix": "",
      "rebuttal_stance": "dispute",
      "rebuttal_action": "rebuttal_reject-criticism",
      "alignment": [
        "context_sentences",
        [
          3,
          4,
          5
        ]
      ],
      "details": {}
    },
    {
      "review_id": "SklHkeMohX",
      "rebuttal_id": "HJgvmMlBTQ",
      "sentence_index": 11,
      "text": "Therefore, we did not include a comparison with Sparsely-Gated MoE in our article and only compare with full softmax.",
      "suffix": "\n\n",
      "rebuttal_stance": "dispute",
      "rebuttal_action": "rebuttal_reject-criticism",
      "alignment": [
        "context_sentences",
        [
          3,
          4,
          5
        ]
      ],
      "details": {}
    },
    {
      "review_id": "SklHkeMohX",
      "rebuttal_id": "HJgvmMlBTQ",
      "sentence_index": 12,
      "text": "Just for additional reference, we tested Sparsely-Gated MoE with different experts in PTB dataset; we compared the results to DS-Softmax.",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_done",
      "alignment": [
        "context_sentences",
        [
          3,
          4,
          5
        ]
      ],
      "details": {
        "request_out_of_scope": false
      }
    },
    {
      "review_id": "SklHkeMohX",
      "rebuttal_id": "HJgvmMlBTQ",
      "sentence_index": 13,
      "text": "As expected, the Sparsely-Gated MoE does not achieve speedup in terms of softmax inference.",
      "suffix": "\n\n",
      "rebuttal_stance": "dispute",
      "rebuttal_action": "rebuttal_reject-criticism",
      "alignment": [
        "context_sentences",
        [
          3,
          4,
          5
        ]
      ],
      "details": {}
    },
    {
      "review_id": "SklHkeMohX",
      "rebuttal_id": "HJgvmMlBTQ",
      "sentence_index": 14,
      "text": "_____________________________________________",
      "suffix": "",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_structuring",
      "alignment": [
        "context_in-rebuttal",
        null
      ],
      "details": {}
    },
    {
      "review_id": "SklHkeMohX",
      "rebuttal_id": "HJgvmMlBTQ",
      "sentence_index": 15,
      "text": "_",
      "suffix": "\n",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_structuring",
      "alignment": [
        "context_in-rebuttal",
        null
      ],
      "details": {}
    },
    {
      "review_id": "SklHkeMohX",
      "rebuttal_id": "HJgvmMlBTQ",
      "sentence_index": 16,
      "text": "Method | Top 1 | Top 5 |Top 10| FLOPs|",
      "suffix": "\n",
      "rebuttal_stance": "other",
      "rebuttal_action": "rebuttal_none",
      "alignment": [
        "context_error",
        null
      ],
      "details": {}
    },
    {
      "review_id": "SklHkeMohX",
      "rebuttal_id": "HJgvmMlBTQ",
      "sentence_index": 17,
      "text": "DS-8       | 0.257 | 0.448 | 0.530 | 2.84x |",
      "suffix": "\n",
      "rebuttal_stance": "dispute",
      "rebuttal_action": "rebuttal_reject-criticism",
      "alignment": [
        "context_sentences",
        [
          3,
          4,
          5
        ]
      ],
      "details": {}
    },
    {
      "review_id": "SklHkeMohX",
      "rebuttal_id": "HJgvmMlBTQ",
      "sentence_index": 18,
      "text": "MoE-8    | 0.258 | 0.448 | 0.530 |  1x",
      "suffix": "",
      "rebuttal_stance": "other",
      "rebuttal_action": "rebuttal_none",
      "alignment": [
        "context_error",
        null
      ],
      "details": {}
    },
    {
      "review_id": "SklHkeMohX",
      "rebuttal_id": "HJgvmMlBTQ",
      "sentence_index": 19,
      "text": "|",
      "suffix": "\n",
      "rebuttal_stance": "other",
      "rebuttal_action": "rebuttal_none",
      "alignment": [
        "context_error",
        null
      ],
      "details": {}
    },
    {
      "review_id": "SklHkeMohX",
      "rebuttal_id": "HJgvmMlBTQ",
      "sentence_index": 20,
      "text": "DS-16     | 0.258 | 0.450 | 0.529 | 5.13x |",
      "suffix": "\n",
      "rebuttal_stance": "other",
      "rebuttal_action": "rebuttal_none",
      "alignment": [
        "context_error",
        null
      ],
      "details": {}
    },
    {
      "review_id": "SklHkeMohX",
      "rebuttal_id": "HJgvmMlBTQ",
      "sentence_index": 21,
      "text": "MoE-16  | 0.258 | 0.449 | 0.530 | 1x",
      "suffix": "",
      "rebuttal_stance": "other",
      "rebuttal_action": "rebuttal_none",
      "alignment": [
        "context_error",
        null
      ],
      "details": {}
    },
    {
      "review_id": "SklHkeMohX",
      "rebuttal_id": "HJgvmMlBTQ",
      "sentence_index": 22,
      "text": "|",
      "suffix": "\n",
      "rebuttal_stance": "other",
      "rebuttal_action": "rebuttal_none",
      "alignment": [
        "context_error",
        null
      ],
      "details": {}
    },
    {
      "review_id": "SklHkeMohX",
      "rebuttal_id": "HJgvmMlBTQ",
      "sentence_index": 23,
      "text": "DS-32     | 0.259 | 0.449 | 0.529 | 9.43x |",
      "suffix": "\n",
      "rebuttal_stance": "other",
      "rebuttal_action": "rebuttal_none",
      "alignment": [
        "context_error",
        null
      ],
      "details": {}
    },
    {
      "review_id": "SklHkeMohX",
      "rebuttal_id": "HJgvmMlBTQ",
      "sentence_index": 24,
      "text": "MoE-32  | 0.259 | 0.450 | 0.531 | 1x",
      "suffix": "",
      "rebuttal_stance": "other",
      "rebuttal_action": "rebuttal_none",
      "alignment": [
        "context_error",
        null
      ],
      "details": {}
    },
    {
      "review_id": "SklHkeMohX",
      "rebuttal_id": "HJgvmMlBTQ",
      "sentence_index": 25,
      "text": "|",
      "suffix": "\n",
      "rebuttal_stance": "other",
      "rebuttal_action": "rebuttal_none",
      "alignment": [
        "context_error",
        null
      ],
      "details": {}
    },
    {
      "review_id": "SklHkeMohX",
      "rebuttal_id": "HJgvmMlBTQ",
      "sentence_index": 26,
      "text": "DS-64     | 0.258 | 0.450 | 0.529 |15.99x|",
      "suffix": "\n",
      "rebuttal_stance": "other",
      "rebuttal_action": "rebuttal_none",
      "alignment": [
        "context_error",
        null
      ],
      "details": {}
    },
    {
      "review_id": "SklHkeMohX",
      "rebuttal_id": "HJgvmMlBTQ",
      "sentence_index": 27,
      "text": "MoE-64  | 0.260 | 0.451 | 0.531 | 1x",
      "suffix": "",
      "rebuttal_stance": "other",
      "rebuttal_action": "rebuttal_none",
      "alignment": [
        "context_error",
        null
      ],
      "details": {}
    },
    {
      "review_id": "SklHkeMohX",
      "rebuttal_id": "HJgvmMlBTQ",
      "sentence_index": 28,
      "text": "|",
      "suffix": "\n",
      "rebuttal_stance": "other",
      "rebuttal_action": "rebuttal_none",
      "alignment": [
        "context_error",
        null
      ],
      "details": {}
    },
    {
      "review_id": "SklHkeMohX",
      "rebuttal_id": "HJgvmMlBTQ",
      "sentence_index": 29,
      "text": "_____________________________________________",
      "suffix": "",
      "rebuttal_stance": "other",
      "rebuttal_action": "rebuttal_none",
      "alignment": [
        "context_error",
        null
      ],
      "details": {}
    },
    {
      "review_id": "SklHkeMohX",
      "rebuttal_id": "HJgvmMlBTQ",
      "sentence_index": 30,
      "text": "_",
      "suffix": "\n\n",
      "rebuttal_stance": "other",
      "rebuttal_action": "rebuttal_none",
      "alignment": [
        "context_error",
        null
      ],
      "details": {}
    },
    {
      "review_id": "SklHkeMohX",
      "rebuttal_id": "HJgvmMlBTQ",
      "sentence_index": 31,
      "text": "* FLOPs means FLOPs reduction (i.e. baseline's FLOPs / target method's FLOPs).",
      "suffix": "",
      "rebuttal_stance": "other",
      "rebuttal_action": "rebuttal_none",
      "alignment": [
        "context_error",
        null
      ],
      "details": {}
    }
  ]
}