{
  "metadata": {
    "forum_id": "rylwJxrYDS",
    "review_id": "B1xXOITI9H",
    "rebuttal_id": "BJlsGxtnsS",
    "title": "vq-wav2vec: Self-Supervised Learning of Discrete Speech Representations",
    "reviewer": "AnonReviewer3",
    "rating": 6,
    "conference": "ICLR2020",
    "permalink": "https://openreview.net/forum?id=rylwJxrYDS&noteId=BJlsGxtnsS",
    "annotator": "anno2"
  },
  "review_sentences": [
    {
      "review_id": "B1xXOITI9H",
      "sentence_index": 0,
      "text": "Though rather dense in its exposition, this paper is an interesting contribution to the area of self-supervised learning  based on discrete representations.",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_motivation-impact",
      "polarity": "pol_positive"
    },
    {
      "review_id": "B1xXOITI9H",
      "sentence_index": 1,
      "text": "What would make it stronger imo is to address the issue of how much is gained from a discrete vs. continuous representation.",
      "suffix": "",
      "review_action": "arg_request",
      "fine_review_action": "arg-request_edit",
      "aspect": "asp_substance",
      "polarity": "none"
    },
    {
      "review_id": "B1xXOITI9H",
      "sentence_index": 2,
      "text": "The authors take it as a given that discrete is good because it allows us to leverage work in NLP.",
      "suffix": "",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_quote",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "B1xXOITI9H",
      "sentence_index": 3,
      "text": "That makes sense -- but at what cost?",
      "suffix": "\n\n",
      "review_action": "arg_request",
      "fine_review_action": "arg-request_clarification",
      "aspect": "asp_soundness-correctness",
      "polarity": "none"
    },
    {
      "review_id": "B1xXOITI9H",
      "sentence_index": 4,
      "text": "\"Table 4 shows that our first results are promising, even though they are not as good as the state of the art.\" The state of the art on LibriSpeech is not Mohamed at al. 2019. See e.g. Irie et al. Interspeech 2019 for better result",
      "suffix": "\n\n",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_soundness-correctness",
      "polarity": "pol_negative"
    },
    {
      "review_id": "B1xXOITI9H",
      "sentence_index": 5,
      "text": "The Conclusion is very sparse. \"In future work, we are planning to apply other algorithms requiring discrete inputs to audio data\":",
      "suffix": "",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_quote",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "B1xXOITI9H",
      "sentence_index": 6,
      "text": "can  you elaborate?",
      "suffix": "",
      "review_action": "arg_request",
      "fine_review_action": "arg-request_explanation",
      "aspect": "asp_soundness-correctness",
      "polarity": "none"
    }
  ],
  "rebuttal_sentences": [
    {
      "review_id": "B1xXOITI9H",
      "rebuttal_id": "BJlsGxtnsS",
      "sentence_index": 0,
      "text": "Thank you for your fruitful comments.",
      "suffix": "\n\n",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_social",
      "alignment": [
        "context_global",
        null
      ],
      "details": {}
    },
    {
      "review_id": "B1xXOITI9H",
      "rebuttal_id": "BJlsGxtnsS",
      "sentence_index": 1,
      "text": ">>",
      "suffix": "",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_structuring",
      "alignment": [
        "context_sentences",
        [
          1
        ]
      ],
      "details": {}
    },
    {
      "review_id": "B1xXOITI9H",
      "rebuttal_id": "BJlsGxtnsS",
      "sentence_index": 2,
      "text": "What would make it stronger imo is to address the issue of how much is gained from a discrete vs. continuous representation.",
      "suffix": "\n",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_structuring",
      "alignment": [
        "context_sentences",
        [
          1
        ]
      ],
      "details": {}
    },
    {
      "review_id": "B1xXOITI9H",
      "rebuttal_id": "BJlsGxtnsS",
      "sentence_index": 3,
      "text": "Discrete representations by themselves are not better than continuous ones (cf. Table 1, wav2vec vs. vq-wav2vec).",
      "suffix": "",
      "rebuttal_stance": "dispute",
      "rebuttal_action": "rebuttal_refute-question",
      "alignment": [
        "context_sentences",
        [
          1
        ]
      ],
      "details": {}
    },
    {
      "review_id": "B1xXOITI9H",
      "rebuttal_id": "BJlsGxtnsS",
      "sentence_index": 4,
      "text": "However, discretization enables the application of existing algorithms from the NLP literature which were designed for discrete inputs.",
      "suffix": "",
      "rebuttal_stance": "dispute",
      "rebuttal_action": "rebuttal_refute-question",
      "alignment": [
        "context_sentences",
        [
          1
        ]
      ],
      "details": {}
    },
    {
      "review_id": "B1xXOITI9H",
      "rebuttal_id": "BJlsGxtnsS",
      "sentence_index": 5,
      "text": "We show that the BERT model can be directly applied to discretized speech.",
      "suffix": "",
      "rebuttal_stance": "dispute",
      "rebuttal_action": "rebuttal_refute-question",
      "alignment": [
        "context_sentences",
        [
          1
        ]
      ],
      "details": {}
    },
    {
      "review_id": "B1xXOITI9H",
      "rebuttal_id": "BJlsGxtnsS",
      "sentence_index": 6,
      "text": "BERT can better model context than (vq-)wav2vec.",
      "suffix": "\n\n",
      "rebuttal_stance": "dispute",
      "rebuttal_action": "rebuttal_refute-question",
      "alignment": [
        "context_sentences",
        [
          1
        ]
      ],
      "details": {}
    },
    {
      "review_id": "B1xXOITI9H",
      "rebuttal_id": "BJlsGxtnsS",
      "sentence_index": 7,
      "text": ">> The authors take it as a given that discrete is good because it allows us to leverage work in NLP. That makes sense -- but at what cost?",
      "suffix": "\n",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_structuring",
      "alignment": [
        "context_sentences",
        [
          2,
          3
        ]
      ],
      "details": {}
    },
    {
      "review_id": "B1xXOITI9H",
      "rebuttal_id": "BJlsGxtnsS",
      "sentence_index": 8,
      "text": "Chaining vq-wav2vec and BERT requires more computational effort than just wav2vec, however, it does improve accuracy as our results show (cf. Table 1).",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          2,
          3
        ]
      ],
      "details": {}
    },
    {
      "review_id": "B1xXOITI9H",
      "rebuttal_id": "BJlsGxtnsS",
      "sentence_index": 9,
      "text": "Running BERT requires roughly as much computational overhead as just vq-wav2vec.",
      "suffix": "\n\n",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          2,
          3
        ]
      ],
      "details": {}
    },
    {
      "review_id": "B1xXOITI9H",
      "rebuttal_id": "BJlsGxtnsS",
      "sentence_index": 10,
      "text": ">> The state of the art on LibriSpeech is not Mohamed at al. 2019.",
      "suffix": "",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_structuring",
      "alignment": [
        "context_sentences",
        [
          4
        ]
      ],
      "details": {}
    },
    {
      "review_id": "B1xXOITI9H",
      "rebuttal_id": "BJlsGxtnsS",
      "sentence_index": 11,
      "text": "See e.g. Irie et al. Interspeech 2019 for better result.",
      "suffix": "\n",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_structuring",
      "alignment": [
        "context_sentences",
        [
          4
        ]
      ],
      "details": {}
    },
    {
      "review_id": "B1xXOITI9H",
      "rebuttal_id": "BJlsGxtnsS",
      "sentence_index": 12,
      "text": "Thanks for pointing this out, we fixed this in the updated version of the paper we just posted.",
      "suffix": "\n\n",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_done",
      "alignment": [
        "context_sentences",
        [
          4
        ]
      ],
      "details": {
        "request_out_of_scope": true
      }
    },
    {
      "review_id": "B1xXOITI9H",
      "rebuttal_id": "BJlsGxtnsS",
      "sentence_index": 13,
      "text": ">> The Conclusion is very sparse.",
      "suffix": "\n",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_structuring",
      "alignment": [
        "context_sentences",
        [
          5
        ]
      ],
      "details": {}
    },
    {
      "review_id": "B1xXOITI9H",
      "rebuttal_id": "BJlsGxtnsS",
      "sentence_index": 14,
      "text": "We broadened conclusion and delineated additional future work.",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_done",
      "alignment": [
        "context_sentences",
        [
          5
        ]
      ],
      "details": {
        "request_out_of_scope": true
      }
    }
  ]
}