{
  "metadata": {
    "forum_id": "rklklCVYvB",
    "review_id": "SyeFixSscr",
    "rebuttal_id": "H1gEspGqjH",
    "title": "Time2Vec: Learning a Vector Representation of Time",
    "reviewer": "AnonReviewer5",
    "rating": 3,
    "conference": "ICLR2020",
    "permalink": "https://openreview.net/forum?id=rklklCVYvB&noteId=H1gEspGqjH",
    "annotator": "anno3"
  },
  "review_sentences": [
    {
      "review_id": "SyeFixSscr",
      "sentence_index": 0,
      "text": "This paper introduces a particular learnable vector representation of time which is applicable across problems without the use of a hand-crafted time representation.",
      "suffix": "",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_summary",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "SyeFixSscr",
      "sentence_index": 1,
      "text": "Their representation makes use of a feed-forward layer with sine activations which operates on time data.",
      "suffix": "",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_summary",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "SyeFixSscr",
      "sentence_index": 2,
      "text": "As it is a vector representation, it combines well with other deep neural network methods.",
      "suffix": "",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_summary",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "SyeFixSscr",
      "sentence_index": 3,
      "text": "They motivate their problem well, explaining why time data is important to a variety of problems and situate their solution as an orthogonal approach to many current solutions in the literature.",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_motivation-impact",
      "polarity": "pol_positive"
    },
    {
      "review_id": "SyeFixSscr",
      "sentence_index": 4,
      "text": "They make reference to fourier analysis as motivation for their representation.",
      "suffix": "",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_summary",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "SyeFixSscr",
      "sentence_index": 5,
      "text": "Finally, they provide experimental results to support their claims using fabricated and real-world time series datasets, as well as ablation studies to support their design decisions.",
      "suffix": "\n\n",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_summary",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "SyeFixSscr",
      "sentence_index": 6,
      "text": "While I think this work has the potential to be a significant contribution, I rate this a weak reject because the theoretical motivation and analysis of the experimental results are lacking the depth of evidence I would expect for an ICLR paper.",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_motivation-impact",
      "polarity": "pol_positive"
    },
    {
      "review_id": "SyeFixSscr",
      "sentence_index": 7,
      "text": "If you provide a deeper discussion of the provable claims about the power of your model via Fourier analysis and provide a table of test accuracy/recall@K with/without your representation for more than one other state of the art algorithm for these datasets, I would be convinced to strong accept.",
      "suffix": "\n\n",
      "review_action": "arg_social",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "SyeFixSscr",
      "sentence_index": 8,
      "text": "Specific comments:",
      "suffix": "\n\n",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_heading",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "SyeFixSscr",
      "sentence_index": 9,
      "text": "* p.3 third paragraph: you repeat yourself in math notation a few times here.",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_soundness-correctness",
      "polarity": "pol_negative"
    },
    {
      "review_id": "SyeFixSscr",
      "sentence_index": 10,
      "text": "Repeated equations usually indicate that there is something new happening, but all of these are just restatements of your theta sin(omega tau + phi) term.",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_substance",
      "polarity": "pol_negative"
    },
    {
      "review_id": "SyeFixSscr",
      "sentence_index": 11,
      "text": "I would introduce the notation for t2v(tau) upfront and use that to define a(tau, k)[j] and f_j",
      "suffix": "\n",
      "review_action": "arg_request",
      "fine_review_action": "arg-request_edit",
      "aspect": "asp_substance",
      "polarity": "pol_negative"
    },
    {
      "review_id": "SyeFixSscr",
      "sentence_index": 12,
      "text": "* p.3",
      "suffix": "",
      "review_action": "none",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "SyeFixSscr",
      "sentence_index": 13,
      "text": "A clearer explanation of the theory here would help, as I think Fourier's theorem nicely supports your claims.",
      "suffix": "\n",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_clarity",
      "polarity": "pol_negative"
    },
    {
      "review_id": "SyeFixSscr",
      "sentence_index": 14,
      "text": "* p.4 first paragraph you claim that this method responds well to data which exhibits seasonality, but none of your datasets deal with data that would exhibit seasonality.",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_substance",
      "polarity": "pol_negative"
    },
    {
      "review_id": "SyeFixSscr",
      "sentence_index": 15,
      "text": "There are plenty of simple real-world datasets available which show multi-scale periodic phenomena (activity or location data, weather data, travel data, etc.).",
      "suffix": "",
      "review_action": "arg_fact",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "SyeFixSscr",
      "sentence_index": 16,
      "text": "In fact, segmentation and recognition of wearable device activity would be a great application for this method.",
      "suffix": "\n",
      "review_action": "arg_fact",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "SyeFixSscr",
      "sentence_index": 17,
      "text": "* p.4",
      "suffix": "",
      "review_action": "none",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "SyeFixSscr",
      "sentence_index": 18,
      "text": "third paragraph: Your claim of invariance to time rescaling is technically correct, but I am not convinced that a model can learn the correct omega values for an arbitrary rescaling (e.g. if the period is smaller than the time unit).",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_soundness-correctness",
      "polarity": "pol_negative"
    },
    {
      "review_id": "SyeFixSscr",
      "sentence_index": 19,
      "text": "You show that this works for a rescaling from 2pi/7 to 2pi/14, but it would be nice if there was experimental confirmation of this property with frequency > 1.",
      "suffix": "\n",
      "review_action": "arg_request",
      "fine_review_action": "arg-request_result",
      "aspect": "asp_substance",
      "polarity": "none"
    },
    {
      "review_id": "SyeFixSscr",
      "sentence_index": 20,
      "text": "* p.6 Showing accuracy/recall across training epochs is not sufficient evidence to show that this is a useful representation.",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_substance",
      "polarity": "pol_negative"
    },
    {
      "review_id": "SyeFixSscr",
      "sentence_index": 21,
      "text": "There should be some kind of comparison with test set results from other state-of-the-art work on these datasets.",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_meaningful-comparison",
      "polarity": "pol_negative"
    },
    {
      "review_id": "SyeFixSscr",
      "sentence_index": 22,
      "text": "If adding your representation to the SOTA model improved test set performance (or at least sped up training without hurting test set performance), then that would be better evidence.",
      "suffix": "",
      "review_action": "none",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "SyeFixSscr",
      "sentence_index": 23,
      "text": "If LSTM+T is the SOTA, say so and restate the author's test performance",
      "suffix": "",
      "review_action": "none",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "SyeFixSscr",
      "sentence_index": 24,
      "text": "compared to yours",
      "suffix": "",
      "review_action": "none",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "SyeFixSscr",
      "sentence_index": 25,
      "text": ".",
      "suffix": "",
      "review_action": "none",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "SyeFixSscr",
      "sentence_index": 26,
      "text": "If this is what these graphs show, consider using a different visualization to make it clearer that you're improving the final performance, not just the training process.",
      "suffix": "\n",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_clarity",
      "polarity": "pol_negative"
    },
    {
      "review_id": "SyeFixSscr",
      "sentence_index": 27,
      "text": "* p.8 I think sine functions make optimization harder because they make the gradient function periodic with respect to the weights, creating infinitely many local extrema.",
      "suffix": "",
      "review_action": "arg_fact",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "SyeFixSscr",
      "sentence_index": 28,
      "text": "Historically this may have been an issue, but deep neural networks have so many local minima it might not matter.",
      "suffix": "",
      "review_action": "arg_fact",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "SyeFixSscr",
      "sentence_index": 29,
      "text": "Still, it would be good to show that trained performance doesn't depend on the initialization values more than a standard LSTM+T model.",
      "suffix": "\n",
      "review_action": "arg_request",
      "fine_review_action": "arg-request_experiment",
      "aspect": "asp_substance",
      "polarity": "pol_negative"
    },
    {
      "review_id": "SyeFixSscr",
      "sentence_index": 30,
      "text": "* You have an interesting corner case where your neural network parameters are interpretable: you can interpret the omega values from your model as frequencies and investigate their values to see which kinds of periodicity your model uses. You do something like this on p.7, but it would be neat to see a histogram like the one you have for EventMNIST for one of the real-world datasets to see if it learns the domain-relevant time knowledge you claim that it should learn.",
      "suffix": "",
      "review_action": "arg_request",
      "fine_review_action": "arg-request_experiment",
      "aspect": "asp_substance",
      "polarity": "none"
    }
  ],
  "rebuttal_sentences": [
    {
      "review_id": "SyeFixSscr",
      "rebuttal_id": "H1gEspGqjH",
      "sentence_index": 0,
      "text": "We would like to thank the reviewer for constructive feedback.",
      "suffix": "\n\n",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_social",
      "alignment": [
        "context_global",
        null
      ],
      "details": {}
    },
    {
      "review_id": "SyeFixSscr",
      "rebuttal_id": "H1gEspGqjH",
      "sentence_index": 1,
      "text": "Results: We would like to clarify that all the results reported in the paper are on test sets (this includes Figures 1, 2, 3, and 5 as well as those in the supplementary).",
      "suffix": "",
      "rebuttal_stance": "dispute",
      "rebuttal_action": "rebuttal_mitigate-criticism",
      "alignment": [
        "context_sentences",
        [
          6
        ]
      ],
      "details": {}
    },
    {
      "review_id": "SyeFixSscr",
      "rebuttal_id": "H1gEspGqjH",
      "sentence_index": 2,
      "text": "We decided to report the test set performance for all epochs instead of just the last epoch to show that: 1- in many cases, LSTM+Time2Vec consistently outperforms LSTM+T, 2- replacing the notion of time with Time2Vec does not deteriorate the performance, and 3- adding Time2Vec makes the model reach its best performance faster.",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          6
        ]
      ],
      "details": {}
    },
    {
      "review_id": "SyeFixSscr",
      "rebuttal_id": "H1gEspGqjH",
      "sentence_index": 3,
      "text": "Sorry about the confusion, we will clarify this in the paper.",
      "suffix": "\n",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_by-cr",
      "alignment": [
        "context_sentences",
        [
          6
        ]
      ],
      "details": {
        "manuscript_change": true
      }
    },
    {
      "review_id": "SyeFixSscr",
      "rebuttal_id": "H1gEspGqjH",
      "sentence_index": 4,
      "text": "\u201cIf adding your representation to the SOTA model improved test set performance (or at least sped up training without hurting test set performance), then that would be better evidence.\u201d ->  This is indeed what we did.",
      "suffix": "",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_structuring",
      "alignment": [
        "context_sentences",
        [
          22
        ]
      ],
      "details": {}
    },
    {
      "review_id": "SyeFixSscr",
      "rebuttal_id": "H1gEspGqjH",
      "sentence_index": 5,
      "text": "We showed that adding Time2Vec to LSTM+T (the model used in several recent works - see the last paragraph of related works section) and to two variants of TimeLSTM (a recent architecture with remarkable results on asynchronous sequential datasets) improves test set performance.",
      "suffix": "\n",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          22
        ]
      ],
      "details": {}
    },
    {
      "review_id": "SyeFixSscr",
      "rebuttal_id": "H1gEspGqjH",
      "sentence_index": 6,
      "text": "\u201ctest accuracy/recall@K with/without your representation for more than one other state of the art algorithm for these datasets\u201d: Upon the reviewer\u2019s request, we are looking to extend one more architecture with Time2Vec. If we managed to obtain results until the end of the rebuttal period, we will post them here.",
      "suffix": "\n",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          20,
          21,
          22
        ]
      ],
      "details": {}
    },
    {
      "review_id": "SyeFixSscr",
      "rebuttal_id": "H1gEspGqjH",
      "sentence_index": 7,
      "text": "Dataset that exhibits seasonality: The hand-crafted dataset has been created to serve that purpose (we could change the frequency from weekly to monthly or quarterly).",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          14,
          15,
          16
        ]
      ],
      "details": {}
    },
    {
      "review_id": "SyeFixSscr",
      "rebuttal_id": "H1gEspGqjH",
      "sentence_index": 8,
      "text": "The reason for using a hand-crafted dataset was because we could control the underlying dynamics and verify if the model can learn the correct dynamics.",
      "suffix": "\n\n",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          14,
          15,
          16
        ]
      ],
      "details": {}
    },
    {
      "review_id": "SyeFixSscr",
      "rebuttal_id": "H1gEspGqjH",
      "sentence_index": 9,
      "text": "Optimization of sine functions: The results we have reported in the paper demonstrate mean and standard deviation across multiple runs.",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          27,
          28,
          29
        ]
      ],
      "details": {}
    },
    {
      "review_id": "SyeFixSscr",
      "rebuttal_id": "H1gEspGqjH",
      "sentence_index": 10,
      "text": "In each run, we initialize the parameters randomly.",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          27,
          28,
          29
        ]
      ],
      "details": {}
    },
    {
      "review_id": "SyeFixSscr",
      "rebuttal_id": "H1gEspGqjH",
      "sentence_index": 11,
      "text": "The standard deviations provide evidence that the performance of LSTM+Time2Vec doesn't depend on the initialization values more than a standard LSTM+T model.",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          27,
          28,
          29
        ]
      ],
      "details": {}
    },
    {
      "review_id": "SyeFixSscr",
      "rebuttal_id": "H1gEspGqjH",
      "sentence_index": 12,
      "text": "Moreover, from Fig 1(b) and 1(c), it can be observed that the standard deviation of LSTM+Time2Vec is even smaller than that of LSTM+T.",
      "suffix": "\n\n",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          27,
          28,
          29
        ]
      ],
      "details": {}
    },
    {
      "review_id": "SyeFixSscr",
      "rebuttal_id": "H1gEspGqjH",
      "sentence_index": 13,
      "text": "Theory: According to Fourier sine-cosine series, any real-valued function f(t) that is integrable on an interval of length P can be approximated as f(t) = a_0 + sum_{n=1}^{N/2} (a_n cos(2nt\\pi/P) + b_n sin(2nt\\pi/P)) by choosing appropriate weights a_n and b_n.",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          27,
          28,
          29
        ]
      ],
      "details": {}
    },
    {
      "review_id": "SyeFixSscr",
      "rebuttal_id": "H1gEspGqjH",
      "sentence_index": 14,
      "text": "Since cos(x)=sin(x+\\pi/2), the cos functions can be replaced with sine functions so f(t) can be approximated with N sine functions.",
      "suffix": "\n",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          27,
          28,
          29
        ]
      ],
      "details": {}
    },
    {
      "review_id": "SyeFixSscr",
      "rebuttal_id": "H1gEspGqjH",
      "sentence_index": 15,
      "text": "By concatenating Time2Vec to the input, as explained in the second paragraph of Section 4, we allow the sequence model to learn a function (or multiple functions) of time based on the data by taking a weighted sum (the weights correspond to a_n and b_n in the formula above) of the sinusoids.",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          27,
          28,
          29
        ]
      ],
      "details": {}
    },
    {
      "review_id": "SyeFixSscr",
      "rebuttal_id": "H1gEspGqjH",
      "sentence_index": 16,
      "text": "Learning a function of time from data rather than fixing it to a hand-crafted function can potentially lead to better generalization.",
      "suffix": "\n",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          27,
          28,
          29
        ]
      ],
      "details": {}
    },
    {
      "review_id": "SyeFixSscr",
      "rebuttal_id": "H1gEspGqjH",
      "sentence_index": 17,
      "text": "We will state the theory behind Time2Vec more explicitly.",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_by-cr",
      "alignment": [
        "context_sentences",
        [
          27,
          28,
          29
        ]
      ],
      "details": {
        "manuscript_change": true
      }
    }
  ]
}