Towards Diverse Perspective Learning with Switch over Multiple Temporal Pooling

22 Sept 2022 (modified: 13 Feb 2023) | ICLR 2023 Conference Withdrawn Submission | Readers: Everyone
Keywords: timeseries classification, temporal pooling, temporal relationship, perspective learning
Abstract: Pooling is a widely used method for classification problems. In particular, poolings that consider temporal relationships have been proposed in the time series classification (TSC) domain. However, we found that temporal poolings are data dependent. Since each pooling has only one perspective, existing temporal poolings cannot solve the data dependency problem with fixed perspective learning. In this paper, we propose a novel pooling architecture for diverse perspective learning: switch over multiple pooling (SoM-TP). An extensive case study using layer-wise relevance propagation (LRP) reveals the distinct view that each pooling has and ultimately emphasizes the necessity of diverse perspective learning. Therefore, SoM-TP dynamically selects temporal poolings according to time series data characteristics. The ablation study on SoM-TP shows how diverse perspective learning is achieved. Furthermore, pooling classification is investigated through input attribution by LRP. Extensive experiments are conducted on the UCR/UEA repository.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Deep Learning and representational learning
Supplementary Material:  zip


Submission Withdrawn by the Authors

Withdraw by Paper5971 Authors

ICLR 2023 Conference Paper5971 Authors
18 Jan 2023, 13:16 | ICLR 2023 Conference Paper5971 Withdraw | Readers: Everyone
Withdrawal Confirmation: I have read and agree with the venue's withdrawal policy on behalf of myself and my co-authors.

Final Summarization of Discussion

Official Comment by Paper5971 Authors

ICLR 2023 Conference Paper5971 Authors
11 Dec 2022, 13:01 (modified: 12 Dec 2022, 13:32) | ICLR 2023 Conference Paper5971 Official Comment | Readers: Everyone
Comment:

Dear all reviewers,

Here is what we discussed:

  1. An additional evaluation metric, PRAUC.
  2. The complexity of SoM-TP in big-O notation.
  3. An additional ablation study on SoM-TP with the perspective loss, using histograms.
    1. DPLN pulls the classification network’s output while finding the optimal classification result.
  4. Extended experiments on large real-world datasets and on NLP.

We also added an ablation study on SoM-TP with SVMs and kernels:

  1. Using the pooled vectors (before the fully-connected layers), we drew SVM decision boundaries with various kernels and found that SoM-TP’s decision boundary classifies the features better than those of the other temporal poolings.

As the discussion period is ending soon, we sincerely thank all reviewers for their valuable comments. It was a great honor to have discussions with all reviewers.

Thank you.


We are looking forward to comments of the post-rebuttal and discussion

Official Comment by Paper5971 Authors

ICLR 2023 Conference Paper5971 Authors
12 Dec 2022, 13:31 | ICLR 2023 Conference Paper5971 Official Comment | Readers: Everyone
Comment:

Dear all reviewers,

The discussion period closes soon, so we would like to hear the reviewers' feedback on the updated manuscript and our discussion. As we have not received comments from reviewers during the discussion period, we are unsure whether we have adequately addressed the concerns. We would be happy to get replies from all reviewers.

Thank you.


Gentle Reminder of Discussion

Official Comment by Paper5971 Authors

ICLR 2023 Conference Paper5971 Authors
08 Dec 2022, 11:06 (modified: 12 Dec 2022, 13:33) | ICLR 2023 Conference Paper5971 Official Comment | Readers: Everyone
Comment:

Dear reviewers,

The discussion period is ending this week, so we look forward to more discussion. We would be happy to discuss the updated manuscript further.

  • Additionally, we acknowledge that our first manuscript fell short. We therefore worked hard to reflect the feedback of all reviewers during the rebuttal period. We would like all reviewers to know that we have done our best throughout: when we first submitted the manuscript, during the rebuttal period, and now in the discussion period.

We really appreciate all reviewers' time, effort, and valuable comments.

Thank you.


Summarization of Discussion and Extended Experiments (1/)

Official Comment by Paper5971 Authors

ICLR 2023 Conference Paper5971 Authors
23 Nov 2022, 10:36 | ICLR 2023 Conference Paper5971 Official Comment | Readers: Everyone
Comment:

We again thank the reviewers for the thoughtful effort they put into their reviews and the discussion. Here is a summary of the “Discussion” and the “Extended Experiment”.

  • First, as for the “Discussion”, we discussed the complexity of SoM-TP compared to other existing temporal poolings (with reviewer 2KQe).

    The complexity of each pooling layer and of whole-model optimization is as follows:

    • GTP: $O(1)$ for pooling complexity, and $O(N)$ for training optimization complexity.
    • STP: $O(L)$ for pooling complexity, and $O(N)$ for training optimization complexity.
    • DTP: for pooling complexity, $O(2L_{DTW}) = O(L)$ to calculate soft-DTW in the forward and backward passes on GPU, but $O(L_{DTW}^3) = O(L^3)$ on CPU; and $O(N + w_{dtp}) = O(N)$ for training optimization complexity, with additional optimization for the soft-DTW layer.
    • SoM-TP: $O(L_{conv} + 2L_Q L_K) = O(L^2)$ for pooling complexity, to compute the attention scores for dynamic pooling selection ($Q$ and $K$ denote query and key, respectively); and $O(N + w_{dtp} + w_A + n_{DPLN}) = O(N)$ for training optimization complexity, with additional optimization for the attention weight $A$ and DPLN.
    • At inference, the optimization complexity becomes $O(1)$ for all methods and only the pooling complexity remains.
    • Here, the pooling complexity excludes the cost of computing the maximum or average value, which all methods share.

    There is a trade-off between complexity and performance, because higher complexity and capacity generally yield better performance. SoM-TP, which has the highest complexity and capacity, achieved the best performance on TSC. The increase in complexity is needed to reflect various poolings from the standpoint of perspective. SoM-TP therefore costs somewhat more in complexity but achieves robust performance.

    • As for the notation, $w$ is the weight of the parameter and $n$ is the weight of the layers. We are planning to add this discussion in section 3.2.2.

  • Second, as for the “Extended Experiment”, we have done an additional ablation study on SoM-TP with the perspective loss. We drew histograms at several training epochs that compare the distribution of the classification network output ($y_{pred1}$) with that of the DPLN output ($y_{pred2}$). This responds to all reviewers’ comments on the solidity and novelty of SoM-TP.

    • As for the perspective loss,

      $$L_{perspective} = KL(y_{pred1}, y_{pred2}) + L_{DPLN}$$

      where $L_{DPLN}$ is the cross-entropy loss on $y_{pred2}$. $L_{DPLN}$ is included in $L_{perspective}$ because the sub-network must be trained to classify correctly.

    • We found that the $y_{pred2}$ distribution pulls $y_{pred1}$ toward itself while finding its own optimal classification result. Therefore, $y_{pred1}$ is tied to the distribution of $y_{pred2}$ and is indirectly influenced by the classification results from all pooling output vectors. The distribution, updated at every epoch, shows that the perspective loss works as intended for diverse perspective learning.

    • We are planning to add this experiment to Appendix A.
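
The perspective loss above can be sketched in plain Python for a single sample. This is a minimal illustration, not the authors’ implementation: the function names are hypothetical, the inputs are assumed to be raw logits that a softmax turns into probabilities, and the KL term is assumed to be taken in the direction KL(y_pred1 || y_pred2), which the formula does not state explicitly.

```python
import math


def softmax(logits):
    """Turn raw logits into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions; eps guards against log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def cross_entropy(probs, label, eps=1e-12):
    """Cross-entropy of a predicted distribution against an integer class label."""
    return -math.log(probs[label] + eps)


def perspective_loss(logits_cls, logits_dpln, label):
    """L_perspective = KL(y_pred1, y_pred2) + L_DPLN for one sample.

    logits_cls  -> y_pred1 (classification network output)
    logits_dpln -> y_pred2 (DPLN output)
    The KL direction is an assumption made for this sketch.
    """
    y_pred1 = softmax(logits_cls)
    y_pred2 = softmax(logits_dpln)
    return kl_divergence(y_pred1, y_pred2) + cross_entropy(y_pred2, label)
```

When the two networks produce identical outputs, the KL term vanishes and only the DPLN cross-entropy remains; any disagreement adds a non-negative penalty.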

We would again like to thank all reviewers for their time and feedback, and we hope that our discussion adequately addresses all concerns. Please let us know if you have any further questions, and we are very happy to follow up!


Summarization of Discussion and Extended Experiments (2/)

Official Comment by Paper5971 Authors

ICLR 2023 Conference Paper5971 Authors
26 Nov 2022, 08:27 | ICLR 2023 Conference Paper5971 Official Comment | Readers: Everyone
Comment:
  • Third, as for the “Extended Experiment”, we have done additional experiments on larger real-world datasets in time series and NLP (in response to the comment from reviewer QzyE).
    • Larger datasets (ECG and EEG); the accuracy is as follows:

      | Dataset | Type | Data-size | Class | GTP-MAX | STP-MAX | DTP-MAX-euc | SoM-TP-MAX |
      |---|---|---|---|---|---|---|---|
      | ECG | uni-variate | 109,446 | 5 | 0.9236 | 0.9421 | 0.8593 | 0.9579 |
      | EEG | multi-variate (65) | 122,880 | 2 | 0.9323 | 0.9162 | 0.9087 | 0.9485 |

      SoM-TP performs best on both datasets.

    • Text classification (NLP) with the snli1.0 dataset; the accuracy is as follows:

      | Dataset | Type | Data-size | Class | LSTM | LSTM-Attention | Word-by-Word Attention | GTP-MAX | STP-MAX | SoM-TP-MAX |
      |---|---|---|---|---|---|---|---|---|---|
      | snli1.0 | text classification | 559,209 | 3 | 0.3387 | 0.7593 | 0.7784 | 0.7899 | 0.7843 | 0.7948 |

      • In the NLP task in particular, we found a useful property of SoM-TP compared to DTP.
      • DTP is problematic for layer-by-layer pooling between convolutional or RNN layers because it has to optimize the “soft-DTW parameter” based on the preceding hidden features.
      • This makes the optimization complexity grow in proportion to the number of DTP layers, which equals the number of soft-DTW layers.
      • However, because SoM-TP is a framework for diverse perspective learning, it can be placed between other layers. In this task, we constructed SoM-TP with only GTP and STP between three RNN layers. SoM-TP outperforms the other temporal poolings as well as the attention-based models.
      • We followed the model of [1] (embedding dimension 128, hidden dimension 256, batch size 32, and learning rate 0.005). As shown in the table above, simple NLP architectures for text classification, such as LSTM or attention, were chosen for comparison with the LSTM-pooling architecture.
      • SoM-TP is also useful in hierarchical model architectures, which is another point of novelty compared to DTP.
      • We are planning to add this experiment to Appendix A.

[1] Alexis Conneau, Douwe Kiela, Holger Schwenk, Loic Barrault, and Antoine Bordes. Supervised learning of universal sentence representations from natural language inference data, 2017. URL https://arxiv.org/abs/1705.02364.

We would again like to thank all reviewers for their time and feedback, and we hope that our discussion adequately addresses all concerns. Please let us know if you have any further questions, and we are very happy to follow up!


Summarization of Discussion and Extended Experiments (3/)

Official Comment by Paper5971 Authors

ICLR 2023 Conference Paper5971 Authors
05 Dec 2022, 08:06 | ICLR 2023 Conference Paper5971 Official Comment | Readers: Everyone
Comment:

During discussion week, we responded regarding the model complexity of SoM-TP. Here is an additional, more detailed response on SoM-TP's complexity.

  • An increase in the capacity and complexity of a model sometimes causes overfitting and degraded performance.
  • The complexity increase in SoM-TP is caused by the attention mechanism that automatically and dynamically selects among distinct poolings. The complexity of the pooling methods themselves does not increase.
    • Pooling complexity depends on which poolings are selected by attention (e.g. $O(1)$ for GTP and $O(L)$ for STP and DTP). The $O(L^2)$ complexity of calculating the attention score is added for SoM-TP.
  • In this context, SoM-TP does not change or increase the pooling complexity itself; its complexity increases only because of the attention framework. This is why SoM-TP performs more robustly than other temporal poolings on various TSC datasets, even though its complexity increases. This is also a point of difference from a simple ensemble network, and we think it is another novelty of SoM-TP.
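
For reference, the two simplest poolings named above can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors’ code: the function names are hypothetical, max pooling is assumed, and STP is assumed to cut the time axis into contiguous segments of (near-)equal length. The hidden features are a list of L time steps, each holding C channel values. DTP and the attention-based selection are deliberately left out.

```python
def global_max_pool(features):
    """GTP sketch: one maximum per channel over the whole time axis.

    features: list of L time steps, each a list of C channel values.
    Returns a list of C values.
    """
    return [max(channel) for channel in zip(*features)]


def static_max_pool(features, n_segments):
    """STP sketch: split the time axis into n contiguous segments, pool each.

    Segment boundaries are fixed by position only (assumption: near-equal
    lengths), so they do not depend on the data.
    Returns n_segments lists of C values, in temporal order.
    """
    length = len(features)
    if not 1 <= n_segments <= length:
        raise ValueError("n_segments must be between 1 and the sequence length")
    # Integer boundaries; since n_segments <= length, no segment is empty.
    bounds = [i * length // n_segments for i in range(n_segments + 1)]
    return [
        global_max_pool(features[bounds[i]:bounds[i + 1]])
        for i in range(n_segments)
    ]
```

GTP collapses the whole sequence into a single vector, while STP keeps one vector per segment and therefore retains coarse temporal order.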

We would again like to thank all reviewers for their time and feedback. As the discussion period is coming to an end, please let us know if you have any further questions; we are happy to follow up!


Additional General Response to All Reviewers (1/2)

Official Comment by Paper5971 Authors

ICLR 2023 Conference Paper5971 Authors
19 Nov 2022, 10:47 | ICLR 2023 Conference Paper5971 Official Comment | Readers: Everyone
Comment:

To all reviewers,

We again thank the reviewers for their reviews. In response to the feedback, we provide an additional general response on “novelty” and an updated manuscript.

  • First, we updated the manuscript to convey SoM-TP’s purpose and its goal of diverse perspective learning more effectively, with detailed explanations, captions, and newly designed figures. An Appendix is also added, with additional experiments and results that support SoM-TP. The updates to the manuscript are summarized as follows:
    • Section 1 “Introduction” is updated with the definition of the “perspective” of a pooling, followed by the meaning of “fixed-perspective learning” and “diverse-perspective learning”.
    • Section 2 “Background” is updated to explain the distinct perspectives of existing temporal poolings, with detailed LRP examples.
    • Section 3 “SoM-TP: Towards Diverse Perspective Learning” is updated with detailed notation for the formulas. We specifically tried to explain the role of DPLN and the perspective loss.
    • Also in Section 3, the analysis of the performance results is updated with a quantitative part (a table of average performance) and a qualitative part (LRP in Figure 2). Furthermore, we tried to analyze SoM-TP’s robustness objectively with other evaluation tools, such as a histogram and a rank bar chart.
    • Table 2 is updated with the average performance on the UCR/UEA repository for various temporal poolings, compared to SoM-TP. A ResNet architecture is also added.
    • Figure 1 is updated with detailed notation for vectors and matrices, and the two fully-connected layers are defined in detail. The caption now also describes the flow of the SoM-TP process.
    • Figure 2 is updated by adding accuracy and color to the figure. We circled where each pooling focuses and likewise provide the performance below. With the detailed, example-based explanation in the caption, readers can see why SoM-TP’s diverse perspective is necessary.
    • Figure 3 is updated with a detailed explanation in the caption. We tried to explain how SoM-TP finds optimal attention scores to reach diverse perspective learning.
    • Figure 4 is added to show an objective analysis of SoM-TP. A histogram and a bar chart are used to show that SoM-TP outperforms the other poolings.
  • Second, the updated Appendix is as follows:
    • In A.1, the detailed experimental setting, specifically the convolutional stack, is explained in a figure that contains the embedding dimensions and the overall architecture.
    • In A.2, an ablation study on λ is added. We designed the experiment to show how the perspective loss with λ decay affects SoM-TP’s performance, and we found the optimal λ for the datasets.
    • In A.3, an extended analysis of SoM-TP on individual datasets with varying time length, data size, and dimension is added, to see whether SoM-TP works robustly regardless of dataset characteristics. The figure shows that SoM-TP dynamically selects an appropriate pooling for the data regardless of dataset characteristics.
    • In B, the DTP algorithm is introduced.
    • In C, the extended related work is introduced.

Additional General Response to All Reviewers (2/2)

Official Comment by Paper5971 Authors

ICLR 2023 Conference Paper5971 Authors
19 Nov 2022, 10:48 (modified: 19 Nov 2022, 10:54) | ICLR 2023 Conference Paper5971 Official Comment | Readers: Everyone
Comment:
  • Finally, regarding novelty, we would like to make three points:
    • Studies that combine time series data and pooling with XAI methods, especially LRP input attribution, have been rare. With LRP, we found that the temporal poolings known as SOTA for TSC each have a different, fixed perspective.
    • In the same context of LRP and input attribution, there had been no attempt to analyze the perspective of existing poolings in detail. Our deep analysis of existing temporal poolings therefore led us to “diverse perspective learning” as a way to leverage pooling for TSC.
    • To reach “diverse perspective learning”, SoM-TP does use existing temporal poolings, but the important point is the learning framework. With the attention score and the perspective loss, any pooling with a different perspective can be incorporated into SoM-TP. This is the main advantage of SoM-TP.
  • We updated the code in supplementary.zip for reproducibility.

We would again like to thank all reviewers for their time and feedback, and we hope that our changes adequately address all concerns.


General Response to All Reviewers

Official Comment by Paper5971 Authors

ICLR 2023 Conference Paper5971 Authors
18 Nov 2022, 15:10 | ICLR 2023 Conference Paper5971 Official Comment | Readers: Everyone
Comment:

To all reviewers,

We thank the reviewers for their thoughtful and constructive reviews. In response to the feedback, we provide a general response to points raised by multiple reviewers and an updated manuscript.

  • Experiment Result and Performance: SoM-TP has robust and the highest overall performance.

  • We made the experiment results clearer through both quantitative and qualitative analysis. For the quantitative analysis, three evaluation methods are used to compare SoM-TP to other pooling methods: 1) average performance comparison in Table 2, 2) histogram comparison between SoM-TP and DTP, which is the SOTA temporal pooling, and 3) a rank bar chart.

    • First, the average performance of SoM-TP exceeds that of all the temporal poolings in Table 2, as mentioned in the second bullet point.
    • Second, we calculated the area under the histograms to see on how many datasets SoM-TP and DTP each outperform the other. SoM-TP outperforms DTP by a large margin in Figure 4.
    • Finally, the bar chart (Figure 4) shows that SoM-TP has more robust results than the other temporal poolings: it has the largest number of rank-1 results and the smallest number of rank-4 results.
    • A detailed explanation of these three evaluations is in section 3.2.3.
  • The optimal λ is applied through a hyperparameter search, and SoM-TP outperforms all the other temporal pooling methods.

    • In Table 2, you can check that SoM-TP outperforms all the other temporal pooling methods (including DTP) based on the average performance over all datasets.
    • In detail, for the repository of 112 univariate datasets, SoM-TP MAX shows the best performance in FCN (acc: 0.7503 / f1macro: 0.7212) and ResNet (acc: 0.7690 / f1macro: 0.7398).
    • For the repository of 21 multivariate datasets, SoM-TP also shows better performance than the other pooling methods in FCN (acc: 0.6969 / f1macro: 0.6648) and ResNet (acc: 0.6766 / f1macro: 0.6542).
    • If you would like to check more detailed results, please refer to Table 2, Figure 4, and Section 3.2.
  • For the qualitative analysis (Figure 2), we can check the LRP results of GTP, STP, DTP, and SoM-TP, which show the different perspective of each pooling.

    • In the example datasets (CricketZ, Fungi, and WordSynonyms) with input attribution results, we can identify how the diverse perspectives of SoM-TP focus differently from other temporal poolings.
  • Furthermore, we are planning to update the Appendix section with additional experiment explanations and results (e.g. a detailed explanation of the experimental setting, the λ ablation study, and an extended study on large datasets and NLP).
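
The rank-count idea behind the bar chart can be illustrated with a small sketch. It is an assumption-laden illustration rather than the authors’ evaluation code: the function name is hypothetical, ties are broken by input order, and the only input used is the pair of large-dataset rows (ECG, EEG) reported in the extended-experiment comment of this thread, so it does not reproduce Figure 4, which covers the UCR/UEA repository.

```python
def rank_counts(accuracy_by_dataset):
    """Count how often each method lands at each rank (rank 1 = best accuracy).

    accuracy_by_dataset: dict mapping dataset name -> {method name: accuracy}.
    Assumes every dataset reports every method.
    Returns {method: [count at rank 1, count at rank 2, ...]}.
    """
    methods = sorted({m for scores in accuracy_by_dataset.values() for m in scores})
    counts = {m: [0] * len(methods) for m in methods}
    for scores in accuracy_by_dataset.values():
        # Order methods from highest to lowest accuracy on this dataset.
        ordered = sorted(scores, key=scores.get, reverse=True)
        for position, method in enumerate(ordered):
            counts[method][position] += 1
    return counts


# Accuracies copied from the ECG/EEG table elsewhere in this thread.
large_datasets = {
    "ECG": {"GTP-MAX": 0.9236, "STP-MAX": 0.9421, "DTP-MAX-euc": 0.8593, "SoM-TP-MAX": 0.9579},
    "EEG": {"GTP-MAX": 0.9323, "STP-MAX": 0.9162, "DTP-MAX-euc": 0.9087, "SoM-TP-MAX": 0.9485},
}
counts = rank_counts(large_datasets)
```

On these two datasets alone, SoM-TP-MAX is ranked first twice and DTP-MAX-euc is ranked last twice; a bar chart over all datasets would plot exactly these per-rank counts.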

We would again like to thank all reviewers for their time and feedback, and we hope that our changes adequately address all concerns.


Official Review of Paper5971 by Reviewer FNQF

ICLR 2023 Conference Paper5971 Reviewer FNQF
04 Nov 2022, 00:44 | ICLR 2023 Conference Paper5971 Official Review | Readers: Everyone
Summary Of The Paper:

The paper proposes an attention-based method that can dynamically select a data-specific temporal pooling method from (1) global temporal pooling (GTP), (2) static temporal pooling (STP), and (3) dynamic temporal pooling (DTP). Experiments show the effectiveness of the proposed method and demonstrate that the approach indeed selects different pooling methods for different batches.

Strength And Weaknesses:

Strengths:

  • The work is well-motivated given the sufficient analysis on the limitations of each existing temporal pooling method (Section 2.2).
  • Experiments can show how different pooling methods vary while changing different batches of data.

Weaknesses:

  • The novelty and solidity of the work are insufficient. The work is incremental on top of Lee, Dongha, Seonghyeon Lee, and Hwanjo Yu. "Learnable dynamic temporal pooling for time series classification." Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 35. No. 9. 2021. given that it aims to select different pooling methods from the above mentioned work based on attention mechanism.
  • The proposed method SoM-TP does not perform better than DTP according to Table 2.
  • The paper needs to be proofread more carefully. Many algorithms and figures are not referenced correctly (many "?" references throughout the paper).
  • Experimental results are not clearly explained.
Clarity, Quality, Novelty And Reproducibility:

Both clarity and novelty need to be improved:

  • Clarity: The motivation is clearly illustrated given the extensive analysis and comparisons in Section 2.2. on the constraints of existing temporal pooling methods. However, the clarity of experiments is not sufficient: (1) what's the embedding dimension? This also relates to the limited reproducibility; (2) which specific dataset (UCR or UEA) is used for a specific result (e.g., dataset for Table 2 is not mentioned). This also relates to the limited reproducibility.

  • Novelty: As explained above, the novelty of this work is incremental.

Summary Of The Review:

Despite some strengths of this paper (e.g., good motivations, sufficient motivating analysis), there are a few weaknesses that need to be addressed for acceptance: (1) novelty, clarity; (2) experimental effectiveness of the proposed method; (3) writing readiness.

Correctness: 3: Some of the paper’s claims have minor issues. A few statements are not well-supported, or require small changes to be made correct.
Technical Novelty And Significance: 3: The contributions are significant and somewhat new. Aspects of the contributions exist in prior work.
Empirical Novelty And Significance: 3: The contributions are significant and somewhat new. Aspects of the contributions exist in prior work.
Flag For Ethics Review: NO.
Recommendation: 5: marginally below the acceptance threshold
Confidence: 3: You are fairly confident in your assessment. It is possible that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work. Math/other details were not carefully checked.

Responses to Reviewer FNQF

Official Comment by Paper5971 Authors

ICLR 2023 Conference Paper5971 Authors
18 Nov 2022, 15:14 (modified: 18 Nov 2022, 15:19) | ICLR 2023 Conference Paper5971 Official Comment | Readers: Everyone
Comment:

We thank Reviewer FNQF for the comments and helpful feedback on our work. We address many of Reviewer FNQF’s suggestions in the general response above (updated manuscript). And here, we respond to specific comments.

  • Q1. The novelty and solidity of the work are insufficient. The work is incremental on top of Lee, Dongha, Seonghyeon Lee, and Hwanjo Yu. "Learnable dynamic temporal pooling for time series classification." Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 35. No. 9. 2021. given that it aims to select different pooling methods from the above-mentioned work based on the attention mechanism.
  • Q2. The proposed method SoM-TP does not perform better than DTP according to Table 2.
  • Q3. Experimental results are not clearly explained. What is the experimental effectiveness of the proposed method?
    • For Q1, Q2, and Q3, please refer to “Experiment Result and Performance” in the “General Response to All Reviewers”. Please also refer to the updated manuscript.
  • Q4. Clarity: The motivation is clearly illustrated given the extensive analysis and comparisons in Section 2.2. on the constraints of existing temporal pooling methods. However, the clarity of experiments is not sufficient: (1) what's the embedding dimension? This also relates to the limited reproducibility; (2) which specific dataset (UCR or UEA) is used for a specific result (e.g., the dataset for Table 2 is not mentioned). This also relates to limited reproducibility.
    • Q4-1. What’s the embedding dimension?
      • We use a convolutional stack as a feature extractor in the experiments.
      • FCN with three convolutional layers: 128/256/256 dimensions in order.
      • ResNet with 9 convolutional layers and skip layers: 64/128/256 dimensions in order with three residual blocks.
      • Also, we clarified other components of the model architecture (e.g. number of convolutional layers, segmentation number, number of fully-connected layers, batch size, and window size) in Table 1.
      • In Table 2, the whole number of model parameters is specified.
      • We are planning to update the detailed model architecture for FCN and ResNet in the Appendix section.
    • Q4-2. Which specific dataset (UCR or UEA) is used for a specific result (e.g., the dataset for Table 2 is not mentioned)?
      • In Table 2, the average performances for all UCR and UEA repositories are updated with FCN and ResNet architectures.
      • For the qualitative analysis using the LRP method (Figure 2), we specified the datasets used in each experiment result.
  • Q5. The paper needs to be proofread more carefully. Many algorithms and figures are not referenced correctly (many ? references throughout the paper).
    • We corrected all the broken references.

We would again like to thank reviewer FNQF, and we hope that our changes adequately address all concerns. Please let us know if you have any further questions, and we are very happy to follow up!


Additional response to Reviewer FNQF

Official Comment by Paper5971 Authors

ICLR 2023 Conference Paper5971 Authors
26 Nov 2022, 08:29 | ICLR 2023 Conference Paper5971 Official Comment | Readers: Everyone
Comment:

To Reviewer FNQF,

We added the “Summarization of Discussion and Extended Experiments” comments, which summarize the extended analysis and experiments we carried out based on the valuable comments and discussions from all reviewers.

We really appreciate your comment, question, and feedback on our work. And we hope that our response adequately addresses all concerns. Please let us know if you have any further questions, and we are very happy to follow up!


Official Review of Paper5971 by Reviewer 2KQe

ICLR 2023 Conference Paper5971 Reviewer 2KQe
25 Oct 2022, 11:49 | ICLR 2023 Conference Paper5971 Official Review | Readers: Everyone
Summary Of The Paper:

In this paper, a pooling architecture for diverse perspective learning is proposed. The architecture is referred to as switch over multiple pooling (SoM-TP). To motivate their method, the authors argue that one pooling cannot be dominant in various time series data characteristics and thus there is a need for a more robust pooling that encompasses multiple diverse perspectives. Experiments were conducted on UCR/UEA public datasets, the results of which demonstrate the effectiveness and robustness of SoM-TP in comparison with three other pooling methods.

Strength And Weaknesses:

Strengths:

  • Existing temporal poolings cannot capture dependencies that may exist within time series data through a fixed perspective learning since a single pooling method considers only one perspective, i.e. has only a single viewpoint. In contrast, SoM-TP is designed to address this problem by leveraging a diverse perspective learning (DPLN) sub-network in its training process that captures various (including both global and local) viewpoints.

  • SoM-TP is capable of identifying the characteristics of the data in a specific mini-batch and leverages attention weights to dynamically select suitable temporal poolings based on the identified data characteristics. In a downstream classification scenario, however, the output class is predicted considering all perspectives of pooling in SoM-TP.

  • The authors have compared SoM-TP with three pooling methods (GTP, STP and DTP) on 112 univariate and 21 multivariate time series datasets from the UCR/UEA repository stemming from various domains. The results (1) demonstrate that SoM-TP tends to be more robust to data that requires capturing different perspectives; and (2) suggest that SoM-TP is more accurate than fixed perspective learning with GTP and STP, while having similar accuracy to that of DTP.

  • The authors have conducted an ablation study that provides an insight into the inner-workings of the diverse perspective learning process.


Weaknesses:

  • The proposed SoM-TP method appears to simply combine several components including different pooling methods (GTP, STP and DTP) and an attention mechanism, which are all already well-established in the literature. Even the DPLN sub-network is a simple classification network that considers all pooling perspectives to make a final decision, which limits the methodological novelty of this work.

  • SoM-TP is designed to dynamically select suitable temporal poolings based on certain characteristics that were identified in a certain portion (mini-batch) of a dataset. Nevertheless, neither have the characteristics (identified in each of the considered datasets) been discussed in the paper; nor have the authors discussed how the descriptive statistics (such as size, dimensionality, time series length, etc.) of different datasets impact the dynamic selection of the temporal poolings. I would encourage the authors to discuss and provide more clarity on the aforementioned two points in their response.

  • For one to select the most suitable pooling method for a certain mini-batch, naturally, all three considered pooling operations would need to be applied to the hidden features with temporal position information in each feed-forward step of the overall architecture. This increases the complexity of the training as well as the inference stage; however, the complexity of neither is discussed by the authors. I would suggest that such a discussion be included in both the authors’ response and the paper.

  • The tradeoff parameter (or decay) λ in Eq. (6) appears to have an important role as it controls the influence of the perspective loss. That being said, I am wondering why the authors have not analyzed the effect that different values of λ may have on the downstream time series classification performance.

  • Average accuracy is used as the classification performance metric; however, to my knowledge some of the considered UCR/UEA datasets are rather imbalanced, so the authors should have considered metrics that are less sensitive to class imbalance (e.g., area under the precision-recall curve (AUPRC)).


Minor weaknesses: There are also certain grammatical and typographical errors and remarks that require attention. Some of them are summarized as follows:

  • In the next-to-last paragraph of the Introduction section on page 2, “all learnable weight” should be replaced with “all learnable weights”.
  • In the “Static Temporal Pooling” paragraph on page 3, replace “segmentation” with “segments”.
  • In the “Dynamic Temporal Pooling” paragraph on page 3, there is a missing reference to an Algorithm and another one to an Appendix.
  • At the beginning of page 6, there is a reference missing right after “class weight matrix”.
  • In the caption of Table 1, “detail” should be replaced with “detailed”.
  • In the “Robustness” paragraph on page 7, “Figure 4 (a)” should be corrected to “Figure 5 (a)”. Later in the same sentence, “GTP” should be included between the words “than” and “as”.
  • The caption of Figure 5 should be more descriptive instead of simply being “Performance graph”.
Clarity, Quality, Novelty And Reproducibility:

Clarity: The paper is decently written, but there is certainly room for improvement when writing is concerned. However, in my view, the paper is not organized and presented well, while being difficult-to-follow, particularly when one reads the experimental section.

Quality: The design and justifications for the proposed SoM-TP method seem to be technically sound. The observations made regarding the robustness and accuracy of SoM-TP seem to hold in the considered settings, however, these observations need to be further supported by conducting additional ablation studies (mentioned in the “Weaknesses” part of this review). Overall, the paper has merit but needs more work as it does not appear to be fully developed in terms of quality.

Novelty: From a methodological perspective, the contribution of this work can be considered rather incremental. In essence, the authors leveraged several existing pooling methods and integrated them into a single architecture that learns attention weights for each of the poolings in order to select the most suitable pooling given a certain batch of data samples (please refer to the first point of the “Weaknesses” part of this review, where this is explained in more detail).

Reproducibility: The experiments were conducted on widely-used time series classification datasets which are publicly available. On the other hand, the code for the proposed SoM-TP method is not made available by the authors in the present anonymized version, however, with some effort one might be able to implement SoM-TP by following Section 3.

Summary Of The Review:

This work has merit but does not appear to be well developed. Although the authors provided interesting insights regarding the need for diverse perspective learning for pooling methods, the work, in its current state, lacks methodological novelty; certain key aspects (such as the impact of data size, dimensionality, time series length, etc., on the dynamic pooling selection) are not addressed in the paper; the role of the perspective loss (which can be considered central to this work) has not been analyzed; among other weak points outlined in the “Weaknesses” part of this review. Overall, these weak points of the paper seem to outweigh its strengths. Therefore, I am not convinced that this work is a good fit for ICLR. Nevertheless, I am looking forward to the authors’ response and I would be willing to adjust my score in case I have misunderstood or misinterpreted certain aspects of the work.

Correctness: 3: Some of the paper’s claims have minor issues. A few statements are not well-supported, or require small changes to be made correct.
Technical Novelty And Significance: 2: The contributions are only marginally significant or novel.
Empirical Novelty And Significance: 3: The contributions are significant and somewhat new. Aspects of the contributions exist in prior work.
Flag For Ethics Review: NO.
Details Of Ethics Concerns:

Not applicable.

Recommendation: 3: reject, not good enough
Confidence: 4: You are confident in your assessment, but not absolutely certain. It is unlikely, but not impossible, that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work.

Responses to Reviewer 2KQe (1/2)

Official Comment by Paper5971 Authors

ICLR 2023 Conference Paper5971 Authors
18 Nov 2022, 15:29 | ICLR 2023 Conference Paper5971 Official Comment | Readers: Everyone
Comment:

We thank Reviewer 2KQe for the comments and thoughtful feedback on our work. We address many of Reviewer 2KQe’s suggestions in the general response above (updated manuscript). Here, we respond to specific comments.

  • Q1. 1) The proposed SoM-TP method appears to simply combine several components including different pooling methods (GTP, STP, and DTP) and an attention mechanism, which are all already well-established in the literature. 2) Even the DPLN sub-network is a simple classification network that considers all pooling perspectives to make a final decision, which limits the methodological novelty of this work.
    • 1, 2) We updated Section 3.1 with a detailed explanation. In SoM-TP, DPLN is a regularizer network that considers all pooling perspectives. Its output, y_pred2, is therefore not used for the final classification decision. Instead, it is used to compute the KL divergence term in the perspective loss, which indirectly makes the model learn diverse perspectives. Diverse perspective learning with DPLN proceeds as follows.
      • The final classification result y_pred1 comes from the main classification network, which takes the selected pooling feature p as input.

      • DPLN takes the product of the attention score A and the pooling feature vectors P̄ as input and produces the classification decision y_pred2. This result is not used for the actual classification, only for computing the KL divergence term in the perspective loss.

      • The perspective loss is

        L_perspective = KL(y_pred1, y_pred2) + L_DPLN,

        where L_DPLN is the cross-entropy loss on y_pred2. L_DPLN is included in L_perspective because the sub-network itself must be trained to classify correctly.

      • The final cost function is

        L_cost = L_classification + λ L_perspective,

        where L_classification is the cross-entropy loss on y_pred1.

      • Therefore, DPLN is itself an ensemble network, but it works as a regularizer in SoM-TP. (Please also refer to Section 3.2.2, “What is the role of DPLN?”)
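
The two loss definitions above can be sketched for a single sample as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the predictions are taken to be probability vectors, KL(y_pred1, y_pred2) is read as KL(y_pred1 || y_pred2), and the function names are hypothetical.

```python
import math
from typing import Sequence

EPS = 1e-12  # numerical guard against log(0); an assumption of this sketch


def cross_entropy(probs: Sequence[float], label: int) -> float:
    """Cross-entropy of a predicted distribution against an integer label."""
    return -math.log(max(probs[label], EPS))


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for two discrete distributions over the same classes."""
    return sum(pi * math.log(pi / max(qi, EPS)) for pi, qi in zip(p, q) if pi > 0)


def total_cost(y_pred1: Sequence[float], y_pred2: Sequence[float],
               label: int, lam: float) -> float:
    """L_cost = L_classification + lam * (KL(y_pred1, y_pred2) + L_DPLN)."""
    l_classification = cross_entropy(y_pred1, label)  # main network
    l_dpln = cross_entropy(y_pred2, label)            # DPLN sub-network
    l_perspective = kl_divergence(y_pred1, y_pred2) + l_dpln
    return l_classification + lam * l_perspective


if __name__ == "__main__":
    main_out = [0.7, 0.2, 0.1]   # y_pred1
    dpln_out = [0.5, 0.3, 0.2]   # y_pred2
    print(total_cost(main_out, dpln_out, label=0, lam=0.1))
```

When both networks agree exactly, the KL term vanishes and only the two cross-entropy terms remain, so λ then only scales the contribution of the DPLN cross-entropy.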

  • Q2. SoM-TP is designed to dynamically select suitable temporal poolings based on certain characteristics that were identified in a certain portion (mini-batch) of a dataset. Nevertheless, 1) neither have the characteristics (identified in each of the considered datasets) been discussed in the paper; 2) nor have the authors discussed how the descriptive statistics (such as size, dimensionality, time series length, etc.) of different datasets impact the dynamic selection of the temporal poolings. I would encourage the authors to discuss and provide more clarity on the aforementioned two points in their response.
    • 1, 2) Datasets in the UCR/UEA repository vary in data size, time series length, and dimensionality. Since SoM-TP shows robust performance across them by dynamically selecting an appropriate pooling during both training and inference, we conclude that SoM-TP does not depend on these descriptive statistics. Please refer to the updated Appendix A.

Responses to Reviewer 2KQe (2/2)

Official Comment by Paper5971 Authors

ICLR 2023 Conference Paper5971 Authors
18 Nov 2022, 15:32 | ICLR 2023 Conference Paper5971 Official Comment | Readers: Everyone
Comment:
  • Q3. For one to select the most suitable pooling method for a certain mini-batch, naturally, all three considered pooling operations would need to be applied to the hidden features with temporal position information in each feed-forward step of the overall architecture. This increases the complexity of the training as well as the inference stage, however, the complexity of neither is discussed by the authors. I would suggest that such a discussion is included in both the authors’ responses as well as in the paper.
    • It is true that SoM-TP has more learnable parameters than the existing pooling methods, because the model has to learn the attention weights used to select a suitable pooling. The detailed number of parameters is given in Table 2. However, during inference the attention weights no longer need to be trained and the sub-network is not used. We will follow up on the reviewer’s valuable comment regarding the complexity discussion.
  • Q4. The tradeoff parameter (or decay) λ in Eq. (6) appears to have an important role as it controls the influence of the perspective loss. That being said, I am wondering why the authors have not analyzed the effect that different values of λ may have on the downstream time series classification performance. Also, the role of the “perspective loss” (which can be considered central to this work) has not been analyzed.
    • We conducted the λ ablation study and found the optimal λ for each combination of model and repository.
      • FCN for UCR: MAX: 0.1 / AVG: 1
      • FCN for UEA: MAX: 0.1 / AVG: 0.1
      • We are planning to add the λ ablation study with the ResNet architecture to the Appendix.
  • Q5. Average accuracy is considered a classification performance metric, however, to my knowledge some of the considered UCR/UEA datasets are rather imbalanced, thus the authors should have considered metrics that are less sensitive to class imbalance (e.g., area under the precision-recall curve (AUPRC)).
    • We now report the F1 score to account for imbalanced datasets. In addition, a weighted loss is used during training to deal with class imbalance. Please refer to the updated experimental setting in Section 3.2.1.
  • Q6. Clarity: The paper is decently written, but there is certainly room for improvement when writing is concerned. However, in my view, the paper is not organized and presented well, and is difficult to follow, particularly when one reads the experimental section.
    • Please refer to the “General Response to All Reviewers” in Experiment Result and Performance.
  • Q7. Reproducibility: The experiments were conducted on widely-used time series classification datasets which are publicly available. On the other hand, the code for the proposed SoM-TP method is not made available by the authors in the present anonymized version, however, with some efforts, one might be able to implement SoM-TP by following Section 3.
    • We added a detailed caption to Figure 1. We are also planning to add the code to supplementary.zip.
  • Q8. Minor weaknesses: There are also certain grammatical and typographical errors and remarks that require attention.
    • We corrected all the typos and the incorrect references.
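
The weighted loss mentioned in the response to Q5 is not specified further in this thread. One common choice is to weight the cross-entropy by inverse class frequency, sketched below. This weighting scheme and the function names are assumptions for illustration only and may differ from what the authors used.

```python
import math
from collections import Counter
from typing import Dict, Sequence


def inverse_frequency_weights(labels: Sequence[int]) -> Dict[int, float]:
    """Weight for class c is n_samples / (n_classes * count_c) (assumed scheme)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * cnt) for c, cnt in counts.items()}


def weighted_cross_entropy(probs_batch: Sequence[Sequence[float]],
                           labels: Sequence[int],
                           weights: Dict[int, float]) -> float:
    """Weighted mean of per-sample cross-entropy, normalised by total weight."""
    total = sum(weights[y] * -math.log(p[y]) for p, y in zip(probs_batch, labels))
    norm = sum(weights[y] for y in labels)
    return total / norm


if __name__ == "__main__":
    y = [0, 0, 0, 1]  # class 1 is the minority class
    w = inverse_frequency_weights(y)
    preds = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.6, 0.4]]
    print(w, weighted_cross_entropy(preds, y, w))
```

With this scheme, an error on the minority class contributes more to the loss than an error on a majority-class sample.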

We would again like to thank reviewer 2KQe, and we hope that our changes adequately address all concerns. Please let us know if you have any further questions, and we are very happy to follow up!


Follow-up on authors' response

Official Comment by Paper5971 Reviewer 2KQe

ICLR 2023 Conference Paper5971 Reviewer 2KQe
20 Nov 2022, 19:28 | ICLR 2023 Conference Paper5971 Official Comment | Readers: Everyone
Comment:

I would like to thank the authors for their point-by-point response to my review and for (1) providing additional clarifications regarding the methodological novelty (particularly regarding the role of the DPLN sub-network); (2) assessing the robustness of SoM-TP’s dynamic selection of temporal poolings to different dataset sizes, dimensionalities and time series lengths; and (3) analyzing the effect of the decay λ on the downstream classification performance of SoM-TP; among other revisions that were made to the paper. The following summarizes my thoughts on some of the points from the response:

Q3: As the authors already pointed out, in addition to the number of SoM-TP’s parameters included in Table 2, I believe that the paper will benefit from a discussion on the overall complexity of SoM-TP’s training and inference procedures.

Q4: In Appendix A.2, what do the y-axes of Figures 7(a) and 7(b) refer to? For each value of λ, did the authors measure the performance in terms of conventional classification accuracy, F1-score, or perhaps another performance metric? The metric should be specified in the y-axes’ labels of the two figures. Moreover, do the values in Figure 7(a) and Figure 7(b) represent the average performances of SoM-TP over all UCR and UEA datasets, respectively?

Q5: While I do agree that measuring F1-score may be considered suitable for imbalanced data, it still depends on a chosen threshold, since different precision and recall values (which the F1-score is a function of), and thus different F1-scores, can be obtained at different thresholds. This was the reason behind my suggestion of the area under the precision-recall curve (AUPRC), which is both less sensitive to class imbalance and threshold-invariant.
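
The threshold dependence described in Q5 can be seen on a toy binary example. The sketch below is illustrative only: the data are made up, the function names are hypothetical, and AUPRC is approximated by the step-wise average precision, assuming no tied scores. For multi-class datasets such as those in UCR/UEA, a one-vs-rest average would be needed on top of this.

```python
from typing import Sequence


def f1_at_threshold(scores: Sequence[float], labels: Sequence[int], thr: float) -> float:
    """F1 of the positive class when predicting positive iff score >= thr."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < thr and y == 1)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Step-wise area under the precision-recall curve (no tie handling)."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    n_pos = sum(labels)
    tp, ap = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            tp += 1
            ap += tp / rank  # precision at the rank of each positive
    return ap / n_pos


if __name__ == "__main__":
    s = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
    y = [1, 0, 1, 0, 0, 0, 0, 0]  # 2 positives out of 8
    for thr in (0.5, 0.75, 0.85):
        print(thr, f1_at_threshold(s, y, thr))  # F1 changes with the threshold
    print(average_precision(s, y))              # a single threshold-free number
```

The same scores yield different F1 values at different thresholds, while the average precision summarises the whole ranking in one number.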


Follow-up Response to Reviewer 2KQe (updated)

Official Comment by Paper5971 Authors

ICLR 2023 Conference Paper5971 Authors
21 Nov 2022, 07:33 (modified: 22 Nov 2022, 05:12) | ICLR 2023 Conference Paper5971 Official Comment | Readers: Everyone
Comment:

We thank Reviewer 2KQe for the follow-up feedback on our work. We agree with Reviewer 2KQe’s comments. Our responses to Q3, Q4, and Q5 are as follows.

  • Q3. The complexity of SoM-TP in terms of optimization is as follows.
    • GTP and STP have O(N) training complexity, while DTP requires additional optimization of the soft-DTW layer during training, giving O(N + w_dtp). SoM-TP additionally requires diverse perspective learning, i.e., optimization of the attention weight A and DPLN, giving O(N + w_dtp + w_A + n_DPLN). Training complexity is therefore higher for SoM-TP than for the other temporal poolings. In the inference process, on the other hand, all temporal poolings have O(1) complexity. SoM-TP’s complexity drops at inference mainly because of DPLN: neither DPLN nor the perspective loss is needed at inference, since SoM-TP already has its optimal weights. In short, the complexity of SoM-TP increases during training but decreases once training is finished.
    • As for the notation, w is the weight of the parameter and n is the weight of the layers. We are planning to add this discussion to Section 3.2.2.
  • Q4. The y-axis is the conventional classification accuracy, averaged over the UCR and UEA datasets, respectively. We are also planning to add AUPRC to the graphs.
  • Q5. We agree with the reviewer’s comment and plan to add the AUPRC score in Table 2.

Official Review of Paper5971 by Reviewer QzyE

ICLR 2023 Conference Paper5971 Reviewer QzyE
18 Oct 2022, 12:57 | ICLR 2023 Conference Paper5971 Official Review | Readers: Everyone
Summary Of The Paper:

The paper proposes a new method to pool convolutional feature maps for time series. The work proposes to add an additional diverse perspective learning network, used only for training, and also introduces a new loss. The evaluation is done on over 100 datasets from the UCR/UEA repositories. In addition, a quantitative analysis using LRP is done.

Strength And Weaknesses:

Strengths:

  • The idea of using interpretability methods for evaluating is interesting
  • The pooling operation is an essential operation – improving it could therefore have a large impact.

Weakness:

  • The captions of Tables 1 and 2 and Figures 4 and 5 are too short (only a few words). They fail to describe the content. I especially do not understand Figures 4 and 5.

  • What is the definition of "diverse perspective learning"?

  • There seems to be no improvement over dynamic temporal pooling (DTP) in Table 2. However, Table 2 is not cited in the text, and I am unsure if I understand it correctly.

  • The LRP evaluation is incomplete. First, in 3.2.2 it is stated: "A quantitative analysis using LRP is performed to examine how the perspectives of SoM-TP are different from those of other methods." However, in section 3.3 no such comparison between the methods is made.

  • Furthermore, the experimental design is not clear and misses details:

    The relevance score, which is input attribution, is the LRP result from the best-performed individual pooling.

    On which pooling method is this based? Is the pooling method selected per individual sample or not? Which LRP rule was used to compute the attribution values?

  • The evaluation is done on UCR/UEA datasets only. The proposed pooling method could also be interesting in an NLP setting, and an evaluation of larger datasets/models would be needed.

Clarity, Quality, Novelty And Reproducibility:
  • The paper lacks substantial clarity and quality (see weaknesses).
  • As many important details are not reported, I also question the reproducibility.
Summary Of The Review:

The paper is not ready for publication. The main reasons are the incomplete evaluation (LRP and missing larger datasets) and the unfinished manuscript (missing captions, ...), which do not allow assessing this work's merits.

Correctness: 2: Several of the paper’s claims are incorrect or not well-supported.
Technical Novelty And Significance: 2: The contributions are only marginally significant or novel.
Empirical Novelty And Significance: Not applicable
Flag For Ethics Review: NO.
Recommendation: 1: strong reject
Confidence: 4: You are confident in your assessment, but not absolutely certain. It is unlikely, but not impossible, that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work.

Responses to Reviewer QzyE (1/2)

Official Comment by Paper5971 Authors

ICLR 2023 Conference Paper5971 Authors
18 Nov 2022, 15:37 | ICLR 2023 Conference Paper5971 Official Comment | Readers: Everyone
Comment:

We thank Reviewer QzyE for the comments and feedback on our work. We address many of Reviewer QzyE’s suggestions in the general response above (updated manuscript). Here, we respond to specific comments.

  • Q1. The captions of Tables 1 and 2 and Figures 4 and 5 are too short (only a few words). They fail to describe the content. I especially do not understand Figures 4 and 5.

    • We added detailed captions to all figures. Please refer to the updated manuscript.
  • Q2. What is the definition of "diverse perspective learning"?

    • The updated Introduction (Section 1) now defines “perspective”, “fixed-perspective learning”, and “diverse perspective learning” (Sections 2 and 3 were updated accordingly):

      How to aggregate convolution features in pooling is a significant matter. Each temporal pooling has a distinct mechanism for aggregation, and we term the different mechanisms of temporal pooling as a ‘perspective’. Depending on the use of segmentation in pooling, the perspective is divided into ‘global’ and ‘local’, and according to the segmentation method, the local is divided into ‘rigid’ and ‘dynamic’. However, each temporal pooling only deals with a single perspective on hidden features as defined. … Diverse perspective learning is the opposite concept of fixed-perspective learning, which can overcome the limitation of existing temporal poolings (Section 1, Introduction)

  • Q3. There seems to be no improvement over dynamic temporal pooling (DTP) in Table 2. However, Table 2 is not cited in the text, and I am unsure if I understand it correctly.

    • Please refer to the “General Response to All Reviewers” in Experiment Result and Performance.
  • Q4. The LRP evaluation is incomplete. First, in 3.2.2 it is stated: "A quantitative analysis using LRP is performed to examine how the perspectives of SoM-TP are different from those of other methods." However, in section 3.3 no such comparison between the methods is made.

    • The updated Figure 2 provides a qualitative analysis of SoM-TP with LRP. It shows that SoM-TP captures important hidden features that are valuable for classification, while the other temporal poolings do not. Please refer to the updated Figure 2 and Section 3.2.
  • Q5. Furthermore, the experimental design is not clear and misses details:

    The relevance score, which is input attribution, is the LRP result from the best-performed individual pooling.

    • The pooling classification experiment in Section 3.3 is designed to show the relationship between the best pooling method (y) and the input data (x) together with its LRP values (L). The target y and the LRP values are obtained from fixed-perspective learning (training each of the three temporal poolings individually), and L is taken from the best-performing temporal pooling. The target y is encoded as {“GTP”: 0, “STP”: 1, “DTP”: 2}. Each dataset therefore yields {(x_1, L_1, y), …, (x_N, L_N, y)}, i.e., one y per dataset. Please refer to the updated Section 3.3.
    1) On which pooling method is this based? 2) Is the pooling method selected per individual sample or not? 3) Which LRP rule was used to compute the attribution values?
    • In the context of the response above, the details of the experimental setting are as follows.
      1. On which pooling method is this based?
        • The pooling classification experiment uses simple global pooling (updated in Figure 5). We use global pooling to investigate the relationship between the input and its LRP values (x, L) and the best pooling (y), without the effect of temporal pooling and its data dependency.
      2. Is the pooling method selected per individual sample or not?
        • Because the input is based on fixed-perspective learning, each dataset has one best pooling. This experiment does not use SoM-TP; it aims to show the relationship between distinct perspectives and poolings.
      3. Which LRP rule was used to compute the attribution values?
        • The z+ rule for the convolutional stack Φ and the ϵ rule for the fully connected layers f (updated in the Figure 5 caption).
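
For reference, the two LRP rules named in the last answer can be sketched for a single linear layer as follows. This is a minimal illustration, not the authors' code: bias terms are omitted, the inputs are assumed to be non-negative (as after a ReLU) for the z+ rule, a convolution is treated like a dense layer, and the function name `lrp_dense` is hypothetical.

```python
from typing import List, Sequence


def lrp_dense(a: Sequence[float], w: Sequence[Sequence[float]],
              r_out: Sequence[float], rule: str = "epsilon",
              eps: float = 1e-6) -> List[float]:
    """Redistribute output relevance r_out onto the inputs a of a linear layer.

    w[j][k] is the weight from input j to output k.
    rule="epsilon": R_j = sum_k a_j w_jk / (z_k + eps * sign(z_k)) * R_k
    rule="z+":      R_j = sum_k a_j w_jk^+ / sum_j' a_j' w_j'k^+ * R_k
    """
    n_in, n_out = len(a), len(r_out)
    r_in = [0.0] * n_in
    for k in range(n_out):
        if rule == "z+":
            contrib = [a[j] * max(w[j][k], 0.0) for j in range(n_in)]
            denom = sum(contrib)
        elif rule == "epsilon":
            contrib = [a[j] * w[j][k] for j in range(n_in)]
            z = sum(contrib)
            denom = z + (eps if z >= 0 else -eps)  # stabiliser keeps the sign of z
        else:
            raise ValueError(f"unknown rule: {rule}")
        if denom == 0.0:
            continue  # nothing to redistribute through this output unit
        for j in range(n_in):
            r_in[j] += contrib[j] / denom * r_out[k]
    return r_in


if __name__ == "__main__":
    acts = [1.0, 2.0]
    weights = [[1.0, -1.0], [0.5, 1.0]]
    relevance = [1.0, 1.0]
    print(lrp_dense(acts, weights, relevance, rule="z+"))
    print(lrp_dense(acts, weights, relevance, rule="epsilon"))
```

The z+ rule uses only positive contributions, so the input relevances are non-negative, whereas the ϵ rule can assign negative relevance to inputs that contributed against an output.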

Responses to Reviewer QzyE (2/2)

Official Comment by Paper5971 Authors

ICLR 2023 Conference Paper5971 Authors
18 Nov 2022, 15:46 | ICLR 2023 Conference Paper5971 Official Comment | Readers: Everyone
Comment:
  • Q6. The evaluation is done on UCR/UEA datasets only. The proposed pooling method could also be interesting in an NLP setting and an evaluation of larger datasets/models would be needed.
    • We ran experiments on larger datasets and on a larger model (ResNet).
      • First, we use ResNet as a feature extractor and compare the performance of SoM-TP with the other temporal poolings. As shown in Table 2, SoM-TP beats the other temporal poolings in average performance. Figure 4 gives a detailed performance analysis on ResNet. Please also refer to the “General Response to All Reviewers”.

      • Second, we conducted an additional study on larger, publicly available ECG and EEG datasets. The accuracies are as follows.

        Dataset  Type                Data-size  Class  GTP-MAX  STP-MAX  DTP-MAX-euc  SoM-TP-MAX
        ECG      uni-variate         109,446    5      0.9236   0.9421   0.8593       0.9579
        EEG      multi-variate (65)  122,880    2      0.9323   0.9162   0.9087       0.9485
      • We are planning to add the results of the extended experiment on large datasets to the Appendix.

    • We are currently running an experiment on an NLP task with the snli1.0 dataset. It takes a long time, so we would appreciate your patience while we wait for the results.

We would again like to thank reviewer QzyE, and we hope that our changes adequately address all concerns. Please let us know if you have any further questions, and we are very happy to follow up!


Additional Responses to Reviewer QzyE

Official Comment by Paper5971 Authors

ICLR 2023 Conference Paper5971 Authors
26 Nov 2022, 08:28 (modified: 26 Nov 2022, 08:29) | ICLR 2023 Conference Paper5971 Official Comment | Readers: Everyone
Comment:

We again thank Reviewer QzyE for the comments and feedback on our work. Here, we report the results of SoM-TP on the NLP task.

  • We ran text classification (NLP) on the snli1.0 dataset. The accuracies are as follows.

    Dataset  Type                 Data-size  Class  LSTM    LSTM-Attention  Word-by-Word Attention  GTP-MAX  STP-MAX  SoM-TP-MAX
    snli1.0  text classification  559,209    3      0.3387  0.7593          0.7784                  0.7899   0.7843   0.7948
  • A more detailed explanation and analysis are in “Summarization of Discussion and Extended Experiments (2/)”

  • Furthermore, we added a “Summarization of Discussion and Extended Experiments” comment, which collects the valuable comments and discussions from all reviewers and summarizes the extended analyses and experiments we carried out based on that feedback.

We would again like to thank reviewer QzyE, and we hope that our changes adequately address all concerns. Please let us know if you have any further questions, and we are very happy to follow up!


Re: Rebuttal

Official Comment by Paper5971 Reviewer QzyE

ICLR 2023 Conference Paper5971 Reviewer QzyE
27 Nov 2022, 09:14 | ICLR 2023 Conference Paper5971 Official Comment | Readers: Everyone
Comment:

Dear Authors,

First, I want to acknowledge that your rebuttal answered my questions and improved the manuscript. However, I dislike that the authors submitted an unfinished manuscript at first and then delivered the final paper during the rebuttal period. This is also unfair to other authors who refrained from submitting to ICLR and waited until their manuscripts were done.