Withdrawn Submission by Conference • Towards Diverse Perspective Learning with Switch over Multiple Temporal Pooling
Final Summarization of Discussion
Official Comment by Paper5971 Authors • Final Summarization of Discussion
Dear all reviewers,
Here is a summary of what we discussed:
- An additional evaluation metric, PRAUC.
- The complexity of SoM-TP in big-O notation.
- An additional ablation study on SoM-TP's perspective loss, with histograms.
- DPLN pulls the classification network’s output while finding the optimal classification result.
- Extended experiments on large real-world datasets and NLP.
- An additional ablation study on SoM-TP with SVMs and kernels.
- With the pooled vectors (before the fully-connected layers), we drew SVM decision boundaries with various kernels and found that SoM-TP’s decision boundary separates the features better than those of other temporal poolings.
As the discussion period is ending soon, we sincerely thank all reviewers for their valuable comments. It was a great honor to have discussions with all reviewers.
Thank you.
We are looking forward to comments of the post-rebuttal and discussion
Official Comment by Paper5971 Authors • We are looking forward to comments of the post-rebuttal and discussion
Dear all reviewers,
The discussion period closes soon, so we would like to hear the reviewers' feedback on the updated manuscript and our discussion. As we have not received comments from the reviewers during the discussion period, we are unsure whether we have adequately addressed the concerns. We would be happy to get replies from all reviewers.
Thank you.
Gentle Reminder of Discussion
Official Comment by Paper5971 Authors • Gentle Reminder of Discussion
Dear reviewers,
The discussion period is ending this week, so we look forward to more discussion. We would be happy to discuss the updated manuscript further.
- Additionally, we acknowledge that our first manuscript fell short. We therefore worked hard to reflect the feedback of all reviewers during the rebuttal period. We would like all reviewers to know that we have done our best throughout: when we first submitted the manuscript, during the rebuttal period, and now in the discussion period.
We really appreciate all reviewers' time, effort, and valuable comments.
Thank you.
Summarization of Discussion and Extended Experiments (1/)
Official Comment by Paper5971 Authors • Summarization of Discussion and Extended Experiments (1/)
We thank the reviewers again for their thoughtful effort in their review and discussion. Here is a summary of the “Discussion” and the “Extended Experiment”.
First, as for the “Discussion”, we discussed the complexity of SoM-TP compared to other existing temporal poolings (with reviewer 2KQe).
The complexity of each pooling layer and of the whole-model optimization is as follows:
- GTP: for pooling complexity, and for training optimization complexity.
- STP: for pooling complexity, and for training optimization complexity.
- DTP: for calculating soft-DTW in the forward and backward passes on GPU, while in increment in complexity on CPU, for pooling complexity; and for training optimization complexity, with additional optimization for the soft-DTW layer.
- SoM-TP: for pooling complexity, to compute the attention score for dynamic pooling selection ( for query and key, respectively); and for training optimization complexity, with additional optimization for the attention weight and DPLN.
- In the inference process, optimization complexity becomes all and only pooling complexity remains.
- Here, we considered pooling complexity excluding the calculation of the maximum or average value, which all poolings have in common.
There is a trade-off between complexity and performance, because increased complexity and capacity generally yield better performance. SoM-TP, which has the highest complexity and capacity, achieved the best performance on TSC. The increase in complexity is needed to reflect various poolings in the context of perspective. SoM-TP therefore pays a small cost in complexity but achieves robust performance.
• As for the notation,
is the weight of the parameter and is the weight of the layers. We are planning to add this discussion in section 3.2.2.
Second, as for the “Extended Experiment”, we conducted an additional ablation study on SoM-TP with perspective loss. We drew histograms at several training epochs, which show the distribution of the classification network output ( ) in comparison with the DPLN output ( ). This addresses all reviewers’ comments on the solidity and novelty of SoM-TP.
As for perspective loss,
with
is cross-entropy loss with . The reason why is included in is that the sub-network should be trained to make the right classification.
We found that
distribution pulls while finding optimal classification results as itself. Therefore, are tied to ’s distribution and indirectly reflected by classification results from all pooling output vectors. The distribution, updated at every epoch, shows that perspective loss works correctly for diverse perspective learning.
We are planning to add this experiment to Appendix A.
We would again like to thank all reviewers for their time and feedback, and we hope that our discussion adequately addresses all concerns. Please let us know if you have any further questions, and we are very happy to follow up!
Summarization of Discussion and Extended Experiments (2/)
Official Comment by Paper5971 Authors • Summarization of Discussion and Extended Experiments (2/)
- Third, as for the “Extended Experiment”, we ran additional experiments on larger real-world datasets in time series and NLP (for the comment from reviewer QzyE).
For the larger ECG and EEG datasets, the accuracy is as follows:

| Dataset | Type | Data size | Classes | GTP-MAX | STP-MAX | DTP-MAX-euc | SoM-TP-MAX |
|---|---|---|---|---|---|---|---|
| ECG | univariate | 109,446 | 5 | 0.9236 | 0.9421 | 0.8593 | 0.9579 |
| EEG | multivariate (65) | 122,880 | 2 | 0.9323 | 0.9162 | 0.9087 | 0.9485 |

The results show that SoM-TP performs best.
For text classification (NLP) with the snli1.0 dataset, the accuracy is as follows:

| Dataset | Type | Data size | Classes | LSTM | LSTM-Attention | Word-by-Word Attention | GTP-MAX | STP-MAX | SoM-TP-MAX |
|---|---|---|---|---|---|---|---|---|---|
| snli1.0 | text classification | 559,209 | 3 | 0.3387 | 0.7593 | 0.7784 | 0.7899 | 0.7843 | 0.7948 |

- Especially, in the NLP task, we found valuable characteristics compared to DTP.
- DTP is problematic for layer-by-layer pooling between convolutional or RNN layers because it has to optimize the “soft-DTW parameter” based on the preceding hidden feature.
- This makes the optimization complexity proportional to the number of DTP layers, which equals the number of soft-DTW layers.
- However, because SoM-TP is a framework for diverse perspective learning, it can be placed between other layers. In this task, we constructed SoM-TP with only GTP and STP between three RNN layers. SoM-TP outperforms the other temporal poolings as well as the attention-based models.
- We referenced the model in [1] (with embedding dimension 128, hidden dimension 256, batch size 32, and learning rate 0.005). As shown in the table above, simple NLP architectures for text classification, such as LSTM or attention, were set up specifically for comparison with the LSTM-pooling architecture.
- SoM-TP is also useful in hierarchical model architectures, and this is another novelty compared to DTP.
- We are planning to add this experiment to Appendix A.
[1] Alexis Conneau, Douwe Kiela, Holger Schwenk, Loic Barrault, and Antoine Bordes. Supervised learning of universal sentence representations from natural language inference data, 2017. URL https://arxiv.org/abs/1705.02364.
We would again like to thank all reviewers for their time and feedback, and we hope that our discussion adequately addresses all concerns. Please let us know if you have any further questions, and we are very happy to follow up!
Summarization of Discussion and Extended Experiments (3/)
Official Comment by Paper5971 Authors • Summarization of Discussion and Extended Experiments (3/)
During the discussion week, we responded to questions about the model complexity of SoM-TP. Here is an additional, more detailed response on SoM-TP's complexity.
- An increase in model capacity and complexity sometimes causes overfitting and degrades performance.
- The complexity increase in SoM-TP is caused by the attention mechanism that automatically and dynamically selects among distinct poolings. The complexity of the pooling methods themselves does not increase.
- Pooling complexity depends on which poolings are selected by attention (e.g. for GTP and for STP and DTP). The complexity of calculating the attention score is added for SoM-TP.
- In this context, SoM-TP does not change or increase pooling complexity itself; its complexity increases only through the attention framework. This is why SoM-TP performs more robustly on various TSC datasets than other temporal poolings, even though complexity increases. This is also a point of difference from a simple ensemble network. We think that this is another novelty of SoM-TP.
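To make the point about pooling cost versus selection cost concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the SoM-TP implementation: `gtp_max`, `stp_max`, and `select_pooling` are hypothetical names, and the scoring rule (one scalar weight per pooling times the mean of its pooled vector) is a stand-in for the query/key attention described in the paper. What it shows is that each pooling is computed exactly as it would be on its own, and the only extra work is scoring the candidates.

```python
import math


def gtp_max(h):
    """Global temporal pooling: one max per channel over the whole time axis."""
    return [max(channel) for channel in h]


def stp_max(h, n_segments):
    """Static temporal pooling: max per channel inside equal-length segments.

    Trailing time steps that do not fill a whole segment are ignored here.
    """
    pooled = []
    for channel in h:
        seg_len = len(channel) // n_segments
        for s in range(n_segments):
            pooled.append(max(channel[s * seg_len:(s + 1) * seg_len]))
    return pooled


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def select_pooling(h, weights, n_segments=2):
    """Score every candidate pooling and return (name, pooled vector, attention).

    Assumption: the score is weights[name] * mean(pooled vector). This is a
    placeholder for a learned attention score, not the paper's formula.
    """
    candidates = {"GTP": gtp_max(h), "STP": stp_max(h, n_segments)}
    names = list(candidates)
    scores = [weights[n] * sum(candidates[n]) / len(candidates[n]) for n in names]
    attention = softmax(scores)
    best = names[attention.index(max(attention))]
    return best, candidates[best], attention


# Toy hidden features: 2 channels, 4 time steps.
hidden = [[1.0, 5.0, 2.0, 0.0], [3.0, 1.0, 4.0, 9.0]]
name, pooled, attention = select_pooling(hidden, {"GTP": 0.1, "STP": 1.0})
```

Changing the weights changes which pooling is selected while the pooling functions themselves stay untouched, which is the sense in which the added cost sits in the selection step.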
We would again like to thank all reviewers for their time and feedback. As the discussion period is coming to an end, please let us know if you have any further questions; we are happy to follow up!
Additional General Response to All Reviewers (1/2)
Official Comment by Paper5971 Authors • Additional General Response to All Reviewers (1/2)
To all reviewers,
We thank again the reviewers for their review. In response to feedback, we provide an additional general response to “novelty” and an updated manuscript.
- First, we updated the manuscript to convey SoM-TP’s purpose and its goal of diverse perspective learning more effectively, with detailed explanations, captions, and newly designed figures. An Appendix is also added with additional experiments and results that support SoM-TP. The updated manuscript is summarized as follows:
- Section 1 “Introduction” is updated with the definition of the “perspective” of pooling, followed by the meaning of “fixed-perspective learning” and “diverse-perspective learning”.
- Section 2 “Background” is updated to explain the distinct perspectives of existing temporal poolings, with detailed examples of LRP.
- Section 3 “SoM-TP: Towards Diverse Perspective Learning” is updated with a detailed notation of the formulas. We specifically tried to explain the role of DPLN and perspective loss.
- Also in section 3, the analysis of the performance results is updated with a quantitative part (a table of average performance) and a qualitative part (LRP in Figure 2). Furthermore, we tried to analyze SoM-TP’s robustness objectively with other evaluation tools, such as a histogram and a rank bar chart.
- Table 2 is updated with the average performance on the UCR/UEA repository for various temporal pooling variants, compared to SoM-TP. The ResNet architecture is also added.
- Figure 1 is updated with a detailed notation of vectors and matrices, and the two fully-connected layers are defined in detail. The caption now also describes the flow of the SoM-TP process.
- Figure 2 is updated by adding the accuracy and color to the figure. We circled where each pooling focuses and likewise provide the performance below. With the example-based analysis explained in the caption, readers can see why SoM-TP’s diverse perspective is necessary.
- Figure 3 is updated with a detailed explanation in the caption. We tried to explain how SoM-TP finds optimal attention scores to reach diverse perspective learning.
- Figure 4 is added to provide an objective analysis of SoM-TP. The histogram and bar chart were chosen to show where SoM-TP outperforms the other poolings.
- Second, the updated Appendix is as follows:
- In A.1, the detailed experimental setting, specifically the convolutional stack, is explained in a figure that includes the embedding dimensions and overall architecture.
- In A.2, an ablation study on the decay parameter is added. We designed the experiment to show how the perspective loss with decay affects SoM-TP's performance, and we found the optimal value for the datasets.
- In A.3, an extended analysis of SoM-TP on individual datasets with varying time length, data size, and dimension is added, to see whether SoM-TP works robustly for any dataset characteristics. The figure shows that SoM-TP dynamically selects an appropriate pooling for the data regardless of dataset characteristics.
- In B, the DTP algorithm is introduced.
- In C, the extended related work is introduced.
Additional General Response to All Reviewers (2/2)
Official Comment by Paper5971 Authors • Additional General Response to All Reviewers (2/2)
- Finally, regarding novelty, we would like to carefully make three points:
- Studies of time series data and pooling with XAI methods, especially LRP input attribution, have been rare. With LRP, we found that the temporal poolings known as SOTA for TSC each have a different, fixed perspective.
- In the same context of LRP and input attribution, there had been no attempt to analyze the perspective of existing poolings in detail. Our in-depth analysis of existing temporal poolings therefore led us to “diverse perspective learning” as a way to leverage pooling for TSC.
- To reach “diverse perspective learning”, it is true that SoM-TP uses existing temporal poolings, but the important point is the learning framework. With the attention score and perspective loss, any pooling with a different perspective can be incorporated into SoM-TP. This is the main advantage of SoM-TP.
- We updated the code in supplementary.zip for reproducibility.
We would again like to thank all reviewers for their time and feedback, and we hope that our changes adequately address all concerns.
General Response to All Reviewers
Official Comment by Paper5971 Authors • General Response to All Reviewers
To all reviewers,
We thank the reviewers for their thoughtful and constructive review. In response to feedback, we provide a general response to points raised by multiple reviewers and an updated manuscript.
Experiment Result and Performance: SoM-TP has robust and the highest overall performance.
We updated the experiment results to be clearer, through both quantitative and qualitative analysis. For the quantitative analysis, three evaluation methods are used to compare SoM-TP to other pooling methods: 1) average performance comparison in Table 2, 2) histogram comparison between SoM-TP and DTP, which is the SOTA of temporal pooling, and 3) a rank bar chart.
- First, the average performance of SoM-TP exceeds that of all the temporal poolings in Table 2, as mentioned in the second bullet point.
- Second, we calculated the area under the histograms to see in how many datasets SoM-TP and DTP each outperform the other; SoM-TP outperformed DTP by a large margin (Figure 4).
- Finally, the bar chart (Figure 4) shows that SoM-TP has more robust results than other temporal poolings, since it has the largest number of rank-1 results and the fewest rank-4 results.
- A detailed explanation of these three evaluations is in section 3.2.3.
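The rank-based comparison in the last evaluation can be sketched as follows. This is an illustrative helper, not the authors' evaluation script: `rank_counts` is a hypothetical name, and the sample input simply reuses the ECG and EEG accuracies reported elsewhere in this thread rather than the UCR/UEA results behind Figure 4.

```python
def rank_counts(accuracy_by_dataset):
    """Count how often each method lands at each rank (1 = best) across datasets.

    Ties are broken by the order in which methods appear in the input dict.
    """
    counts = {}
    for scores in accuracy_by_dataset.values():
        ordered = sorted(scores, key=scores.get, reverse=True)
        for rank, method in enumerate(ordered, start=1):
            per_method = counts.setdefault(method, {})
            per_method[rank] = per_method.get(rank, 0) + 1
    return counts


# Sample input only: accuracies from the ECG/EEG table in this thread.
sample = {
    "ECG": {"GTP": 0.9236, "STP": 0.9421, "DTP": 0.8593, "SoM-TP": 0.9579},
    "EEG": {"GTP": 0.9323, "STP": 0.9162, "DTP": 0.9087, "SoM-TP": 0.9485},
}
counts = rank_counts(sample)
```

A method with many rank-1 entries and few last-place entries in `counts` is the kind of result the bar chart is meant to surface.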
The optimal decay parameter is applied through the hyperparameter search, and SoM-TP outperforms all the other temporal pooling methods.
- In Table 2, you can check that SoM-TP outperforms all the other temporal pooling methods (including DTP) based on the average performance of all datasets.
- In detail, for the repository of 112 univariate datasets, SoM-TP MAX shows the best performance in FCN (acc: 0.7503 / f1macro: 0.7212) and ResNet (acc: 0.7690 / f1macro: 0.7398).
- For the repository of 21 multivariate datasets, SoM-TP also shows better performance than the other pooling method in FCN (acc: 0.6969 / f1macro: 0.6648) and ResNet (acc: 0.6766 / f1macro: 0.6542).
- If you would like to check more detailed results, please refer to Table 2, Figure 4, and Section 3.2.
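Since the results above report both accuracy and macro-F1, a short sketch may help show why the two can diverge on imbalanced data. The functions below are a plain-Python illustration (names `accuracy` and `macro_f1` are chosen here, and the toy labels are made up for the example); they are not taken from the authors' code.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every class seen in labels or predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # A class with no true or predicted samples contributes an F1 of 0.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Made-up imbalanced example: 9 samples of class 0, 1 sample of class 1,
# and a classifier that always predicts class 0.
y_true = [0] * 9 + [1]
y_pred = [0] * 10
acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
```

In this toy case accuracy is high because the majority class dominates, while macro-F1 is pulled down by the missed minority class, which is why reporting both is informative.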
For the qualitative analysis (Figure 2), the LRP results of GTP, STP, DTP, and SoM-TP show the different perspective of each pooling.
- In the example datasets (CricketZ, Fungi, and WordSynonyms) with input attribution results, we can identify how the diverse perspectives of SoM-TP focus differently from other temporal poolings.
Furthermore, we are planning to update the Appendix section with additional experiment explanations and results (e.g. a detailed explanation of the experimental setting, the decay ablation study, and extended studies on large datasets and NLP).
We would again like to thank all reviewers for their time and feedback, and we hope that our changes adequately address all concerns.
Official Review of Paper5971 by Reviewer FNQF
Official Review of Paper5971 by Reviewer FNQF
The paper proposes an attention-based method that can dynamically select a data-specific temporal pooling method from (1) global temporal pooling (GTP), (2) static temporal pooling (STP), and (3) dynamic temporal pooling (DTP). Experiments can show the effectiveness of the proposed method and demonstrate the approach indeed selects different pooling methods for different batches.
Strengths:
- The work is well-motivated given the sufficient analysis on the limitations of each existing temporal pooling method (Section 2.2).
- Experiments can show how different pooling methods vary while changing different batches of data.
Weaknesses:
- The novelty and solidity of the work are insufficient. The work is incremental on top of Lee, Dongha, Seonghyeon Lee, and Hwanjo Yu. "Learnable dynamic temporal pooling for time series classification." Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 35. No. 9. 2021. given that it aims to select different pooling methods from the above mentioned work based on attention mechanism.
- The proposed method SoM-TP does not perform better than DTP according to Table 2.
- The paper needs to be proofread more carefully. Many algorithms and figures are not referenced correctly (many ? references throughout the paper).
- Experimental results are not clearly explained.
Both clarity and novelty need to be improved:
Clarity: The motivation is clearly illustrated given the extensive analysis and comparisons in Section 2.2. on the constraints of existing temporal pooling methods. However, the clarity of experiments is not sufficient: (1) what's the embedding dimension? This also relates to the limited reproducibility; (2) which specific dataset (UCR or UEA) is used for a specific result (e.g., dataset for Table 2 is not mentioned). This also relates to the limited reproducibility.
Novelty: As explained above, the novelty of this work is incremental.
Despite some strengths of this paper (e.g., good motivations, sufficient motivating analysis), there are a few weaknesses that need to be addressed for acceptance: (1) novelty, clarity; (2) experimental effectiveness of the proposed method; (3) writing readiness.
Responses to Reviewer FNQF
Official Comment by Paper5971 Authors • Responses to Reviewer FNQF
We thank Reviewer FNQF for the comments and helpful feedback on our work. We address many of Reviewer FNQF’s suggestions in the general response above (updated manuscript). And here, we respond to specific comments.
- Q1. The novelty and solidity of the work are insufficient. The work is incremental on top of Lee, Dongha, Seonghyeon Lee, and Hwanjo Yu. "Learnable dynamic temporal pooling for time series classification." Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 35. No. 9. 2021. given that it aims to select different pooling methods from the above-mentioned work based on the attention mechanism.
- Q2. The proposed method SoM-TP does not perform better than DTP according to Table 2.
- Q3. Experimental results are not clearly explained. And what is the experimental effectiveness of the proposed method?
- For Q1, 2, and 3, please refer to the “General Response to All Reviewers” in Experiment Result and Performance. Also, please refer to the updated manuscript.
- Q4. Clarity: The motivation is clearly illustrated given the extensive analysis and comparisons in Section 2.2. on the constraints of existing temporal pooling methods. However, the clarity of experiments is not sufficient: (1) what's the embedding dimension? This also relates to the limited reproducibility; (2) which specific dataset (UCR or UEA) is used for a specific result (e.g., the dataset for Table 2 is not mentioned). This also relates to limited reproducibility.
- Q4-1. What’s the embedding dimension?
- We use a convolutional stack as a feature extractor in the experiments.
- FCN with three convolutional layers: 128/256/256 dimensions in order.
- ResNet with 9 convolutional layers and skip layers: 64/128/256 dimensions in order with three residual blocks.
- Also, we clarified other components of the model architecture (e.g. number of convolutional layers, segmentation number, number of fully-connected layers, batch size, and window size) in Table 1.
- In Table 2, the whole number of model parameters is specified.
- We are planning to update the detailed model architecture for FCN and ResNet in the Appendix section.
- Q4-2. Which specific dataset (UCR or UEA) is used for a specific result (e.g., the dataset for Table 2 is not mentioned)?
- In Table 2, the average performances for all UCR and UEA repositories are updated with FCN and ResNet architectures.
- For the qualitative analysis using the LRP method (Figure 2), we specified the datasets used in each experiment result.
- Q5. The paper needs to be proofread more carefully. Many algorithms and figures are not referenced correctly (many ? references throughout the paper).
- We corrected all the broken references.
We would again like to thank reviewer FNQF, and we hope that our changes adequately address all concerns. Please let us know if you have any further questions, and we are very happy to follow up!
Additional response to Reviewer FNQF
Official Comment by Paper5971 Authors • Additional response to Reviewer FNQF
To Reviewer FNQF,
We added “Summarization of Discussion and Extended Experiments”, which collects the valuable comments and discussions from all reviewers and summarizes the extended analysis and experiments we carried out based on the feedback.
We really appreciate your comment, question, and feedback on our work. And we hope that our response adequately addresses all concerns. Please let us know if you have any further questions, and we are very happy to follow up!
Official Review of Paper5971 by Reviewer 2KQe
Official Review of Paper5971 by Reviewer 2KQe
In this paper, a pooling architecture for diverse perspective learning is proposed. The architecture is referred to as switch over multiple pooling (SoM-TP). To motivate their method, the authors argue that one pooling cannot be dominant in various time series data characteristics and thus there is a need for a more robust pooling that encompasses multiple diverse perspectives. Experiments were conducted on UCR/UEA public datasets, the results of which demonstrate the effectiveness and robustness of SoM-TP in comparison with three other pooling methods.
Strengths:
Existing temporal poolings cannot capture dependencies that may exist within time series data through a fixed perspective learning since a single pooling method considers only one perspective, i.e. has only a single viewpoint. In contrast, SoM-TP is designed to address this problem by leveraging a diverse perspective learning (DPLN) sub-network in its training process that captures various (including both global and local) viewpoints.
SoM-TP is capable of identifying the characteristics of the data in a specific mini-batch and leverages attention weights to dynamically select suitable temporal poolings based on the identified data characteristics. In a downstream classification scenario, however, the output class is predicted considering all perspectives of pooling in SoM-TP.
The authors have compared SoM-TP with three pooling methods (GTP, STP and DTP) on 112 univariate and 21 multivariate time series datasets from the UCR/UEA repository stemming from various domains. The results (1) demonstrate that SoM-TP tends to be more robust to data that requires capturing different perspectives; and (2) suggest that SoM-TP is more accurate than fixed perspective learning with GTP and STP, while having similar accuracy to that of DTP.
The authors have conducted an ablation study that provides an insight into the inner-workings of the diverse perspective learning process.
Weaknesses:
The proposed SoM-TP method appears to simply combine several components including different pooling methods (GTP, STP and DTP) and an attention mechanism, which are all already well-established in the literature. Even the DPLN sub-network is a simple classification network that considers all pooling perspectives to make a final decision, which limits the methodological novelty of this work.
SoM-TP is designed to dynamically select suitable temporal poolings based on certain characteristics that were identified in a certain portion (mini-batch) of a dataset. Nevertheless, neither have the characteristics (identified in each of the considered datasets) been discussed in the paper; nor have the authors discussed how the descriptive statistics (such as size, dimensionality, time series length, etc.) of different datasets impact the dynamic selection of the temporal poolings. I would encourage the authors to discuss and provide more clarity on the aforementioned two points in their response.
For one to select the most suitable pooling method for a certain mini-batch, naturally, all three considered pooling operations would need to be applied to the hidden features with temporal position information in each feed-forward step of the overall architecture. This increases the complexity of the training as well as the inference stage, however, the complexity of neither is discussed by the authors. I would suggest that such a discussion is included in both the authors’ response as well as in the paper.
The tradeoff parameter (or decay) in Eq. (6) appears to have an important role as it controls the influence of the perspective loss. That being said, I am wondering why the authors have not analyzed the effect that different values of may have on the downstream time series classification performance.
Average accuracy is considered as a classification performance metric, however, to my knowledge some of the considered UCR/UEA datasets are rather imbalanced, thus the authors should have considered metrics that are less sensitive to class imbalance (e.g., area under the precision-recall curve (AUPRC)).
Minor weaknesses: There are also certain grammatical and typographical errors and remarks that require attention. Some of them are summarized as follows:
- In the next-to-last paragraph of the Introduction section on page 2, “all learnable weight” should be replaced with “all learnable weights”.
- In the “Static Temporal Pooling” paragraph on page 3, replace “segmentation” with “segments”.
- In the “Dynamic Temporal Pooling” paragraph on page 3, there is a missing reference to an Algorithm and another one to an Appendix.
- At the beginning of page 6, there is a reference missing right after “class weight matrix”.
- In the caption of Table 1, “detail” should be replaced with “detailed”.
- In the “Robustness” paragraph on page 7, “Figure 4 (a)” should be corrected to “Figure 5 (a)”. Later in the same sentence, “GTP” should be included between the words “than” and “as”.
- The caption of Figure 5 should be more descriptive instead of simply being “Performance graph”.
Clarity: The paper is decently written, but there is certainly room for improvement when writing is concerned. However, in my view, the paper is not organized and presented well, while being difficult-to-follow, particularly when one reads the experimental section.
Quality: The design and justifications for the proposed SoM-TP method seem to be technically sound. The observations made regarding the robustness and accuracy of SoM-TP seem to hold in the considered settings, however, these observations need to be further supported by conducting additional ablation studies (mentioned in the “Weaknesses” part of this review). Overall, the paper has merit but needs more work as it does not appear to be fully developed in terms of quality.
Novelty: From a methodological perspective, the contribution of this work can be considered rather incremental. In essence, the authors leveraged several existing pooling methods and integrated them into a single architecture that learns attention weights for each of the poolings in order to select the most suitable pooling given a certain batch of data samples (please refer to the first point of the “Weaknesses” part of this review, where this is explained in more detail).
Reproducibility: The experiments were conducted on widely-used time series classification datasets which are publicly available. On the other hand, the code for the proposed SoM-TP method is not made available by the authors in the present anonymized version, however, with some effort one might be able to implement SoM-TP by following Section 3.
This work has merit but does not appear to be well developed. Although the authors provided interesting insights regarding the need for diverse perspective learning for pooling methods, the work, in its current state, lacks methodological novelty; certain key aspects (such as the impact of data size, dimensionality, time series length, etc., on the dynamic pooling selection) are not addressed in the paper; the role of the perspective loss (which can be considered central to this work) has not been analyzed; among other weak points outlined in the “Weaknesses” part of this review. Overall, these weak points of the paper seem to outweigh its strengths. Therefore, I am not convinced that this work is a good fit for ICLR. Nevertheless, I am looking forward to the authors’ response and I would be willing to adjust my score in case I have misunderstood or misinterpreted certain aspects of the work.
Not applicable.
Responses to Reviewer 2KQe (1/2)
Official Comment by Paper5971 Authors • Responses to Reviewer 2KQe (1/2)
We thank Reviewer 2KQe for the comments and thoughtful feedback on our work. We address many of Reviewer 2KQe’s suggestions in the general response above (updated manuscript). And here, we respond to specific comments.
- Q1. 1)The proposed SoM-TP method appears to simply combine several components including different pooling methods (GTP, STP, and DTP) and an attention mechanism, which is all already well-established in the literature. 2)Even the DPLN sub-network is a simple classification network that considers all pooling perspectives to make a final decision, which limits the methodological novelty of this work.
- 1, 2) We updated section 3.1. with a detailed explanation. In SoM-TP, DPLN is a regularizer network that considers all pooling perspectives. Thus, DPLN output,
, is not used for the final classification decisions. It is used to calculate the KL divergence term as a term in the perspective loss, then this indirectly makes the model learn diverse perspectives. Here is the following process of diverse perspective learning in the usage of DPLN.The final classification result
is from the classification network (main) which gets the input of selected pooling features .DPLN uses the multiplication of attention score
and pooling feature vectors as input, and makes classification decision . However, this result is not used for actual classification, but for calculating the KL divergence term in the perspective loss.And the perspective loss is as follows,
with
is cross-entropy loss with . The reason why is included in is that the sub-network should be trained to make the right classification.The final cost function is as follows,
,with
is cross-entropy loss with .Therefore, DPLN itself is an ensemble network but works as a regularizer in SoM-TP. (also please refer to section 3.2.2 “What is the role of DPLN?”)
- Q2. SoM-TP is designed to dynamically select suitable temporal poolings based on certain characteristics that were identified in a certain portion (mini-batch) of a dataset. Nevertheless, 1) neither have the characteristics (identified in each of the considered datasets) been discussed in the paper; 2) nor have the authors discussed how the descriptive statistics (such as size, dimensionality, time series length, etc.) of different datasets impact the dynamic selection of the temporal poolings. I would encourage the authors to discuss and provide more clarity on the aforementioned two points in their response.
- 1, 2) Datasets in the UCR/UEA repository vary in data size, time series length, and dimensionality. SoM-TP shows robust performance across them by dynamically selecting an appropriate pooling during both training and inference, which suggests that SoM-TP does not depend on these descriptive statistics. Please refer to the updated Appendix A.
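The role of DPLN as a regularizer, described in the response to Q1, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the direction of the KL divergence, and the way the trade-off parameter `lam` (the λ of Q4) weights the perspective loss are assumptions made for illustration only.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, label):
    """Cross-entropy of a predicted distribution against an integer class label."""
    return -math.log(probs[label])

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with strictly positive entries."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def total_cost(cls_logits, dpln_logits, label, lam=0.1):
    """Main cross-entropy plus lam times the perspective loss (assumed form)."""
    p_cls = softmax(cls_logits)    # main classification network output
    p_dpln = softmax(dpln_logits)  # DPLN (sub-network) output, used in training only
    # Assumption: KL(DPLN || CLS); the thread does not state the direction.
    perspective = kl_divergence(p_dpln, p_cls) + cross_entropy(p_dpln, label)
    return cross_entropy(p_cls, label) + lam * perspective
```

In this sketch only `cls_logits` would be needed at inference, which matches the statement that the DPLN output is not used for the final decision.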
Responses to Reviewer 2KQe (2/2)
Official Comment by Paper5971 Authors • Responses to Reviewer 2KQe (2/2)
- Q3. For one to select the most suitable pooling method for a certain mini-batch, naturally, all three considered pooling operations would need to be applied to the hidden features with temporal position information in each feed-forward step of the overall architecture. This increases the complexity of the training as well as the inference stage, however, the complexity of neither is discussed by the authors. I would suggest that such a discussion is included in both the authors’ responses as well as in the paper.
- It is true that SoM-TP has more learnable parameters than the existing pooling methods, because the model has to learn attention weights in order to select the proper pooling. The detailed number of parameters is given in Table 2. However, at inference the attention weights are no longer trained and the sub-network is not used. We will follow up on the reviewer’s valuable comment regarding the complexity discussion.
- Q4. The tradeoff parameter (or decay) λ in Eq. (6) appears to have an important role as it controls the influence of the perspective loss. That being said, I am wondering why the authors have not analyzed the effect that different values of λ may have on the downstream time series classification performance. And the role of “perspective loss” (which can be considered central to this work) has not been analyzed.
- The λ ablation study is done, and we found the optimal λ for each combination of model and repository.
- FCN for UCR: MAX: 0.1 / AVG: 1
- FCN for UEA: MAX: 0.1 / AVG: 0.1
- We are planning to add the λ ablation study with the ResNet architecture to the Appendix.
- Q5. Average accuracy is considered a classification performance metric, however, to my knowledge some of the considered UCR/UEA datasets are rather imbalanced, thus the authors should have considered metrics that are less sensitive to class imbalance (e.g., area under the precision-recall curve (AUPRC)).
- We report the F1 score to account for imbalanced datasets. In training, we also use a weighted loss to handle class imbalance. Please refer to the updated experimental setting in Section 3.2.1.
- Q6. Clarity: The paper is decently written, but there is certainly room for improvement when writing is concerned. However, in my view, the paper is not organized and presented well, and is difficult to follow, particularly when one reads the experimental section.
- Please refer to the “Experiment Result and Performance” part of the “General Response to All Reviewers”.
- Q7. Reproducibility: The experiments were conducted on widely-used time series classification datasets which are publicly available. On the other hand, the code for the proposed SoM-TP method is not made available by the authors in the present anonymized version, however, with some efforts, one might be able to implement SoM-TP by following Section 3.
- We added a detailed caption to Figure 1. We are also planning to add the code to supplementary.zip.
- Q8. Minor weaknesses: There are also certain grammatical and typographical errors and remarks that require attention.
- We corrected all typos and wrong references.
We would again like to thank reviewer 2KQe, and we hope that our changes adequately address all concerns. Please let us know if you have any further questions, and we are very happy to follow up!
Follow-up on authors' response
Official Comment by Paper5971 Reviewer 2KQe • Follow-up on authors' response
I would like to thank the authors for their point-by-point response to my review and for (1) providing additional clarifications regarding the methodological novelty (particularly regarding the role of the DPLN sub-network); (2) assessing the robustness of SoM-TP’s dynamic selection of temporal poolings to different dataset sizes, dimensionalities and time series lengths; and (3) analyzing the effect of the decay λ.
Q3: As the authors already pointed out, in addition to the number of SoM-TP’s parameters included in Table 2, I believe that the paper will benefit from a discussion on the overall complexity of SoM-TP’s training and inference procedures.
Q4: In Appendix A.2, what do the y-axes of Figures 7(a) and 7(b) refer to? For each value of
Q5: While I do agree that measuring F1-score may be considered suitable for imbalanced data, it still depends on a certain threshold, since different precision and recall values (which the F1-score is a function of), and thus different F1-scores, can be obtained at different thresholds. This was the reason behind my suggestion of the area under the precision-recall curve (AUPRC), which is both less sensitive to class imbalance and threshold-invariant.
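The threshold dependence raised here can be made concrete with a small sketch. The scores and labels below are hypothetical toy data for a binary task, and average precision is used as one common step-wise estimate of AUPRC; a multi-class setting such as UCR/UEA would additionally need, for example, one-vs-rest averaging.

```python
def precision_recall_f1(scores, labels, threshold):
    """Precision, recall and F1 of the positive class at one decision threshold."""
    preds = [s >= threshold for s in scores]
    tp = sum(1 for p, y in zip(preds, labels) if p and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if not p and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def average_precision(scores, labels):
    """Step-wise area under the precision-recall curve (average precision)."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    n_pos = sum(labels)
    tp = 0
    ap = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            tp += 1
            ap += tp / rank  # precision at each new recall level
    return ap / n_pos

# Hypothetical classifier scores and ground-truth labels (3 positives, 5 negatives).
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 0, 1, 0, 0, 1, 0, 0]
f1_by_threshold = {t: precision_recall_f1(scores, labels, t)[2] for t in (0.25, 0.5, 0.75)}
ap = average_precision(scores, labels)
```

On this toy data the F1-score differs at each of the three thresholds, while the average precision is a single number computed from the ranking alone.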
Follow-up Response to Reviewer 2KQe (updated)
Official Comment by Paper5971 Authors • Follow-up Response to Reviewer 2KQe (updated)
We thank Reviewer 2KQe for the follow-up feedback on our work. We agree with Reviewer 2KQe’s comments, and our responses to Q3, Q4, and Q5 are as follows.
- Q3. The specific discussion of SoM-TP’s complexity in the context of optimization is as follows,
- GTP and STP share the same base training complexity, while DTP requires additional optimization of the soft-DTW layer during training. SoM-TP differs in training complexity because it requires diverse perspective learning, that is, additional optimization of the attention weights and DPLN. Therefore, it is true that training complexity is higher with SoM-TP than with the other temporal poolings. In the inference process, on the other hand, all temporal poolings have the same complexity. The complexity of SoM-TP drops at inference because of DPLN: DPLN and the perspective loss are not required at inference, since SoM-TP already has its trained weights. Therefore, the complexity of SoM-TP increases during training but decreases once training is finished.
- As for the notation, the complexity is expressed in terms of the weights of the parameters and the weights of the layers. We are planning to add this discussion to Section 3.2.2.
- Q4. The y-axes show the conventional classification accuracy averaged over the UCR and UEA datasets, respectively. We are also planning to add AUPRC to the graphs.
- Q5. We agree with the reviewer’s comment and plan to add the AUPRC score in Table 2.
Official Review of Paper5971 by Reviewer QzyE
The paper proposes a new method to pool convolutional feature maps for time series. The work proposes to add an additional diverse perspective learning network used only for training and also introduces a new loss. The evaluation is done on over 100 datasets from the UCR/UEA repositories. In addition, a quantitative analysis using LRP is done.
Strengths:
- The idea of using interpretability methods for evaluation is interesting.
- The pooling operation is an essential operation – improving it could therefore have a large impact.
Weakness:
The captions of Tables 1 and 2 and Figures 4 and 5 are too short (only a few words). They fail to describe the content. I especially do not understand Figures 4 and 5.
What is the definition of "diverse perspective learning"?
There seems to be no improvement over dynamic temporal pooling (DTP) in Table 2. However, Table 2 is not cited in the text, and I am unsure if I understand it correctly.
The LRP evaluation is incomplete. First, in 3.2.2 it is stated: "A quantitative analysis using LRP is performed to examine how the perspectives of SoM-TP are different from those of other methods." However, in section 3.3 no such comparison between the methods is made.
Furthermore, the experimental design is not clear and misses details:
The relevance score, which is input attribution, is the LRP result from the best-performed individual pooling.
On which pooling method is this based? Is the pooling method selected per individual sample or not? Which LRP rule was used to compute the attribution values?
The evaluation is done on UCR/UEA datasets only. The proposed pooling method could also be interesting in an NLP setting, and an evaluation of larger datasets/models would be needed.
- The paper lacks substantial clarity and quality (see weaknesses).
- As many important details are not reported, I also question the reproducibility.
The paper is not ready for publication. The main reasons are the incomplete evaluation (LRP and missing larger datasets), and the unfinished manuscript (missing captions, ...), which do not allow assessing this work's merits.
Responses to Reviewer QzyE (1/2)
Official Comment by Paper5971 Authors • Responses to Reviewer QzyE (1/2)
We thank Reviewer QzyE for the comments and feedback on our work. We address many of Reviewer QzyE’s suggestions in the general response above (updated manuscript). And here, we respond to specific comments.
Q1. The captions of Tables 1 and 2 and Figures 4 and 5 are too short (only a few words). They fail to describe the content. I especially do not understand Figures 4 and 5.
- We added detailed captions to all figures. Please refer to the updated manuscript.
Q2. What is the definition of "diverse perspective learning"?
The updated Introduction (Section 1) now defines “perspective”, “fixed-perspective learning”, and “diverse perspective learning” (Sections 2 and 3 are updated accordingly).
How to aggregate convolution features in pooling is a significant matter. Each temporal pooling has a distinct mechanism for aggregation, and we term the different mechanisms of temporal pooling as a ‘perspective’. Depending on the use of segmentation in pooling, the perspective is divided into ‘global’ and ‘local’, and according to the segmentation method, the local is divided into ‘rigid’ and ‘dynamic’. However, each temporal pooling only deals with a single perspective on hidden features as defined. … Diverse perspective learning is the opposite concept of fixed-perspective learning, which can overcome the limitation of existing temporal poolings (Section 1, Introduction).
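To make the ‘global’ and ‘rigid’ local perspectives concrete, here is a minimal sketch of the two aggregation mechanisms on a toy feature map. The function names, the toy values, and the equal-length segmentation are illustrative assumptions, not the paper's code; the ‘dynamic’ perspective, which the thread associates with a soft-DTW layer, is omitted.

```python
def global_max_pool(features):
    """Global perspective: one max over the whole time axis per channel."""
    # features: list of channels, each a list of values over time
    return [max(channel) for channel in features]

def static_max_pool(features, n_segments):
    """Local, rigid perspective: max within fixed, roughly equal-length segments.

    Assumes n_segments <= length of each channel, so no segment is empty.
    """
    pooled = []
    for channel in features:
        length = len(channel)
        bounds = [i * length // n_segments for i in range(n_segments + 1)]
        pooled.append([max(channel[a:b]) for a, b in zip(bounds, bounds[1:])])
    return pooled

# Toy hidden features: 2 channels, 6 time steps (hypothetical values).
hidden = [[0.1, 0.9, 0.2, 0.3, 0.8, 0.4],
          [0.5, 0.1, 0.7, 0.2, 0.1, 0.6]]
global_view = global_max_pool(hidden)               # one value per channel
rigid_view = static_max_pool(hidden, n_segments=3)  # three values per channel
```

The global view discards temporal position entirely, while the rigid view keeps a coarse ordering of where each maximum occurred, which is the sense in which the two poolings look at the same hidden features from different perspectives.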
Q3. There seems to be no improvement over dynamic temporal pooling (DTP) in Table 2. However, Table 2 is not cited in the text, and I am unsure if I understand it correctly.
- Please refer to the “Experiment Result and Performance” part of the “General Response to All Reviewers”.
Q4. The LRP evaluation is incomplete. First, in 3.2.2 it is stated: "A quantitative analysis using LRP is performed to examine how the perspectives of SoM-TP are different from those of other methods." However, in section 3.3 no such comparison between the methods is made.
- The qualitative analysis of SoM-TP with LRP is given in the updated Figure 2. It shows that SoM-TP can capture hidden important features that are valuable for classification, while the other temporal poolings cannot. Please refer to the updated Figure 2 and Section 3.2.
Q5. Furthermore, the experimental design is not clear and misses details:
The relevance score, which is input attribution, is the LRP result from the best-performed individual pooling.
- The pooling classification experiment in Section 3.3 is designed to examine the relationship between the best pooling method and the input data together with its LRP values. The target and the LRP values are obtained from fixed-perspective learning (training each of the three temporal poolings individually), and the LRP values are taken from the best-performing temporal pooling. The target is encoded as {“GTP”: 0, “STP”: 1, “DTP”: 2}. Therefore, each dataset gets exactly one target, that is, one best pooling per dataset. Please refer to the updated Section 3.3.
- 1) On which pooling method is this based? 2) Is the pooling method selected per individual sample or not? 3) Which LRP rule was used to compute the attribution values?
- Building on the response above, the details of the experimental setting are as follows.
- On which pooling method is this based?
- The pooling classification experiment is done with simple global pooling (updated in Figure 5). We use global pooling to investigate the relationship between the LRP values, the input, and the best pooling, without the effect of temporal pooling and its data dependency.
- Is the pooling method selected per individual sample or not?
- Because the input is based on fixed-perspective learning, one dataset has one best pooling. This experiment does not use SoM-TP; it aims to show the relationship between distinct perspectives and pooling.
- Which LRP rule was used to compute the attribution values?
- Different LRP rules are used for the convolutional stack and for the fully-connected layers (updated in the Figure 5 caption).
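The target construction described above can be sketched as follows. The mapping is the one given in the response; the helper name, the example accuracies, and the tie-breaking behaviour (first entry wins) are hypothetical and only illustrate how one label per dataset could be derived.

```python
# Encoding of the best pooling method, as stated in the response.
POOLING_TO_TARGET = {"GTP": 0, "STP": 1, "DTP": 2}

def best_pooling_target(accuracy_by_pooling):
    """Return the integer target of the pooling with the highest accuracy."""
    # max() keeps the first key on ties; this tie-breaking is an assumption.
    best = max(accuracy_by_pooling, key=accuracy_by_pooling.get)
    return POOLING_TO_TARGET[best]

# Hypothetical accuracies of three separately trained models on one dataset.
example = {"GTP": 0.81, "STP": 0.86, "DTP": 0.84}
target = best_pooling_target(example)
```

Each dataset contributes a single target under this scheme, consistent with the statement that the pooling is not selected per individual sample in this experiment.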
Responses to Reviewer QzyE (2/2)
Official Comment by Paper5971 Authors • Responses to Reviewer QzyE (2/2)
- Q6. The evaluation is done on UCR/UEA datasets only. The proposed pooling method could also be interesting in an NLP setting and an evaluation of larger datasets/models would be needed.
- We ran experiments on larger datasets and on a larger model with ResNet.
First, we use ResNet as the feature extractor and compare the performance of SoM-TP with the other temporal poolings. As shown in Table 2, the average performance of SoM-TP beats that of the other temporal poolings. Figure 4 also gives a detailed performance analysis on ResNet. Please also refer to the “General response to All Reviewers”.
Second, we ran an additional study on larger, publicly available ECG and EEG datasets. The accuracy is as follows:
| Dataset | Type | Data-size | Class | GTP-MAX | STP-MAX | DTP-MAX-euc | SoM-TP-MAX |
| --- | --- | --- | --- | --- | --- | --- | --- |
| ECG | uni-variate | 109,446 | 5 | 0.9236 | 0.9421 | 0.8593 | 0.9579 |
| EEG | multi-variate (65) | 122,880 | 2 | 0.9323 | 0.9162 | 0.9087 | 0.9485 |

We are planning to update the result of the extended experiment on a large dataset in the Appendix section.
- We are in the process of experimenting on an NLP task with the snli1.0 dataset, but it takes a lot of time, so we would appreciate it if you could wait for the result.
We would again like to thank reviewer QzyE, and we hope that our changes adequately address all concerns. Please let us know if you have any further questions, and we are very happy to follow up!
Additional Responses to Reviewer QzyE
Official Comment by Paper5971 Authors • Additional Responses to Reviewer QzyE
We again thank Reviewer QzyE for the comments and feedback on our work. Here, we report the results of SoM-TP on an NLP task.
We ran text classification (NLP) on the snli1.0 dataset. The accuracy is as follows:
| Dataset | Type | Data-size | Class | LSTM | LSTM-Attention | Word-by-Word Attention | GTP-MAX | STP-MAX | SoM-TP-MAX |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| snli1.0 | text classification | 559,209 | 3 | 0.3387 | 0.7593 | 0.7784 | 0.7899 | 0.7843 | 0.7948 |

A more detailed explanation and analysis are in “Summarization of Discussion and Extended Experiments (2/)”
Furthermore, we added a “Summarization of Discussion and Extended Experiments” comment, which collects the valuable comments and discussions from all reviewers and summarizes the extended analyses and experiments we carried out based on that feedback.
We would again like to thank reviewer QzyE, and we hope that our changes adequately address all concerns. Please let us know if you have any further questions, and we are very happy to follow up!
Re: Rebuttal
Official Comment by Paper5971 Reviewer QzyE • Re: Rebuttal
Dear Authors,
First, I want to acknowledge that your rebuttal answered my questions and improved the manuscript. However, I dislike that the authors submitted an unfinished manuscript at first and then delivered the final paper during the rebuttal period. This is also unfair to other authors who refrained from submitting to ICLR and waited until their manuscripts were done.
Submission Withdrawn by the Authors
Withdraw by Paper5971 Authors • Submission Withdrawn by the Authors