# Masked Dual-Temporal Autoencoders for Semi-Supervised Time-Series Classification

📝 This repository is the official implementation of **Masked Dual-Temporal Autoencoders for Semi-Supervised Time-Series Classification**. 

> Contributions
* We propose a novel masked time-series modeling-based framework for semi-supervised time-series classification. To our knowledge, this work is the first exploration of masked time-series modeling for this purpose.
* To effectively capture intricate temporal patterns within time series across diverse temporal resolutions, we develop a *dual-temporal encoder* comprising two sequential sub-encoders. In addition, we address the potential information loss between the sub-encoders by introducing a *relation-preserving* loss function.
* We use a random masking ratio at each training epoch, which avoids the costly tuning process of searching for an optimal masking ratio and also improves classification performance.
* The proposed method captures the inherent temporal information of time series and successfully combines it with supervisory features, achieving strong performance in semi-supervised time-series classification compared with state-of-the-art (SOTA) methods.
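
The per-epoch random masking described above can be illustrated with a minimal sketch. It is not the repository's implementation: the function names, the uniform sampling range of 0.1 to 0.9, the zero mask value, and the use of plain Python lists instead of PyTorch tensors are all assumptions made for illustration.

```python
import random


def sample_mask_ratio(rng, low=0.1, high=0.9):
    """Draw one masking ratio for the current epoch (the range is an assumption)."""
    return rng.uniform(low, high)


def mask_series(series, ratio, rng, mask_value=0.0):
    """Mask about `ratio` of the time steps, chosen uniformly at random.

    Returns the masked copy and a boolean list marking the masked positions.
    """
    n = len(series)
    num_masked = min(n, max(1, round(ratio * n)))
    masked_idx = set(rng.sample(range(n), num_masked))
    mask = [i in masked_idx for i in range(n)]
    masked = [mask_value if m else x for x, m in zip(series, mask)]
    return masked, mask


rng = random.Random(42)
series = [float(t) for t in range(1, 21)]  # toy series of length 20
for epoch in range(3):
    ratio = sample_mask_ratio(rng)  # a fresh ratio every epoch, so no ratio is tuned
    masked, mask = mask_series(series, ratio, rng)
    print(f"epoch {epoch}: ratio={ratio:.2f}, masked {sum(mask)}/{len(series)} steps")
```

Because the ratio is resampled every epoch, the model sees both lightly and heavily masked inputs during training, so no single ratio has to be selected by a separate search.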

> Overview of MDTA

![overview](./figs/overview.png)

> Illustration of the multi-resolution and transformer-based sub-encoders

![multi-resolution sub-encoder](./figs/auxiliary_sub.png) |![transformer-based sub-encoder](./figs/trans_sub.png)
--- | --- | 

## Requirements

💡 Our code requires *Python*, *PyTorch*, *Scikit-Learn*, *NumPy*, *Pandas*, and *Findiff*.

> Tested with:
- Python == 3.8.13
- PyTorch == 1.13.0
- scikit-learn == 0.24.2
- numpy == 1.22.2
- pandas == 1.4.4
- findiff == 0.10.0

📁 Download the datasets and place them in the `datasets/` folder as follows:
- The [UCR Time Series Classification Archive](https://www.cs.ucr.edu/~eamonn/time_series_data_2018/) should be placed in `datasets/UCRArchive_2018/`. For example, each data file is located at `datasets/UCRArchive_2018/<dataset_name>/<dataset_name>_*.tsv`.
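
To check that a dataset sits in the expected place, a small loader such as the following can help. It is a sketch and not part of this repository: `load_ucr_split` is a hypothetical helper, and it relies on the 2018 archive layout, in which each row of a `.tsv` file holds a class label followed by the series values.

```python
import csv
from pathlib import Path


def load_ucr_split(root, dataset_name, split):
    """Read <root>/<dataset_name>/<dataset_name>_<split>.tsv.

    Each row is one series: the first tab-separated field is the class
    label and the remaining fields are the observed values.
    """
    path = Path(root) / dataset_name / f"{dataset_name}_{split}.tsv"
    labels, series = [], []
    with open(path, newline="") as f:
        for row in csv.reader(f, delimiter="\t"):
            if not row:  # skip empty lines
                continue
            labels.append(int(float(row[0])))
            series.append([float(v) for v in row[1:]])
    return series, labels


if __name__ == "__main__":
    root = Path("datasets/UCRArchive_2018")
    # Only runs when the CBF dataset has already been downloaded.
    if (root / "CBF" / "CBF_TRAIN.tsv").exists():
        X, y = load_ucr_split(root, "CBF", "TRAIN")
        print(f"{len(X)} series of length {len(X[0])}, classes: {sorted(set(y))}")
```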

## Training and Evaluation

💻 To train and evaluate MDTA on a dataset, run the following command:
```train
python main.py --seed <seed> --dataset <dataset_name> --label_ratio <label_ratio>
```

* For example, to obtain the trained model and classification results on the 'CBF' dataset, run:
```train
python main.py --seed 42 --dataset CBF --label_ratio 0.1
```
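
To repeat the run over several label ratios, the command can be wrapped in a short driver script. The sketch below is an illustration only: `build_commands` and `run_all` are hypothetical helpers, the 0.1 step between ratios is an assumption, and only the `--seed`, `--dataset`, and `--label_ratio` flags documented here are used.

```python
import subprocess
import sys


def build_commands(dataset, seeds, label_ratios):
    """Return one `main.py` argument list per (seed, label ratio) pair."""
    return [
        [sys.executable, "main.py",
         "--seed", str(seed),
         "--dataset", dataset,
         "--label_ratio", str(ratio)]
        for seed in seeds
        for ratio in label_ratios
    ]


def run_all(commands, dry_run=True):
    """Print every command and execute it unless `dry_run` is set."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing run


if __name__ == "__main__":
    ratios = [round(0.1 * k, 1) for k in range(1, 10)]  # 0.1, 0.2, ..., 0.9
    run_all(build_commands("CBF", seeds=[42], label_ratios=ratios), dry_run=True)
```

With `dry_run=True` the script only prints the commands, which makes it easy to inspect the sweep before launching it.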

📚 The arguments are described in detail below:

| Parameter             | Description  |
| --------------------- |------------- |
| seed                  | The random seed (defaults to 42) | 
| dataset               | The dataset name (defaults to 'CBF')|
| label_ratio           | The label ratio (defaults to 0.1) |
| max_len               | The maximum length of the input sequence (defaults to 1024) |
| embed_dim             | The dimension of representation obtained from dual-temporal encoder (defaults to 64) |
| depth                 | The depths in multi-resolution sub-encoder |
| hidden_dim            | The dimension of hidden layers used in MDTA (defaults to 256) |
| dropout               | The dropout value (defaults to 0.1) |
| num_head              | The number of heads in the multi-head attention of transformer-based sub-encoder (defaults to 8) | 
| num_layer             | The number of transformer blocks in transformer-based sub-encoder |
| lr                    | The learning rate (defaults to 1e-3) |
| batch_size            | The batch size (defaults to 10) |
| epoch                 | The maximum number of training epochs (defaults to 1000) |
| patience              | The number of epochs to wait for improvement before early stopping (defaults to 50) |
| gpu                   | The index of the GPU used for training and inference (defaults to 0) |


*For descriptions of these arguments, run ```python main.py -h```.*
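
In a semi-supervised setting, a label ratio such as 0.1 typically means that only that fraction of the training labels is used, while the remaining samples are treated as unlabeled. The sketch below shows one way to make such a split. It is an assumption for illustration, not necessarily how `main.py` does it: `split_by_label_ratio` is a hypothetical helper, and the per-class (stratified) sampling with at least one labeled sample per class is a design choice made here.

```python
import random
from collections import defaultdict


def split_by_label_ratio(labels, label_ratio, seed=42):
    """Return (labeled_idx, unlabeled_idx) for a list of class labels.

    Roughly `label_ratio` of the samples in every class keep their label,
    with at least one labeled sample per class.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)

    labeled = []
    for y in sorted(by_class):
        idxs = by_class[y]
        keep = max(1, round(label_ratio * len(idxs)))
        labeled.extend(rng.sample(idxs, keep))

    labeled_set = set(labeled)
    unlabeled = [i for i in range(len(labels)) if i not in labeled_set]
    return sorted(labeled), unlabeled


labels = [0] * 10 + [1] * 20
labeled_idx, unlabeled_idx = split_by_label_ratio(labels, label_ratio=0.1)
print(len(labeled_idx), "labeled,", len(unlabeled_idx), "unlabeled")
```

Sampling per class keeps every class represented in the labeled subset even at small ratios, which matters for datasets with few samples per class.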

## Results

📋 The table below reports the classification performance of MDTA and the baselines under *inductive* inference, averaged over label ratios from 0.1 to 0.9:

| Dataset | CE | Pseudo | $\Pi$-model | FixMatch | MTL | SSTSC | iTimes | MDTA |
| ------- | -- | ------ | ----------- | -------- | --- | ----- | ------ | ---- |
| CBF | 99.20 | 99.30 | 99.26 | 99.40 | 98.38 | 99.38 | 99.44 | **99.71** |
| CricketX | 52.26 | 52.62 | 61.92 | 59.66 | 40.38 | 41.89 | **67.04** | 64.07 |
| ECGFiveDays | 98.49 | 98.51 | 83.44 | 83.33 | 98.39 | 98.24 | 95.33 | **99.81** |
| Lightning2 | 67.56 | 68.89 | 68.98 | 69.87 | 67.56 | 67.64 | 71.56 | **75.23** |
| MoteStrain | 93.29 | 93.46 | 94.15 | 94.19 | 89.17 | 92.31 | 93.56 | **95.03** |
| Plane | 96.98 | **96.98** | 85.98 | 88.73 | 84.87 | 94.50 | 85.66 | 96.88 |
| PowerCons | 89.41 | 88.89 | 85.37 | 86.23 | 87.84 | 89.07 | 87.78 | **93.58** |
| RefrigerationDevices | 58.60 | 57.61 | 57.04 | 57.32 | 57.51 | 58.19 | **60.64** | 59.56 |
| SonyAIBORobotSurface1 | 97.35 | 97.21 | 93.44 | 94.22 | 93.28 | 96.80 | 94.20 | **99.61** |
| SwedishLeaf | 84.16 | 84.83 | 70.34 | 69.66 | 55.45 | 76.93 | 56.18 | **86.18** |
| SyntheticControl | 96.19 | 97.70 | 97.98 | 97.52 | 96.98 | 93.78 | 97.31 | **98.11** |
| ToeSegmentation1 | 84.16 | 84.44 | 84.73 | 84.36 | 82.30 | 82.39 | 86.71 | **94.03** |
| Trace | 91.94 | 93.22 | 91.72 | 92.72 | 91.50 | 91.44 | 95.78 | **98.44** |
| TwoPatterns | 99.81 | 99.85 | 99.30 | 99.48 | 98.91 | 99.73 | 96.86 | **99.97** |
| Yoga | 83.99 | 83.98 | 64.72 | 63.85 | 74.76 | 80.58 | 75.92 | **85.17** |

📊 MDTA exhibited superior performance *even when dealing with limited labeled data*. For example, as shown below, *when the label ratio is 0.1*, MDTA outperformed the second-best method by more than 4% on four datasets.
|SonyAIBORobotSurface1|&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;SwedishLeaf&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;|&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;ToeSegmentation1&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;|&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Trace&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;|
| :---: | :---: | :---: | :---: |
|![example1](./figs/label_0.1_SonyAIBORobotSurface1.png) |![example2](./figs/label_0.1_SwedishLeaf.png) |![example3](./figs/label_0.1_ToeSegmentation1.png) |![example4](./figs/label_0.1_Trace.png) |


📇 As the representations pass through each sub-encoder, they form progressively more distinct groups for each class. This analysis further supports the effectiveness of MDTA, in particular the dual-temporal encoder architecture and the relation-preserving loss function. 
  - MoteStrain
![motestrain](./figs/umap_MoteStrain_scatter_WAAE.png)  
  - SwedishLeaf
![swedishleaf](./figs/umap_SwedishLeaf_scatter_WAAE.png)  
  - TwoPatterns
![twopatterns](./figs/umap_TwoPatterns_scatter_WAAE.png)
  - Yoga
![yoga](./figs/umap_Yoga_scatter_WAAE.png)



## Acknowledgments
1. The transformer architecture used in MDTA was implemented based on the official code of [TST](https://github.com/gzerveas/mvts_transformer).
2. The loss weighting strategy was implemented with the official code of [SoftAdapt](https://github.com/dr-aheydari/SoftAdapt).