# ProSAR: Prototype-Guided Semantic Augmentation and Refinement for Time Series Contrastive Learning



In this paper, we propose ProSAR, a novel prototype-guided semantic augmentation and refinement framework for time series contrastive learning. ProSAR is founded on an information-theoretic principle for co-designing semantic data augmentations and learnable prototypes: it aims to generate views that maximize information about an associated semantic prototype while discarding prototype-irrelevant nuisance variability. The approach is not tied to one architecture and can be integrated with different backbone encoders.
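
One way to read this principle is as an information-bottleneck-style criterion. The formula below is only an illustrative formalization and is not taken from the paper. The symbols are assumptions introduced here: $x$ is an input series, $v = t(x)$ is a view produced by an augmentation $t$, $c$ is the prototype associated with $x$, and $\beta \ge 0$ is a hypothetical trade-off weight.

$$
\max_{t} \; I(v;\, c) \;-\; \beta \, I(v;\, x \mid c)
$$

The first term rewards views that stay informative about the prototype. The second term penalizes whatever the view still reveals about the raw input beyond the prototype, which corresponds to the nuisance variability described above.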

## Requirements
We use Python 3.9. The main packages include:
* numpy 
* scikit-learn 
* torch 
* pandas 
* tqdm (commonly used)
* matplotlib (commonly used)

A `requirements.txt` file is also provided.
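
As a quick sanity check after installation, the sketch below reports which of the packages listed above cannot be imported. It is a minimal helper written for this README and is not part of the repository. The `REQUIRED` mapping and the `find_missing` function are hypothetical names. Note that `scikit-learn` is imported as `sklearn`.

```python
import importlib.util
import sys

# Hypothetical mapping from pip package name to import name.
# The two differ for scikit-learn, which is imported as `sklearn`.
REQUIRED = {
    "numpy": "numpy",
    "scikit-learn": "sklearn",
    "torch": "torch",
    "pandas": "pandas",
    "tqdm": "tqdm",
    "matplotlib": "matplotlib",
}


def find_missing(packages):
    """Return the pip names whose import name cannot be located."""
    return [
        pip_name
        for pip_name, module_name in packages.items()
        if importlib.util.find_spec(module_name) is None
    ]


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    missing = find_missing(REQUIRED)
    print("Missing packages:", ", ".join(missing) if missing else "none")
```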

## Dataset
The framework is evaluated on several benchmark datasets:
1.  **Forecasting:**
    * ETT dataset (ETTh1, ETTh2, ETTm1) ([link](https://github.com/zhouhaoyi/ETDataset))
    * Electricity dataset ([link](https://archive.ics.uci.edu/dataset/321/electricityloaddiagrams20112014), preprocessing often follows [TS2Vec's script](https://github.com/zhihanyue/ts2vec/blob/main/datasets/preprocess_electricity.py))
    * Weather dataset (often sourced from [Informer](https://github.com/zhouhaoyi/Informer2020))
2.  **Classification:**
    * UEA multivariate time series archive (30 datasets)

For public datasets, you can also find sources in repositories like [TS2Vec](https://github.com/yuezhihan/ts2vec) and [CoST](https://github.com/salesforce/CoST).
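
For orientation, here is a minimal sketch of how a forecasting CSV such as ETTh1 could be loaded and prepared. It is not the repository's data pipeline. The helper names `load_csv` and `split_and_scale` are hypothetical, and the chronological 60/20/20 split is an assumption that may differ from what ProSAR's scripts use. The ETT files contain a `date` column followed by numeric feature columns.

```python
import numpy as np
import pandas as pd


def load_csv(path):
    """Read a CSV with a `date` column followed by numeric features (ETT layout)."""
    return pd.read_csv(path, index_col="date", parse_dates=True)


def split_and_scale(df, train_frac=0.6, val_frac=0.2):
    """Split chronologically, then standardize using training statistics only.

    The 60/20/20 default is an assumption made for this sketch.
    """
    values = df.to_numpy(dtype=np.float64)
    n = len(values)
    train_end = int(n * train_frac)
    val_end = train_end + int(n * val_frac)

    # Fit the scaler on the training slice to avoid leaking future statistics.
    mean = values[:train_end].mean(axis=0)
    std = values[:train_end].std(axis=0)
    std[std == 0] = 1.0  # keep constant columns finite

    scaled = (values - mean) / std
    return scaled[:train_end], scaled[train_end:val_end], scaled[val_end:]
```

A call could look like `train, val, test = split_and_scale(load_csv("datasets/ETTh1.csv"))`, where the path is a placeholder for wherever the file was downloaded.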

## Usage
Below is an example command to run ProSAR for forecasting tasks. Please adapt parameters as needed.
```commandline
python train.py \
--dataset <dataset_name> \
--gpu <gpu_id> \
--seed <random_seed> \
--archive <output_archive_path> \
--load_default <True/False> \
--batch-size <N> \
--lr <encoder_learning_rate> \
--epochs <training_epochs>
```