## Evaluation Scripts

This folder contains the scripts for reproducing the first experiment in Section 5 of the paper, in which pretrained Chronos and T5 models are compressed and evaluated.

### Compute the perplexity and Jaccard overlaps of compressed T5

The experiment to compare the perplexity and Jaccard overlaps of compressed T5 models can be run using the following command:

```
python compress_T5.py \
   --num-sentences 1000 \
   --batch-size 8 \
   --topk 10
```

Here, `topk` is the value of k used in computing the top-k Jaccard overlaps, `num-sentences` is the number of evaluation sentences over which the metrics are averaged, and `batch-size` is the evaluation batch size. The results are saved to the [`results`](./results/) directory.
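For intuition, the two metrics can be sketched in a few lines of plain Python. This is a minimal illustration, not the code in `compress_T5.py`: it assumes that the Jaccard overlap compares the sets of `topk` highest-scoring tokens of the original and compressed models, and that perplexity is the exponential of the mean negative log-likelihood. The function names and the toy scores are hypothetical.

```python
import math


def topk_indices(scores, k):
    """Return the set of indices of the k largest entries in `scores`."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(order[:k])


def topk_jaccard(scores_a, scores_b, k):
    """Jaccard overlap |A ∩ B| / |A ∪ B| of the two top-k index sets."""
    a = topk_indices(scores_a, k)
    b = topk_indices(scores_b, k)
    return len(a & b) / len(a | b)


def perplexity(token_log_probs):
    """Perplexity as exp of the mean negative log-likelihood (natural log)."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


# Toy next-token scores from an "original" and a "compressed" model (made up).
original = [0.1, 2.0, 1.5, -0.3, 0.9]
compressed = [0.2, 1.8, 0.4, -0.1, 1.1]

# Top-2 sets are {1, 2} and {1, 4}: one shared index out of three distinct ones.
overlap = topk_jaccard(original, compressed, k=2)

# Two tokens predicted with probabilities 0.5 and 0.25.
ppl = perplexity([math.log(0.5), math.log(0.25)])
```

An overlap of 1 means the compressed model ranks exactly the same `topk` tokens highest as the original model; an overlap of 0 means the two top-k sets are disjoint.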

### Compute the Jaccard overlaps of compressed Chronos

The experiment to compare the Jaccard overlaps of compressed Chronos models can be run using the following command:

```
python compress_chronos.py \
   --dataset electricity_15min \
   --num-series 1000 \
   --series-len 512
```

Here, `topk` (not passed in the command above) is the value of k used in computing the top-k Jaccard overlaps, `num-series` is the number of evaluation time series over which the metric is averaged, `series-len` is the length of each time series, and `dataset` is the AutoGluon dataset from which the time series are sampled. The available datasets are listed on [Hugging Face](https://huggingface.co/datasets/autogluon/chronos_datasets). The results are saved to the [`results`](./results/) directory.

### Evaluate the In-Domain and Zero-Shot Performance of Compressed Chronos Models

To compute the in-domain and zero-shot weighted quantile loss (WQL) and mean absolute scaled error (MASE) of the compressed Chronos models, first download the public [`chronos-forecasting` repository](https://github.com/amazon-science/chronos-forecasting/tree/main). Replace the file `chronos-forecasting/scripts/evaluation/evaluate.py` with the [`evaluate.py`](./evaluate.py) in this directory. Then, from the `chronos-forecasting/scripts` directory, evaluate the compressed pretrained Chronos models by running

```
python evaluation/evaluate.py evaluation/configs/in-domain.yaml evaluation/results/chronos-t5-small-in-domain.csv \
    --chronos-model-id "amazon/chronos-t5-small" \
    --batch-size=32 \
    --device=cuda:0 \
    --num-samples 20 \
    --epsilon=0.01
```

and

```
python evaluation/evaluate.py evaluation/configs/zero-shot.yaml evaluation/results/chronos-t5-small-zero-shot.csv \
    --chronos-model-id "amazon/chronos-t5-small" \
    --batch-size=32 \
    --device=cuda:0 \
    --num-samples 20 \
    --epsilon=0.01
```

Here, `epsilon` is the threshold used to truncate the singular values. See Section 5 of the paper for more details.
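The effect of such a threshold can be illustrated with a short NumPy sketch. It is not the implementation in `evaluate.py`: it assumes an absolute threshold (singular values at or below `epsilon` are dropped), whereas the criterion in the paper may be defined differently, for example relative to the largest singular value. The function name and the toy matrix are hypothetical.

```python
import numpy as np


def truncate_singular_values(weight, epsilon):
    """Drop singular values <= epsilon (assumed absolute threshold).

    Returns the low-rank approximation and the number of singular values kept.
    """
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    keep = s > epsilon
    # Scale the kept left singular vectors by their singular values, then project back.
    low_rank = (u[:, keep] * s[keep]) @ vt[keep, :]
    return low_rank, int(keep.sum())


# Toy weight matrix with singular values 3.0, 1.0 and 0.005.
weight = np.diag([3.0, 1.0, 0.005])
approx, rank = truncate_singular_values(weight, epsilon=0.01)

# Frobenius norm of the part removed by the truncation.
error = np.linalg.norm(weight - approx)
```

With `epsilon=0.01`, only the singular value 0.005 falls below the threshold, so the approximation has rank 2 and the approximation error equals the discarded singular value.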