# ViSP - Video Summarization Pretraining with Self-Discovery of Informative Frames

This is the official code for "Video Summarization Pretraining with Self-Discovery of Informative Frames". ViSP is implemented on top of the official code of [CSTA](https://github.com/thswodnjs3/CSTA).

Follow the instructions below to reproduce our main experiments with just a few commands.


## Requirements
```bash
conda create -n ViSP python=3.8.5
conda activate ViSP
cd ViSP
pip install -r requirements.txt
```

## Download Dataset
Download the two preprocessed benchmark video summarization datasets (SumMe, TVSum) from the [PGL-SUM repository](https://github.com/e-apostolidis/PGL-SUM).
The directory structure must look like this:
```
 ├── data
     ├── eccv16_dataset_summe_google_pool5.h5
     └── eccv16_dataset_tvsum_google_pool5.h5
```
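
Before launching any experiment, it can help to confirm that both files are in place. The following is a minimal sketch, not part of the released scripts: the helper name `missing_dataset_files` is hypothetical, and it only checks that the two file names from the layout above exist under `data`, not that their contents are valid.

```python
from pathlib import Path

# File names copied from the directory layout above.
EXPECTED_FILES = (
    "eccv16_dataset_summe_google_pool5.h5",
    "eccv16_dataset_tvsum_google_pool5.h5",
)


def missing_dataset_files(data_dir="data"):
    """Return the expected dataset files that are absent from data_dir.

    Hypothetical helper for illustration; it is not part of this repository.
    """
    root = Path(data_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_dataset_files()
    if missing:
        print("Missing dataset files:", ", ".join(missing))
    else:
        print("Both dataset files found.")
```

Run it from the repository root so that the relative `data` path resolves correctly.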


## Finetune CSTA using provided ViSP checkpoint
```bash
# Usage: bash scripts/run_ft_summe_tvsum.sh $gpu_ids $gpu_buffer
# $gpu_ids: comma-separated IDs of the GPUs on which experiments run in parallel.
# $gpu_buffer: minimum free memory (in MB) a GPU must have before an experiment is launched on it.
# Setting $gpu_buffer to 2960 or higher is recommended.

bash scripts/run_ft_summe_tvsum.sh "0,1,2" "2960"
```
Results will be saved to `./EXP/summe_tvsum/res_best.txt`.
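
To see ahead of time which of the requested GPUs would pass the `$gpu_buffer` threshold, a check along the following lines can be used. This is a sketch under stated assumptions and not how `run_ft_summe_tvsum.sh` is actually implemented: the function names are hypothetical, free memory is read from `nvidia-smi` (which reports MiB), and a GPU is treated as eligible when its free memory is at least the buffer.

```python
import subprocess


def parse_free_memory(csv_text):
    """Parse 'index, memory.free' rows into a {gpu_index: free_MiB} dict."""
    free = {}
    for line in csv_text.strip().splitlines():
        index, mem = (field.strip() for field in line.split(","))
        free[int(index)] = int(mem)
    return free


def eligible_gpus(gpu_ids, gpu_buffer, free_by_gpu):
    """Return the requested GPU IDs whose free memory is >= gpu_buffer.

    gpu_ids is a comma-separated string such as "0,1,2". GPUs that are
    missing from free_by_gpu are treated as having no free memory.
    """
    wanted = [int(g) for g in gpu_ids.split(",") if g.strip()]
    return [g for g in wanted if free_by_gpu.get(g, 0) >= gpu_buffer]


def query_free_memory():
    """Ask nvidia-smi for the free memory of every visible GPU."""
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=index,memory.free",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_free_memory(result.stdout)
```

For example, `eligible_gpus("0,1,2", 2960, query_free_memory())` lists the GPUs among 0, 1, and 2 that currently have at least 2960 MiB free.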


## Run experiments from scratch (pretrain + finetune on parameter space)
```bash
bash scripts/run_summe_tvsum_from_scatch.sh
```
Results will be saved to `./EXP/summe_tvsum/`.


## Extra resources

We will release the code and extra resources to support the research community, including more checkpoints and processed out-of-domain pretraining data.