# <p align=center>`RATV`</p>


PyTorch implementation of the paper *Shot Retrieval and Assembly with Text Script for Video Montage Generation*.

## Datasets
We provide an anonymous Google Drive link for downloading the VSPD dataset at https://github.com/RATVDemo/RATV

After downloading the dataset, extract the video features with CLIP (ViT-B/32) in advance to speed up training. The CLIP code is adapted from https://github.com/openai/CLIP.
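The feature extraction step could look roughly like the sketch below. It is not the script used in this repository: the helper names `sample_frame_indices` and `extract_clip_features`, the evenly spaced frame sampling, and the L2 normalization are all assumptions made for illustration. Only `clip.load` and `model.encode_image` come from the openai/CLIP package.

```python
from typing import List, Sequence


def sample_frame_indices(num_frames: int, num_samples: int) -> List[int]:
    """Pick evenly spaced frame indices (hypothetical sampling strategy)."""
    if num_frames <= 0 or num_samples <= 0:
        return []
    if num_samples >= num_frames:
        # Fewer frames than requested samples: keep every frame.
        return list(range(num_frames))
    step = num_frames / num_samples
    # Take the middle frame of each of the `num_samples` equal segments.
    return [int(step * i + step / 2) for i in range(num_samples)]


def extract_clip_features(frames: Sequence["PIL.Image.Image"], device: str = "cpu"):
    """Encode already-decoded frames (PIL images) with CLIP ViT-B/32.

    Returns a float32 NumPy array of shape (len(frames), 512).
    """
    # Imported lazily so the sampling helper works without torch/clip installed.
    import torch
    import clip

    model, preprocess = clip.load("ViT-B/32", device=device)
    model.eval()

    batch = torch.stack([preprocess(frame) for frame in frames]).to(device)
    with torch.no_grad():
        feats = model.encode_image(batch)

    # Assumption: L2-normalize the features; the repository may store them raw.
    feats = feats / feats.norm(dim=-1, keepdim=True)
    return feats.float().cpu().numpy()
```

Frame decoding and the on-disk layout expected under `$FEATURE_PATH` are left out on purpose, since this README does not specify them; check the data loading code in `train.py` before saving features.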

## Training

```
python train.py --data_dir $FEATURE_PATH --json_file $TRAIN_JSONL_FILE --epoch $EPOCH
```

## Generation

```
python generate.py --data_dir $FEATURE_PATH --json_file $TEST_JSONL_FILE --transformer_path $CHECKPOINT_PATH --output_dir $OUTPUT_FILE
```

## Evaluation

```
python evaluate.py --fea_root $FEATURE_PATH --target_file $TEST_JSONL_FILE --generated_file $OUTPUT_FILE
```




