# PyTorch Code for Singularity

This code accompanies the Singularity project. It is intended for use only during the NeurIPS 2022 review process.

## Setup

### Conda Env

The specific packages used in our experiments are listed in [environment.yml](environment.yml). You can create a conda env containing these packages with:
```bash
conda env create -f environment.yml
```
This creates a conda env named `sl`. Activate it with:
```bash
conda activate sl
```

### Env Variables

In your `.bashrc` file, set the following environment variables:
```bash
export SL_EXP_DIR="/path/to/ckpts_and_logs"
export SL_DATA_DIR="/path/to/data"
```
These variables are accessed by the yaml files in the [configs/](configs) directory and the shell scripts in [scripts/](scripts).
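
Because the configs and scripts read these variables, a quick check before launching a run can catch a missing export early. The snippet below is a minimal sketch, not part of the codebase: the function name `check_sl_env` is hypothetical, and it only tests that the two variables named above are non-empty.

```bash
# Hypothetical helper (not part of this repo): verify that the two
# environment variables described above are set before launching a run.
check_sl_env() {
  missing=0
  for name in SL_EXP_DIR SL_DATA_DIR; do
    # Look up the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "$name is not set" >&2
      missing=1
    fi
  done
  # Returns 0 when both variables are set, 1 otherwise.
  return "$missing"
}
```

After sourcing this snippet, running `check_sl_env` prints the name of each unset variable to stderr and returns a non-zero status if any is missing.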

### Logging

Our codebase supports using [wandb](https://wandb.ai/) to monitor training. To use wandb, set it up by following [this very short instruction](https://docs.wandb.ai/quickstart#1.-set-up-wandb), and set `wandb.enable` to `True` in the [configs](configs).


## Pre-Training
Launch pre-training with the following command:
```bash
bash scripts/pretrain.sh EXP_NAME CORPUS NGPU local 
```
`EXP_NAME` is the name of the current run. `CORPUS` is the name of the dataset used for training; check [configs/pretrain.yaml](configs/pretrain.yaml) for the available corpora. `NGPU` is the number of GPUs to use. The last parameter, `local`, specifies that the program runs on a local machine rather than on a slurm-managed cluster. For example, to train on the WebVid and CC3M datasets with 3 GPUs, run:
```bash
bash scripts/pretrain.sh first_run webvid_cc3m 3 local
```
You can also change the other configs in [configs/pretrain.yaml](configs/pretrain.yaml). For example, you can append `wandb.enable=True` to enable logging with wandb:
```bash
bash scripts/pretrain.sh first_run webvid_cc3m 3 local wandb.enable=True
```
If you are using slurm, simply replace `bash` with `sbatch`, and `local` with `slurm`:
```bash
sbatch scripts/pretrain.sh EXP_NAME CORPUS NGPU slurm
```
Note, however, that you may need to change the `#SBATCH` configs in [scripts/pretrain.sh](scripts/pretrain.sh) for your specific slurm cluster, e.g., `--partition`. Also, the `NGPU` argument is ignored when running on slurm; specify the number of GPUs using `#SBATCH` in the script instead.
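
As a rough illustration of the kind of edit this involves, the fragment below shows two standard Slurm directives. The partition name `gpu` and the use of `--gres` to request GPUs are assumptions made for this sketch; the directives actually present in `scripts/pretrain.sh` may differ, so adjust whichever ones are already there rather than copying these verbatim.

```bash
# Illustrative #SBATCH header lines only; values are placeholders.
#SBATCH --partition=gpu   # assumed partition name, replace with one from your cluster
#SBATCH --gres=gpu:3      # request 3 GPUs per node (this takes the place of NGPU)
```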


## Fine-Tuning
### Retrieval
Launch fine-tuning for text-to-video retrieval with the following command:
```bash
bash scripts/train_ret.sh EXP_NAME DATASET NGPU local \
 pretrained_path=PT_CKPT_PATH
```
`EXP_NAME` is the name of the training run. `DATASET` is the dataset to use; it can be one of `[msrvtt, didemo, anet, ssv2_label, ssv2_template]`. `PT_CKPT_PATH` is the path to the pre-trained checkpoint file. For example, if `msrvtt` is used, the script fine-tunes the model using the config file [configs/ret_msrvtt.yaml](configs/ret_msrvtt.yaml):
```bash
bash scripts/train_ret.sh ft_msrvtt msrvtt 1 local \
 pretrained_path=PT_CKPT_PATH 
```
To override the default values in this config file, append options to the command above. For example, to fine-tune on `msrvtt` with 4 frames per video and a 2-layer temporal encoder (this is the `Singularity-temporal` model), run:
```bash
bash scripts/train_ret.sh ft_msrvtt_4frm_2tlayer msrvtt 1 local \
 pretrained_path=PT_CKPT_PATH \
 video_input.num_frames=4 \
 add_temporal_embed=True \
 temporal_vision_encoder.enable=True \
 temporal_vision_encoder.num_layers=2
```
As with pre-training, you can run this script on slurm by replacing `bash` with `sbatch` and `local` with `slurm`.
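
When fine-tuning on several retrieval datasets from the same pre-trained checkpoint, it can help to generate the commands first and inspect them before running anything. The following is a minimal sketch under stated assumptions: `print_ret_cmds` is a hypothetical helper that is not part of this repo, the `ft_<dataset>` run names are an arbitrary naming choice, and it uses 1 GPU in `local` mode as in the example above. It only prints commands and does not launch training.

```bash
# Hypothetical helper (not part of this repo): print, without running,
# one retrieval fine-tuning command per dataset.
# Usage: print_ret_cmds PT_CKPT_PATH DATASET [DATASET ...]
print_ret_cmds() {
  ckpt="$1"
  shift
  for dataset in "$@"; do
    # Run name "ft_<dataset>" is an assumed convention for this sketch.
    echo "bash scripts/train_ret.sh ft_${dataset} ${dataset} 1 local pretrained_path=${ckpt}"
  done
}
```

For instance, `print_ret_cmds /path/to/pt_ckpt.pth msrvtt didemo` prints two commands, one per dataset, which can then be reviewed and run by hand.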


### Question Answering
Launch fine-tuning for video question answering with the following command:
```bash
bash scripts/train_vqa.sh EXP_NAME DATASET NGPU local \
 pretrained_path=PT_CKPT_PATH
```
`DATASET` can be one of `[msrvtt, anet]`. This script also supports slurm.


## Evaluation
For retrieval, run
```bash
bash scripts/eval_ret.sh DATASET CKPT_PATH SAVE_DIRNAME local NGPU 
```
`DATASET` is the name of one of the retrieval datasets. `CKPT_PATH` can be the path to either a fine-tuned checkpoint or a pre-trained checkpoint; in the latter case, the script evaluates zero-shot performance. `SAVE_DIRNAME` is the name of the directory where the evaluation results will be saved. For example, to evaluate `didemo` zero-shot performance on both the `val` and `test` splits with 12 inference frames, run:
```bash
bash scripts/eval_ret.sh didemo /path/to/pt_ckpt.pth eval_12frm local 1 \
 test_types=[val,test] video_input.num_frames_test=12
```

For question answering, run
```bash
bash scripts/eval_qa.sh DATASET CKPT_PATH SAVE_DIRNAME local NGPU 
```
