# VFRTok: Variable Frame Rates Video Tokenizer with Duration-Proportional Information Assumption

## Setup
```bash
conda create -n vfrtok python=3.10
conda activate vfrtok
pip install -r requirements.txt
```

## Preprocessing
Describe your dataset in a `csv` file with one row per video, listing its path, resolution, and frame rate.

For example:
```csv
video_path,height,width,fps
FIRST_VIDEO.mp4,1080,1920,24.0
SECOND_VIDEO.mp4,540,960,30.0
...
```
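A manifest like this could be generated with a short script. The following is a minimal sketch, not part of the repository: it assumes `ffprobe` (from FFmpeg) is on your `PATH`, and the helper names `parse_frame_rate`, `probe_video`, `write_manifest`, and `build_manifest` are hypothetical. Whether `video_path` should be absolute or relative to some root is not stated here, so adjust the path handling to match what the data loader expects.

```python
import csv
import json
import subprocess
from fractions import Fraction
from pathlib import Path

# Column order taken from the example manifest above.
FIELDS = ["video_path", "height", "width", "fps"]


def parse_frame_rate(rate: str) -> float:
    """Convert an ffprobe rate string such as '30000/1001' to a float."""
    return round(float(Fraction(rate)), 3)


def probe_video(path: Path) -> dict:
    """Read height, width, and frame rate of the first video stream via ffprobe."""
    cmd = [
        "ffprobe", "-v", "error",
        "-select_streams", "v:0",
        "-show_entries", "stream=width,height,r_frame_rate",
        "-of", "json", str(path),
    ]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    stream = json.loads(out)["streams"][0]
    return {
        "video_path": str(path),
        "height": stream["height"],
        "width": stream["width"],
        "fps": parse_frame_rate(stream["r_frame_rate"]),
    }


def write_manifest(rows, out_path) -> None:
    """Write manifest rows to a csv file with the expected header."""
    with open(out_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)


def build_manifest(video_dir, out_path, pattern="*.mp4") -> None:
    """Probe every matching video in video_dir and write the manifest."""
    videos = sorted(Path(video_dir).glob(pattern))
    write_manifest([probe_video(p) for p in videos], out_path)
```

For example, `build_manifest("YOUR_VIDEO_DIR", "YOUR_DATA_DIR/YOUR_DATA.csv")` would write one row per `.mp4` file found directly in `YOUR_VIDEO_DIR`.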

## Train
First, set the `csv` paths for the training and validation data in the `yaml` config file:
```yaml
data_path_list: [
  YOUR_DATA_DIR/K600.csv,
  YOUR_DATA_DIR/K600.csv,
  YOUR_DATA_DIR/BVI_HFR.csv,
  YOUR_DATA_DIR/BVI_HFR.csv,
]
val_data_path_list: [
  YOUR_DATA_DIR/YOUR_VAL_DATA.csv
]
```
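Before launching training, it can help to sanity-check each manifest listed in the config. The sketch below is an assumption-laden illustration rather than repository code: `check_manifest` is a hypothetical helper, and the required columns are inferred from the example manifest in the Preprocessing section.

```python
import csv

# Columns inferred from the example manifest; adjust if the loader needs more.
REQUIRED_COLUMNS = {"video_path", "height", "width", "fps"}


def check_manifest(csv_path) -> list:
    """Return a list of human-readable problems found in one manifest."""
    problems = []
    with open(csv_path, newline="") as f:
        reader = csv.DictReader(f)
        missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
        if missing:
            return [f"{csv_path}: missing columns {sorted(missing)}"]
        # Data rows start on line 2 because line 1 is the header.
        for line_no, row in enumerate(reader, start=2):
            try:
                values = (int(row["height"]), int(row["width"]), float(row["fps"]))
            except (TypeError, ValueError):
                problems.append(f"{csv_path}:{line_no}: non-numeric value")
                continue
            if min(values) <= 0:
                problems.append(f"{csv_path}:{line_no}: non-positive value")
    return problems
```

Calling `check_manifest` on every path in `data_path_list` and `val_data_path_list` and printing the returned messages gives a quick report; an empty list means no problems were detected by these checks.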
Then run the training script:
```bash
deepspeed train.py \
    --config configs/VFRTok-M.yaml \
    --ds-config configs/ds_config_bs8x1.json
```

## Inference
First, convert the training checkpoint into the `pytorch_model.bin` format that the inference script loads:
```bash
python zero_to_fp32.py -i exp000 -t 200000
```
Then run the inference script:
```bash
python recon.py \
    --config experiments/exp000-VFRTok-M-ds_config_bs8x1/config.yaml \
    --ckpt experiments/exp000-VFRTok-M-ds_config_bs8x1/vq_200000/pytorch_model.bin \
    --data_path $YOUR_DATA_PATH \
    --save_path output
```

VFRTok also supports asymmetric reconstruction, where the encoding and decoding frame rates differ. For example, video frame interpolation from 12 fps to 120 fps:
```bash
python recon.py \
    --config experiments/exp000-VFRTok-M-ds_config_bs8x1/config.yaml \
    --ckpt experiments/exp000-VFRTok-M-ds_config_bs8x1/vq_200000/pytorch_model.bin \
    --data_path $YOUR_DATA_PATH \
    --save_path output \
    --enc_fps 12 --dec_fps 120
```