1. Data Setup

The code is based on BaSSL and TranS4mer:
- BaSSL: https://github.com/kakaobrain/bassl
- TranS4mer: https://github.com/md-mohaiminul/TranS4mer

Follow the BaSSL instructions to download the data.

2. Environment Setup

pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt
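
After installation, it can be useful to confirm that the pinned versions are the ones actually in the environment. The following is a minimal sketch, not part of the repository: the names `EXPECTED`, `installed_version`, and `check_pins` are hypothetical, and it assumes that wheels from the cu118 index may report a local suffix such as `+cu118`, which is ignored in the comparison.

```python
from importlib import metadata

# Pins copied from the pip command above.
EXPECTED = {"torch": "2.1.0", "torchvision": "0.16.0", "torchaudio": "2.1.0"}


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_pins(expected=EXPECTED, lookup=installed_version):
    """Return a list of readable mismatches (empty if every pin matches)."""
    problems = []
    for package, wanted in expected.items():
        found = lookup(package)
        if found is None:
            problems.append(f"{package}: not installed (expected {wanted})")
        # Assumption: CUDA wheels may carry a local suffix like "2.1.0+cu118".
        elif found.split("+")[0] != wanted:
            problems.append(f"{package}: found {found} (expected {wanted})")
    return problems


if __name__ == "__main__":
    for line in check_pins() or ["all pinned packages match"]:
        print(line)
```

The `lookup` parameter exists only so the comparison logic can be exercised without the packages installed.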

3. Training (Pretraining and Finetuning)

We use 8 A100 GPUs for pretraining.
We recommend using Docker containers with the following volume mount configuration:
- Data should be mounted to /dev/shm (shared memory)
- Source code should be mounted to /workspace
- Example: docker run --rm -it --env NVIDIA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 --runtime=nvidia -v (DATA_DIR):/dev/shm -v (WORK_DIR):/workspace --ipc=host (IMAGE) /bin/bash
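
Inside the container, a quick pre-flight check can catch a missing or empty volume mount before a long run starts. This is a hedged sketch under the mount layout described above; `check_mounts` is a hypothetical helper, not something the repository provides, and "non-empty directory" is only a rough proxy for "data is present".

```python
import os


def check_mounts(paths=("/dev/shm", "/workspace")):
    """Return the paths that are missing or are empty directories."""
    bad = []
    for path in paths:
        # A mount that failed typically shows up as a missing or empty directory.
        if not os.path.isdir(path) or not os.listdir(path):
            bad.append(path)
    return bad
```

Calling `check_mounts()` with no arguments checks the two mount points recommended above; an empty list means both look populated.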

Due to file size limitations, we cannot share the trained checkpoints. However, you can reproduce the performance reported in the paper by running the scripts below:

3.1. Pretraining

bash script/pretrain.sh

3.2. Extracting shot-level features

bash script/extract.sh

3.3. Finetuning

bash script/finetune.sh
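
The three stages are listed in the order they are meant to run, and each presumably consumes the output of the previous one. A minimal sketch of a driver that runs them in sequence and stops at the first failure is shown below; `STAGES` and `run_pipeline` are hypothetical names, and the script paths are the ones given in sections 3.1 to 3.3.

```python
import subprocess

# Stage order taken from sections 3.1 to 3.3.
STAGES = ["script/pretrain.sh", "script/extract.sh", "script/finetune.sh"]


def run_pipeline(stages=STAGES, run=subprocess.run):
    """Run each stage with bash in order and return the stages that completed."""
    completed = []
    for script in stages:
        print(f"running {script}")
        # check=True makes subprocess.run raise CalledProcessError on a
        # non-zero exit code, so later stages never start after a failure.
        run(["bash", script], check=True)
        completed.append(script)
    return completed
```

Call `run_pipeline()` from the repository root so the relative script paths resolve. The `run` parameter is there only to allow a dry run with a stand-in function.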