# ICLR2025 Submission Code for ViDiT-Q

# Env Setup

We recommend using conda for environment management.

```shell 
cd diffuser-dev

# create a virtual env
conda create -n viditq python=3.10
# activate virtual environment
conda activate viditq

# xformers (required by opensora) needs torch 2.1.1; the newest torch is not compatible
conda install pytorch==2.1.1 torchvision==0.16.1 torchaudio==2.1.1 pytorch-cuda=12.1 -c pytorch -c nvidia  # pip install also works

pip install -r t2v/requirements_opensora.txt

pip install -r t2v/requirements_qdiff.txt

# install flash attention (optional)
pip install packaging ninja
pip install flash-attn --no-build-isolation

# install xformers
pip3 install xformers==0.0.23

# install the viditq package
# containing our qdiff
pip install -e .

# install opensora
cd t2v
pip install -e .
```
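Because the setup depends on specific version pins, a quick check after installation can catch a mismatched environment early. The following is a minimal sketch, not part of the repository: `PINNED`, `installed_version`, and `find_mismatches` are hypothetical names, and it assumes the packages are visible to `importlib.metadata` under the distribution names `torch`, `torchvision`, and `xformers`.

```python
from importlib import metadata

# Version pins copied from the install commands above.
PINNED = {"torch": "2.1.1", "torchvision": "0.16.1", "xformers": "0.0.23"}


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pinned, lookup=installed_version):
    """Map each mismatching package to its (expected, found) version pair."""
    mismatches = {}
    for package, expected in pinned.items():
        found = lookup(package)
        # Drop local build tags such as "+cu121" before comparing.
        base = found.split("+")[0] if found else None
        if base != expected:
            mismatches[package] = (expected, found)
    return mismatches


if __name__ == "__main__":
    problems = find_mismatches(PINNED)
    for package, (expected, found) in problems.items():
        print(f"{package}: expected {expected}, found {found}")
    if not problems:
        print("all pinned packages match")
```

Run it inside the activated `viditq` environment; any printed mismatch points at a package to reinstall.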

<br>

# Commands to Run

> *After running the following commands, the outputs (checkpoints, generated videos) will appear in `./logs/`.*

We provide shell scripts for all of the steps below in `t2i/shell_scripts` and `t2v/shell_scripts`.
For example, run `bash t2v/shell_scripts/get_calib_data.sh $GPU_ID` to generate the calibration dataset.

## 🎬 video generation

### 0.0 Download and convert checkpoint of the STDiT (OpenSORA) model

> Please refer to the [Open-Sora v1.0 documentation](https://github.com/hpcaitech/Open-Sora) for more details. We only support OpenSORA v1.0 for now; support for newer versions will be added later.

- Download the OpenSora-v1-HQ-16x512x512.pth from [this link](https://huggingface.co/hpcai-tech/Open-Sora/blob/main/OpenSora-v1-HQ-16x512x512.pth)

> The original opensora code merges the q, k, and v linear layers into a single linear layer with more channels; we split it into 3 layers for quantization.

- Put the downloaded `OpenSora-v1-HQ-16x512x512.pth` in `./logs/split_ckpt` and run `t2v/scripts/split_ckpt.py`. The converted checkpoint will appear at `./logs/split_ckpt/OpenSora-v1-HQ-16x512x512-split-test.pth`. The commands below load `OpenSora-v1-HQ-16x512x512-split.pth`, so make sure `CKPT_PATH` matches the file the script actually produces.

```shell
python t2v/scripts/split_ckpt.py
```
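To illustrate the qkv split described above, here is a minimal sketch of the idea. It is not the code in `split_ckpt.py`. It assumes the merged layer stacks q, k, and v along the first (output) dimension in that order, and the key tags `.qkv.`, `.q.`, `.k.`, `.v.` are hypothetical; the real checkpoint keys may differ.

```python
def split_qkv_state_dict(state_dict, merged_tag=".qkv.", split_tags=(".q.", ".k.", ".v.")):
    """Return a copy of state_dict with every merged qkv entry split in three.

    Works on any value that supports len() and slicing along the first axis
    (torch tensors, numpy arrays, nested lists).
    """
    out = {}
    for key, value in state_dict.items():
        if merged_tag not in key:
            out[key] = value  # unrelated entries are copied unchanged
            continue
        rows = len(value)
        if rows % 3 != 0:
            raise ValueError(f"{key}: first dimension {rows} is not divisible by 3")
        step = rows // 3
        # Assumed layout: rows [0, step) are q, [step, 2*step) are k, the rest are v.
        for i, tag in enumerate(split_tags):
            out[key.replace(merged_tag, tag)] = value[i * step:(i + 1) * step]
    return out


# Toy example with nested lists standing in for weight tensors.
toy = {
    "blocks.0.attn.qkv.weight": [[1, 1], [2, 2], [3, 3], [4, 4], [5, 5], [6, 6]],
    "blocks.0.attn.qkv.bias": [1, 2, 3, 4, 5, 6],
    "blocks.0.attn.proj.weight": [[7, 7], [8, 8]],
}
split = split_qkv_state_dict(toy)
print(sorted(split))
```

Splitting the layer this way lets each of q, k, and v receive its own quantization parameters instead of sharing one set across the merged weight.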

### 0.1 FP16 inference

- `bash ./t2v/shell_scripts/fp16_inference.sh $GPU_ID`: runs FP16 inference to generate videos from the 10 opensora example prompts; the videos are saved to `./logs/fp16_inference`.

> We provide the precomputed `text_embeds.pth` for the 10 opensora example prompts in `t2v/utils_files`, which avoids loading the T5 checkpoints onto the GPU (this takes around 1 minute and around 10 GB of memory). Please add `--precompute_text_embeds ./t2v/utils_files/text_embeds.pth` when running the command.

```shell
CFG="./t2v/configs/opensora/inference/16x512x512.py"  # the opensora config
CKPT_PATH="./logs/split_ckpt/OpenSora-v1-HQ-16x512x512-split.pth"  # your path of the split ckpt
OUTDIR="./logs/fp16_inference"  # your_path_to_save_videos
GPU_ID=$1

CUDA_VISIBLE_DEVICES=$GPU_ID python t2v/scripts/inference.py $CFG --ckpt_path $CKPT_PATH  --outdir $OUTDIR \
--precompute_text_embeds ./t2v/utils_files/text_embeds.pth
```

---

### 1.1 Generate calib data

- `bash ./t2v/shell_scripts/get_calib_data.sh $GPU_ID`: generates the calibration data (the stored activations) for PTQ at `$CALIB_DATA_DIR/calib_data.pt`.

```shell
CFG="./t2v/configs/opensora/inference/16x512x512.py" # the opensora config
CKPT_PATH="./logs/split_ckpt/OpenSora-v1-HQ-16x512x512-split.pth"  # split ckpt
GPU_ID=$1
CALIB_DATA_DIR="./logs/calib_data"  # the path to save your calib dataset

# quant calib data
CUDA_VISIBLE_DEVICES=$GPU_ID python t2v/scripts/get_calib_data.py $CFG --ckpt_path $CKPT_PATH --data_num 10 --outdir $CALIB_DATA_DIR --save_dir $CALIB_DATA_DIR \
--precompute_text_embeds ./t2v/utils_files/text_embeds.pth
```

### 1.2 Post Training Quantization (PTQ) Process

- `bash ./t2v/shell_scripts/ptq.sh $GPU_ID`: runs the PTQ process on the calibration data and generates the quantized checkpoint. Remember to modify the config names and the output path:
    - `CFG`: the configuration for opensora inference (we recommend using the same one for calib data generation, PTQ, and quantized inference)
    - `Q_CFG`: the configuration for quantization; we provide example configs in `./t2v/configs/quant/opensora`
    - `CALIB_DATA_DIR`: the path of the calibration data
    - `OUTDIR`: the output path, which holds the quantized checkpoint and copies of the configs



- `--part_fp` skips the quantization of a few layers, which account for a negligible share of the computation (<1%). The argument is defined in `opensora/utils/config_utils.py`, which reads `part_fp_list` from the quant config (default path: `"./t2v/configs/quant/opensora/remain_fp.txt"`).

```shell
EXP_NAME="w8a8_naive"

CFG="./t2v/configs/quant/opensora/16x512x512.py"  # the opensora config
Q_CFG="./t2v/configs/quant/opensora/$EXP_NAME.yaml"  # TODO: the config of PTQ
CKPT_PATH="./logs/split_ckpt/OpenSora-v1-HQ-16x512x512-split.pth"  # split ckpt generated by split_ckpt.py
CALIB_DATA_DIR="./logs/calib_data"  # your path of calib data
OUTDIR="./logs/$EXP_NAME"  # TODO: your path to save the ptq result
GPU_ID=$1

# ptq
CUDA_VISIBLE_DEVICES=$GPU_ID python t2v/scripts/ptq.py $CFG --ckpt_path $CKPT_PATH --ptq_config $Q_CFG --outdir $OUTDIR \
    --calib_data $CALIB_DATA_DIR/calib_data.pt \
    --part_fp \
    --precompute_text_embeds ./t2v/utils_files/text_embeds.pth

```
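The effect of `--part_fp` can be pictured as a filter over layer names. The sketch below is an illustration only and does not reproduce the logic in `opensora/utils/config_utils.py`. It assumes a hypothetical file format with one exact layer name per line and `#` for comments; the layer names in the example are also made up.

```python
def read_part_fp_list(text):
    """Parse layer names from the text of a remain_fp-style file.

    Assumed format: one layer name per line; blank lines and lines
    starting with '#' are ignored.
    """
    names = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            names.append(line)
    return names


def partition_layers(layer_names, part_fp_list):
    """Split layer names into (quantized, kept_fp) lists, preserving order."""
    keep = set(part_fp_list)
    quantized = [name for name in layer_names if name not in keep]
    kept_fp = [name for name in layer_names if name in keep]
    return quantized, kept_fp


# Hypothetical file content and layer names, for illustration only.
example_text = "# layers kept in floating point\nfinal_layer.linear\n\nx_embedder.proj\n"
fp_names = read_part_fp_list(example_text)
quantized, kept_fp = partition_layers(
    ["x_embedder.proj", "blocks.0.attn.q", "blocks.0.mlp.fc1", "final_layer.linear"],
    fp_names,
)
print("quantized:", quantized)
print("kept in FP:", kept_fp)
```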

### 1.3 Quantized Model Inference

#### 1.3.1 normal quantized inference

- `bash ./t2v/shell_scripts/quant_inference.sh $GPU_ID`: conduct the quantized model inference based on the existing quant config and quantized checkpoint (specified by the `OUTDIR`, which is the output path of the PTQ process). 

```shell
EXP_NAME="w8a8_naive"

CFG="./t2v/configs/quant/opensora/16x512x512.py" # the opensora config
CKPT_PATH="./logs/split_ckpt/OpenSora-v1-HQ-16x512x512-split.pth"  # your path of the split ckpt
OUTDIR="./logs/$EXP_NAME"  # your path of the w8a8 ptq result
GPU_ID=$1
# SAVE_DIR="W8A8_ptq"  # your path to save the generated videos; leave blank to save at $OUTDIR/generated_videos

# quant inference
CUDA_VISIBLE_DEVICES=$GPU_ID python t2v/scripts/quant_txt2video.py $CFG \
    --outdir $OUTDIR --ckpt_path $CKPT_PATH  \
    --dataset_type opensora \
    --part_fp \
    --precompute_text_embeds ./t2v/utils_files/text_embeds.pth \
    # --save_dir $SAVE_DIR \

```

#### 1.3.2 mixed precision quantized inference

- `bash ./t2v/shell_scripts/quant_inference_mp.sh $GPU_ID`: runs mixed-precision quantized inference based on the existing quant config and quantized checkpoint (specified by `OUTDIR`, the output path of the PTQ process) together with the mixed-precision configurations `MP_W_CONFIG` and `MP_A_CONFIG` (the bit-width configuration is chosen heuristically based on metric-decoupled sensitivity). This corresponds to 🔑 **ViDiT-Q W4A8-MP** in our paper.

- During the PTQ process, quantization parameters are calculated for all bit-widths (4, 6, 8) in the quant config. Therefore, the same quantized checkpoint can be paired with different mixed-precision configurations.

```shell
EXP_NAME='w4a8_timestep_cb'

CFG="./t2v/configs/quant/opensora/16x512x512.py" # the opensora config
CKPT_PATH="./logs/split_ckpt/OpenSora-v1-HQ-16x512x512-split.pth"  # split ckpt generated by split_ckpt.py
OUTDIR="./logs/$EXP_NAME"  # the path of the result of the W4A8 PTQ
GPU_ID=$1
MP_W_CONFIG="./t2v/configs/quant/W4A8_Naive_Smooth/t20_weight_4_mp.yaml"  # the mixed precision config of weight
MP_A_CONFIG="./t2v/configs/quant/W4A8_Naive_Smooth/t20_act_8_mp.yaml" # the mixed precision config of act
#SAVE_DIR="W4A8_Naive_Smooth_samples"  # leave blank to use the default path $OUTDIR/generated_videos

# quant infer
CUDA_VISIBLE_DEVICES=$GPU_ID python t2v/scripts/quant_txt2video_mp.py $CFG --outdir $OUTDIR --ckpt_path $CKPT_PATH  --dataset_type opensora \
	--part_fp \
	--timestep_wise_mp \
	--time_mp_config_weight $MP_W_CONFIG \
	--time_mp_config_act $MP_A_CONFIG \
	--precompute_text_embeds ./t2v/utils_files/text_embeds.pth \
	#--save_dir $SAVE_DIR
```
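The reason one checkpoint can serve several mixed-precision configurations is that quantization parameters for every candidate bit-width are computed once and selected later. The sketch below illustrates this with a plain asymmetric min-max quantizer, swapped in for simplicity; it is not ViDiT-Q's actual quantizer. The per-timestep mapping `mp_config` is a hypothetical stand-in for the YAML configs, whose real format is not shown here.

```python
def minmax_params(values, n_bits):
    """Asymmetric min-max quantization parameters (scale, zero_point)."""
    # Extend the range to include zero so that 0.0 is exactly representable.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    q_max = 2 ** n_bits - 1
    scale = (hi - lo) / q_max or 1.0  # fall back to 1.0 for all-zero input
    zero_point = round(-lo / scale)
    return scale, zero_point


def fake_quant(values, scale, zero_point, n_bits):
    """Quantize to n_bits integers, then map back to floats."""
    q_max = 2 ** n_bits - 1
    out = []
    for v in values:
        q = min(max(round(v / scale) + zero_point, 0), q_max)
        out.append((q - zero_point) * scale)
    return out


def calibrate_all(values, bit_widths=(4, 6, 8)):
    """Compute parameters for every candidate bit-width in one pass."""
    return {bits: minmax_params(values, bits) for bits in bit_widths}


def max_error(values, params, bits):
    """Largest absolute reconstruction error at the given bit-width."""
    scale, zero_point = params[bits]
    restored = fake_quant(values, scale, zero_point, bits)
    return max(abs(a - b) for a, b in zip(values, restored))


calib = [-1.0, -0.3, 0.0, 0.42, 1.0]  # toy calibration values
params = calibrate_all(calib)         # stored once, as in a quantized checkpoint

# Hypothetical per-timestep bit-width choice; unlisted timesteps use 8 bits.
mp_config = {0: 8, 1: 4}
for timestep in range(3):
    bits = mp_config.get(timestep, 8)
    print(timestep, bits, max_error(calib, params, bits))
```

Changing `mp_config` only changes which stored parameters are looked up, so no recalibration is needed.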
