# Quantize-then-Rectify: Efficient VQ-VAE Training

## Requirements
To install requirements:
```bash
# install the packages
pip install -e .
```

## Preparation
### Dataset 
Download the ImageNet dataset from the official website and arrange it in the following directory layout:
```
data
└───imagenet
    └───train
        └───n01440764
            └───n01440764_10026.JPEG
            └───n01440764_10027.JPEG
            ...
        └───n01443537
            └───n01443537_1000.JPEG
            ...
    └───val
        └───n01440764
            └───ILSVRC2012_val_00000293.JPEG
            ...
```
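
Checking the layout before running the conversion can catch a misplaced folder early. The following is a minimal sketch and not part of this repository: the function name `summarize_imagenet` is hypothetical, and it assumes only that images carry the `.JPEG` extension, as in the tree above.

```python
from pathlib import Path


def summarize_imagenet(root):
    """Return {split: (num_class_folders, num_jpeg_files)} for train and val."""
    summary = {}
    for split in ("train", "val"):
        split_dir = Path(root) / split
        if not split_dir.is_dir():
            raise FileNotFoundError(f"missing split directory: {split_dir}")
        # Each class is one sub-folder named by its WordNet ID, e.g. n01440764.
        class_dirs = [d for d in split_dir.iterdir() if d.is_dir()]
        n_images = sum(
            1
            for d in class_dirs
            for f in d.iterdir()
            if f.suffix.upper() == ".JPEG"
        )
        summary[split] = (len(class_dirs), n_images)
    return summary
```

For the full ILSVRC2012 release, `summarize_imagenet("data/imagenet")` should report 1000 class folders in each split, with 1,281,167 training images and 50,000 validation images.
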
### Data Processing Guide
**Step 1: Download Pre-trained DC-AE Model**
Run the following command to download the pre-trained Deep Compression Autoencoder (DC-AE) model:

```bash
HF_ENDPOINT="https://hf-mirror.com/" huggingface-cli download \
  --resume-download mit-han-lab/dc-ae-f32c32-in-1.0 \
  --local-dir ./ckpt/dc_ae
```

Then clone the EfficientViT repository, which contains the DC-AE code:
```bash
git clone https://github.com/mit-han-lab/efficientvit.git
```

**Step 2: Download Preprocessor Weights**
Download the preprocessor model file from [here](https://ufile.io/k9461jr1). Save the file as [`preprocessor.pth`](preprocessor.pth) in your working directory.

**Step 3: Convert ImageNet Dataset to Latent Space**
Run the conversion script to encode ImageNet images into 2048-dimensional (32×8×8) latent vectors. Supply the paths to both the DC-AE model and the ImageNet dataset:

```bash
python scripts/convert_imagenet.py 
```
Output will be saved in TAR format.
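
The 2048-dimensional figure follows from the shape of the DC-AE latent. Below is a minimal sketch of that arithmetic. It assumes that `f32c32` in the model name denotes 32× spatial downsampling with 32 latent channels, and that images are 256×256 when encoded. The input resolution is an assumption: it is implied by the 8×8 grid but not stated here. The helper names are hypothetical.

```python
def latent_shape(image_size, spatial_factor=32, latent_channels=32):
    """Shape (C, H, W) of the latent for a square input image."""
    if image_size % spatial_factor != 0:
        raise ValueError("image_size must be a multiple of spatial_factor")
    side = image_size // spatial_factor
    return (latent_channels, side, side)


def flat_dim(shape):
    """Number of scalars in a (C, H, W) latent."""
    channels, height, width = shape
    return channels * height * width


shape = latent_shape(256)      # assumed 256x256 input
print(shape, flat_dim(shape))  # (32, 8, 8) 2048
```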

**Step 4: Package Training and Validation Sets**
Process the latent vectors into PyTorch format (enter the path to the TAR output of Step 3):
```bash
python scripts/save_imagenet.py 
```
Provide separate TAR files for training and validation sets to generate:
- [`imagenet_train.pth`](imagenet_train.pth)
- [`imagenet_val.pth`](imagenet_val.pth) 

**Step 5: Create Subset for Codebook Initialization**
Generate a subset for quantizer initialization (enter the path to [`imagenet_train.pth`](imagenet_train.pth)):
```bash
python scripts/convert_subset_dataset.py
```
The output [`subset.pth`](subset.pth) will be used for initializing the quantizer's codebook.
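
How `scripts/convert_subset_dataset.py` selects the subset is not described here. As an illustration only, one common choice is a uniform random sample without replacement, drawn with a fixed seed so the codebook initialization is reproducible. The function name, the seed, and the sampling scheme below are all assumptions, not the script's actual behavior.

```python
import random


def sample_subset_indices(num_items, subset_size, seed=0):
    """Pick subset_size distinct indices from range(num_items), reproducibly."""
    if subset_size > num_items:
        raise ValueError("subset_size cannot exceed num_items")
    rng = random.Random(seed)  # local generator, leaves global state untouched
    return sorted(rng.sample(range(num_items), subset_size))


# Hypothetical sizes: choose 5 of 100 latent vectors.
indices = sample_subset_indices(100, 5)
print(len(indices), len(set(indices)))  # 5 5
```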

## Training
To train the 512-token model described in the paper, run the following command:

```bash
config_path=configs/512T_NC=16384.yaml
python scripts/train.py --config $config_path --name 512T_NC=16384 --world_size 1 --batch_size 256
```

## Evaluation
**Download Pre-trained Checkpoints**
- [512 tokens with codebook size of 16384](https://ufile.io/qyww5zpo)
- [256 tokens with codebook size of 65536](https://ufile.io/fzxksm8n)
- [256 tokens with codebook size of 262144](https://ufile.io/16aj7u2o)
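
The three checkpoints trade token count against codebook size. As a rough comparison, the sketch below computes how many bits each configuration spends per image. It assumes each token is a single index into one codebook, so it carries log2(codebook size) bits. The function name `bits_per_image` is hypothetical.

```python
import math


def bits_per_image(num_tokens, codebook_size):
    """Bits needed to store one image as num_tokens codebook indices."""
    return num_tokens * math.log2(codebook_size)


for tokens, codebook in [(512, 16384), (256, 65536), (256, 262144)]:
    print(f"{tokens} tokens, codebook {codebook}: "
          f"{bits_per_image(tokens, codebook):.0f} bits")
# 512 tokens, codebook 16384: 7168 bits
# 256 tokens, codebook 65536: 4096 bits
# 256 tokens, codebook 262144: 4608 bits
```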

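To evaluate the 512-token model referenced in the paper, place the downloaded checkpoint and config under `./outputs/512T_NC=16384` and run the evaluation script: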
```bash
# Create directory structure
mkdir -p ./outputs/512T_NC=16384

# Place downloaded files
cp /path/to/512T_NC=16384.pth ./outputs/512T_NC=16384/
cp /path/to/512T_NC=16384.yaml ./outputs/512T_NC=16384/

name=512T_NC=16384
python scripts/eval.py --name $name --config outputs/$name/$name.yaml
```
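
Before launching the evaluation, it can help to confirm that both files are where the commands above put them. The sketch below is not part of the repository and `missing_eval_files` is a hypothetical name. It assumes only the `outputs/<name>/<name>.pth` and `outputs/<name>/<name>.yaml` convention shown in the setup commands.

```python
from pathlib import Path


def missing_eval_files(name, output_root="outputs"):
    """Return the expected checkpoint/config paths that do not exist yet."""
    run_dir = Path(output_root) / name
    expected = [run_dir / f"{name}.pth", run_dir / f"{name}.yaml"]
    return [str(p) for p in expected if not p.is_file()]


missing = missing_eval_files("512T_NC=16384")
if missing:
    print("Missing files:", ", ".join(missing))
else:
    print("Checkpoint and config found.")
```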

## Visualization
The visualization pipeline requires the pretrained model checkpoint and its corresponding YAML configuration file. Execute with:
```bash
python scripts/visualize.py
```