# Dataset Quantization

## Prepare Data

Put the CIFAR10 data in `~/data_cifar`.
Put the unzipped ImageNet data in `~/data_imagenet` (the pixel-level quantizer reads the training split from `~/data_imagenet/train`).

## Prepare Pretrained MAE Model

Download the pretrained MAE model used by the pixel-level quantizer:

```
wget https://dl.fbaipublicfiles.com/mae/visualize/mae_visualize_vit_large_ganloss.pth
```

## CIFAR10

```
# Sample-level Quantizer
# Dataset bin generation (by default we use 10 bins)
CUDA_VISIBLE_DEVICES=0 python -u sample_quantize.py --fraction 0.1 --dataset CIFAR10 --data_path ~/data_cifar \
    --num_exp 10 --workers 10 -se 0 --selection Submodular --model ResNet18 -sp ./bin_cifar_010 \
    --batch 128 --submodular GraphCut --submodular_greedy NaiveGreedy --pretrained

# Bin sampling (change --fraction to obtain a different data keep ratio)
CUDA_VISIBLE_DEVICES=0 python -u random_sample.py --fraction 0.1 --dataset CIFAR10 --data_path ~/data_cifar \
    --workers 10 --selection_path ./bin_cifar_010/ -sp ./sample_quantized_cifar_010

# Pixel-level Quantizer (change --mask_ratio to obtain a different pixel drop ratio)
CUDA_VISIBLE_DEVICES=0 python -u pixel_quantize.py --data CIFAR10 --data_path ~/data_cifar \
    --output_dir ./pixel_quantized_cifar --model mae_vit_large_patch16 \
    --resume ./mae_visualize_vit_large_ganloss.pth --batch_size 128 \
    --mask_ratio 0.2 --cam_mask
```
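
The bin generation step (`--selection Submodular --submodular GraphCut --submodular_greedy NaiveGreedy`) can be sketched roughly as follows. This is a minimal illustration under our own assumptions, not the code in `sample_quantize.py`: the per-sample features are taken as given (a random matrix here), similarity is cosine, the GraphCut objective and its weight `lam` are an assumed form, and each bin is filled greedily from the samples not yet assigned to an earlier bin. `generate_bins` is a hypothetical helper.

```python
import numpy as np


def generate_bins(features, num_bins, lam=1.0):
    """Split samples into non-overlapping bins with naive-greedy GraphCut selection.

    Assumed objective on the current pool V (the samples not yet binned):
        f(S) = sum_{i in V, j in S} sim[i, j] - lam * sum_{i, j in S} sim[i, j]
    """
    x = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = x @ x.T  # cosine similarity between all samples
    n = len(features)
    bin_size = n // num_bins
    remaining = np.arange(n)
    bins = []
    for b in range(num_bins):
        pool = remaining
        size = bin_size if b < num_bins - 1 else len(pool)  # last bin takes leftovers
        coverage = sim[np.ix_(pool, pool)].sum(axis=0)  # sum_{i in V} sim[i, k]
        self_sim = sim[pool, pool]  # sim[k, k] for every candidate k
        redundancy = np.zeros(len(pool))  # sum_{j in S} sim[j, k]
        chosen = np.zeros(len(pool), dtype=bool)
        for _ in range(size):
            # Marginal gain f(S + {k}) - f(S) for every candidate k in the pool.
            gain = coverage - lam * (2 * redundancy + self_sim)
            gain[chosen] = -np.inf
            k = int(np.argmax(gain))
            chosen[k] = True
            redundancy += sim[pool[k], pool]
        bins.append(pool[chosen].tolist())
        remaining = pool[~chosen]  # later bins only see unselected samples
    return bins


rng = np.random.default_rng(0)
bins = generate_bins(rng.normal(size=(100, 16)), num_bins=10)
print([len(b) for b in bins])  # [10, 10, 10, 10, 10, 10, 10, 10, 10, 10]
```

Naive greedy rescans every candidate at each step, so this sketch is only practical for small pools; it is meant to show the shape of the procedure, not its cost.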

## ImageNet

```
# Sample-level Quantizer
# Dataset bin generation (by default we use 10 bins)
CUDA_VISIBLE_DEVICES=0 python -u sample_quantize.py --fraction 0.1 --dataset ImageNet --data_path ~/data_imagenet \
    --num_exp 10 --workers 10 -se 0 --selection Submodular --model ViT_Base_16 -sp ./bin_imagenet_010 \
    --batch 128 --submodular GraphCut --submodular_greedy NaiveGreedy --pretrained

# Bin sampling (change --fraction to obtain a different data keep ratio)
CUDA_VISIBLE_DEVICES=0 python -u random_sample.py --fraction 0.1 --dataset ImageNet --data_path ~/data_imagenet \
    --workers 10 --selection_path ./bin_imagenet_010/ -sp ./sample_quantized_imagenet_010

# Pixel-level Quantizer (change --mask_ratio to obtain a different pixel drop ratio)
CUDA_VISIBLE_DEVICES=0 python -u pixel_quantize.py --data ImageNet --data_path ~/data_imagenet/train \
    --output_dir ./pixel_quantized_imagenet --model mae_vit_large_patch16 \
    --resume ./mae_visualize_vit_large_ganloss.pth --batch_size 128 \
    --mask_ratio 0.2 --cam_mask
```
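
The pixel-level step can be pictured as choosing which 16x16 patches to drop before MAE reconstructs them. The sketch below assumes that `--cam_mask` ranks patches by a class activation map (CAM) and drops the lowest-scoring `--mask_ratio` fraction; that reading, the random stand-in `cam`, and the helper `patches_to_drop` are hypothetical, not taken from `pixel_quantize.py`. The keep count mirrors MAE's `random_masking`, which keeps `int(L * (1 - mask_ratio))` of `L` patches.

```python
import numpy as np


def patches_to_drop(cam, patch_size=16, mask_ratio=0.2):
    """Flat indices of the lowest-scoring patches under a (hypothetical) CAM map."""
    gh, gw = cam.shape[0] // patch_size, cam.shape[1] // patch_size
    # Mean CAM score of every non-overlapping patch_size x patch_size patch, row-major.
    scores = (
        cam[: gh * patch_size, : gw * patch_size]
        .reshape(gh, patch_size, gw, patch_size)
        .mean(axis=(1, 3))
        .ravel()
    )
    # Keep count follows MAE's random_masking convention.
    num_keep = int(len(scores) * (1 - mask_ratio))
    order = np.argsort(scores, kind="stable")  # least important patches first
    return np.sort(order[: len(scores) - num_keep])


cam = np.random.default_rng(0).random((224, 224))  # stand-in for a real CAM
drop = patches_to_drop(cam, patch_size=16, mask_ratio=0.2)
print(len(drop))  # 40 of 196 patches, since int(196 * 0.8) = 156 are kept
```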

## Experiment Details
The sample-level quantizer first splits the dataset into multiple bins and then randomly selects a given ratio of samples from each bin.
When the data keep ratio is low, rounding can make the number of selected samples smaller than the specified ratio implies.
In that case, we specify the smallest data keep ratio for which the number of selected samples is not less than the intended number.
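
To make the rounding effect concrete, here is a small sketch under stated assumptions: each bin keeps floor(ratio * bin size) samples, bins are equal except the last one, which takes the remainder, and the ratio is searched on a grid of 1/100,000. `split_into_bins`, `selected_count`, and `least_keep_ratio` are hypothetical helpers, not functions from `random_sample.py`, and the bin sizes are illustrative rather than the bins the script actually produces. Integer arithmetic is used so that no floating-point error leaks into the counts.

```python
def split_into_bins(n, num_bins):
    """Hypothetical bin sizes: equal bins, with the last bin taking the remainder."""
    size = n // num_bins
    return [size] * (num_bins - 1) + [n - size * (num_bins - 1)]


def selected_count(bin_sizes, k, denom):
    """Samples kept when each bin keeps floor(k / denom * size) samples (assumed rounding)."""
    return sum(k * size // denom for size in bin_sizes)


def least_keep_ratio(bin_sizes, target_k, denom=100_000):
    """Smallest ratio k / denom (k >= target_k) whose floored per-bin counts reach
    ceil(target_k / denom * total), the number of samples the nominal ratio asks for."""
    total = sum(bin_sizes)
    target = (target_k * total + denom - 1) // denom  # exact integer ceiling
    for k in range(target_k, denom + 1):
        if selected_count(bin_sizes, k, denom) >= target:
            return k / denom
    raise ValueError("target_k must not exceed denom")


bins = split_into_bins(1_281_167, 10)  # ImageNet-1k training set size, 10 bins
print(selected_count(bins, 1_000, 100_000))  # nominal 1%: 12810 instead of 12812
print(least_keep_ratio(bins, 1_000))  # 0.01001
```

Under these assumptions a nominal 1% keeps two samples too few, and 1.001% is the smallest grid ratio that reaches the intended 12,812 samples.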

