# Dataset Distillation in Large Data Era

## Squeeze ImageNet-21K

We follow the [ImageNet-21K-P](https://github.com/Alibaba-MIIL/ImageNet21K) training pipeline to train a squeezed model on ImageNet-21K (Winter 2021 version).

## Recover ImageNet-21K

```bash
python data_synthesis_cda_21k.py \
--arch-name "resnet18" \
--arch-path 'path/to/squeezed_model.pth' \
--exp-name "in21k_rn18E80_cda_cos_1_2K_lr_0.05_bn_0.25" \
--syn-data-path './syn-data' \
--batch-size 100 \
--lr 0.05 \
--r-bn 0.25 \
--iteration 2000 \
--store-best-images \
--easy2hard-mode "cosine" --milestone 1 \
--ipc-start 0 --ipc-end 20
```

Recovering the IPC20 subset (20 images per class) takes about 55 hours on four RTX 4090 GPUs.
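The command above covers IPC 0 to 20 in a single invocation. Below is a hedged sketch of one way to spread that work over four GPUs: it launches one process per GPU, each handling a disjoint slice of the IPC range. It assumes that `data_synthesis_cda_21k.py` recovers IPC indices in the half-open range `[--ipc-start, --ipc-end)` and that parallel runs can safely share the same `--syn-data-path`. The `NUM_GPUS`, `IPC_END`, and `DRY_RUN` variables and the `ipc_range` helper are hypothetical and not part of this repository.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: shard the IPC20 recovery across GPUs, one process per GPU.
# Assumption: data_synthesis_cda_21k.py recovers IPC indices in [--ipc-start, --ipc-end),
# so disjoint slices can run in parallel and write into the same --syn-data-path.
set -euo pipefail

NUM_GPUS=${NUM_GPUS:-4}
IPC_END=${IPC_END:-20}
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands; 0 = launch them in the background

# Print "start end" for the IPC slice handled by GPU $1 (the last GPU absorbs any remainder).
ipc_range() {
  local gpu=$1
  local per_gpu=$(( IPC_END / NUM_GPUS ))
  local start=$(( gpu * per_gpu ))
  local end=$(( start + per_gpu ))
  if (( gpu == NUM_GPUS - 1 )); then end=$IPC_END; fi
  echo "$start $end"
}

for (( gpu = 0; gpu < NUM_GPUS; gpu++ )); do
  read -r start end < <(ipc_range "$gpu")
  # Same arguments as the single-process command, with a per-GPU IPC slice.
  cmd=(python data_synthesis_cda_21k.py
       --arch-name "resnet18"
       --arch-path 'path/to/squeezed_model.pth'
       --exp-name "in21k_rn18E80_cda_cos_1_2K_lr_0.05_bn_0.25"
       --syn-data-path './syn-data'
       --batch-size 100 --lr 0.05 --r-bn 0.25 --iteration 2000
       --store-best-images
       --easy2hard-mode "cosine" --milestone 1
       --ipc-start "$start" --ipc-end "$end")
  if [[ $DRY_RUN == 1 ]]; then
    echo "CUDA_VISIBLE_DEVICES=$gpu ${cmd[*]}"
  else
    CUDA_VISIBLE_DEVICES=$gpu "${cmd[@]}" &
  fi
done
wait   # block until every background recovery process has exited
```

By default the script only prints the four commands (GPU 0 gets IPC 0 to 5, GPU 3 gets IPC 15 to 20) so the slicing can be checked first; running it with `DRY_RUN=0` launches them.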

Our distilled ImageNet-21K dataset (20 IPC, 2K recovery iterations) is available anonymously at [link](https://drive.google.com/drive/folders/12pC0GDTURdYLThAbVHkTw2lkF2KF_85i?usp=sharing).


## Relabel ImageNet-21K

We follow the [SRe<sup>2</sup>L relabeling method](https://github.com/VILA-Lab/SRe2L/tree/main/relabel) and use the squeezed model above to relabel the synthetic ImageNet-21K subset.
