Given a folder dataset structured as:

dataset_root/cls_0/, dataset_root/cls_1/, ...
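
A quick way to check that a folder follows this layout is to list its class subdirectories. The sketch below is illustrative only: list_class_dirs is a hypothetical helper, not part of this repository, and it builds a throwaway directory to demonstrate on.

```python
from pathlib import Path
import tempfile


def list_class_dirs(dataset_root):
    """Return the sorted names of the class folders directly under dataset_root."""
    root = Path(dataset_root)
    return sorted(p.name for p in root.iterdir() if p.is_dir())


# Build a temporary dataset_root with two class folders to demonstrate.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("cls_0", "cls_1"):
        (Path(tmp) / name).mkdir()
    classes = list_class_dirs(tmp)
    print(classes)
```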

The following command uses a teacher model to generate patches, whose dimensions are controlled by --factor_in_orig_img:

python synthesize.py --subset imagenet-woof --factor_in_orig_img 2 \
--model_ckpt <teacher_path> --num_crop 20 \
--collage_save_dir <patch_save_dir> --train_dir <dataset_root>

--factor_in_orig_img is the variable r introduced in the paper, defined as image_size / patch_size.
For example, for ImageNet-level datasets with a resolution of 224x224, set --factor_in_orig_img to 2 to generate 112x112 patches.
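
The relation r = image_size / patch_size can be checked with a few lines of Python. This is a minimal sketch: the patch_size helper is hypothetical, and rejecting resolutions that r does not divide evenly is an assumption, not documented behavior of synthesize.py.

```python
def patch_size(image_size: int, factor_in_orig_img: int) -> int:
    """Patch side length implied by r = image_size / patch_size."""
    if factor_in_orig_img <= 0:
        raise ValueError("factor_in_orig_img must be positive")
    # Assumption: only factors that divide the resolution evenly are accepted.
    if image_size % factor_in_orig_img != 0:
        raise ValueError("image_size must be divisible by factor_in_orig_img")
    return image_size // factor_in_orig_img


size = patch_size(224, 2)
print(f"{size}x{size}")
```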

--model_ckpt specifies the path to the teacher model.
--collage_save_dir sets the directory for saving the patches.
--train_dir is the root directory for the original training data.

To generate soft labels for communication to the student side, run the following command:

accelerate launch --num_processes 4 generate_soft_lbls.py --subset imagenet-woof \
--extra_desc <description> --model_ckpt <teacher_path> --batch_size 32 --epochs 300 \
--workers 8   --val_dir <....> --syn_data_path <....> --exp_num 1 --soft_lbl --ipc -1 \
--temp 5 --dataset_name_dict ./dataset_dicts/<...>  --aug_drop_p 0.3  --rnd_res_scale_st 0.8 --strength .8 --seed 0

The above command parallelizes soft label generation. Each training batch is split across the launched processes,
 and a turbo diffusion model processes each split in parallel. After synthesis, the data is gathered for soft-labeling.
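
The split-then-gather pattern described above can be illustrated without any GPU code. In the real script this is presumably handled by Accelerate; split_batch and gather below are hypothetical stand-ins, and the string formatting stands in for the diffusion step.

```python
def split_batch(batch, num_processes):
    """Split a batch into contiguous, near-equal shards, one per process."""
    base, extra = divmod(len(batch), num_processes)
    shards, start = [], 0
    for rank in range(num_processes):
        size = base + (1 if rank < extra else 0)
        shards.append(batch[start:start + size])
        start += size
    return shards


def gather(shards):
    """Concatenate per-process results back into a single batch, in rank order."""
    return [item for shard in shards for item in shard]


batch = list(range(32))                 # stands in for --batch_size 32
shards = split_batch(batch, 4)          # stands in for --num_processes 4
# Placeholder for the per-process synthesis step.
processed = [[f"synth_{i}" for i in shard] for shard in shards]
gathered = gather(processed)
print(len(shards), len(gathered))
```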

--extra_desc appends an additional description to the name of the soft label file.
--val_dir specifies the parent folder for validation files:
dataset_root/val/cls_0, dataset_root/val/cls_1, ....

--syn_data_path is the path to the root directory of the patches.
--dataset_name_dict provides a mapping of folder names to corresponding class names in English. 
For ImageNet-1k and its subsets (ImageWoof, ImageNette, Tiny), the dictionary is located in the dataset_dicts directory.

--temp sets the temperature for knowledge distillation.
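
To show what the temperature does, the sketch below applies the standard temperature-scaled softmax from knowledge distillation, softmax(logits / temp). It assumes the scripts follow this common formulation; the soften helper and the example logits are hypothetical.

```python
import math


def soften(logits, temp):
    """Temperature-scaled softmax: higher temp gives a flatter distribution."""
    scaled = [z / temp for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [4.0, 1.0, 0.0]      # made-up teacher logits
sharp = soften(logits, 1.0)   # plain softmax
soft = soften(logits, 5.0)    # same value as --temp 5
print([round(p, 3) for p in sharp])
print([round(p, 3) for p in soft])
```

With a higher temperature the top class keeps the largest probability, but more mass moves to the other classes.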

To train a student model using the saved soft label file, use:

accelerate launch --num_processes 4 train_with_from_lbl_files.py --subset imagenet-woof \
--extra_desc <...> --model_ckpt <....> --batch_size 32 --epochs 300 --workers 8  \
--val_dir <....> --syn_data_path <....> --exp_num 1 --soft_lbl --ipc -1  --temp 5 \
--dataset_name_dict ./dataset_dicts/<...>  --aug_drop_p 0.3  --rnd_res_scale_st 0.8 \
--strength .8 --seed 0 --lbl_file <path_to_softlbl>

Make sure --seed and --temp match the values used during soft label generation. Training replays the same procedure, so the data and the saved labels align only when both runs use the same settings.
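
The reason the seed matters can be seen with a seeded random number generator: the same seed replays the same sequence of random choices, while a different seed does not. This is a sketch only; sample_augment_params and its flip and scale parameters are hypothetical and do not reflect the repository's actual augmentations.

```python
import random


def sample_augment_params(seed, num_samples):
    """Draw a reproducible sequence of (flip, scale) augmentation parameters."""
    rng = random.Random(seed)
    params = []
    for _ in range(num_samples):
        flip = rng.random() < 0.5        # hypothetical horizontal flip decision
        scale = rng.uniform(0.8, 1.0)    # hypothetical resize scale
        params.append((flip, scale))
    return params


gen_run = sample_augment_params(seed=0, num_samples=5)     # label generation
train_run = sample_augment_params(seed=0, num_samples=5)   # student training
other_run = sample_augment_params(seed=1, num_samples=5)   # mismatched seed
print(gen_run == train_run)
print(gen_run == other_run)
```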

Alternatively, to bypass the soft label generation process and directly perform knowledge distillation, run:

accelerate launch --num_processes 4 train_with_superres.py --subset imagenet-woof \
--extra_desc <...> --model_ckpt <...> --batch_size 32 --epochs 300 --workers 8   \
--val_dir <...> --syn_data_path <...> --exp_num 1 --soft_lbl --ipc -1  --lr 0.001 --wd 0.01 \
--temp 5 --dataset_name_dict ./dataset_dicts/<...>  --aug_drop_p 0.3 \
 --rnd_res_scale_st 0.8 --strength .8 --seed 0

This loads the teacher directly on the student side and performs knowledge distillation without a saved soft label file.
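
For reference, a common form of the distillation objective is the KL divergence between the temperature-softened teacher and student distributions, scaled by temp squared. The sketch below assumes that formulation; it is not taken from train_with_superres.py, and kd_loss, softmax_t, and the logits are hypothetical.

```python
import math


def softmax_t(logits, temp):
    """Softmax over logits divided by the temperature."""
    scaled = [z / temp for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, temp):
    """KL(teacher || student) on softened distributions, scaled by temp**2."""
    p = softmax_t(teacher_logits, temp)
    q = softmax_t(student_logits, temp)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temp ** 2  # assumption: the usual temp**2 gradient rescaling


teacher = [4.0, 1.0, 0.0]                            # made-up teacher logits
loss_same = kd_loss(teacher, teacher, 5.0)           # student matches teacher
loss_diff = kd_loss([0.0, 1.0, 4.0], teacher, 5.0)   # student disagrees
print(loss_same, loss_diff)
```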
