# Efficient Dataset Distillation using Random Feature Approximation

Code for the paper *Efficient Dataset Distillation using Random Feature Approximation*

Required packages:
- pytorch
- neural-tangents
- torch_optimizer
- sklearn, matplotlib, numpy, scipy
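
Before running the scripts, it can help to confirm that the packages listed above are importable. The snippet below is a minimal sketch, not part of this repository: the import names (for example `torch` for pytorch and `neural_tangents` for neural-tangents) and the helper `find_missing` are assumptions introduced here for illustration.

```python
import importlib.util

# Import names assumed for the packages listed above
# (for example, "pytorch" is imported as `torch`).
REQUIRED_MODULES = [
    "torch",
    "neural_tangents",
    "torch_optimizer",
    "sklearn",
    "matplotlib",
    "numpy",
    "scipy",
]


def find_missing(modules):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules were found.")
```

The check only looks modules up without importing them, so it is quick, but it does not verify versions.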

To generate a distilled set on cifar10 with 10 samples per class, using the Platt loss with label learning, for example:
```python3 run_distillation.py --dataset cifar10 --save_path path/to/directory/ --samples_per_class 10 --platt --learn_labels ```
This does not automatically evaluate the dataset on the test set.

To evaluate an existing distilled set with NNGP/NTK kernel ridge regression (works for all datasets except CelebA):
```python3 eval_distilled_set.py --dataset fashion --save_path path/to/directory --run_krr```
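
For readers unfamiliar with kernel ridge regression, the sketch below shows the general idea of predicting test labels from a small support set. It is an illustration under stated assumptions, not the code in `eval_distilled_set.py`: it uses an RBF kernel as a stand-in for the NNGP/NTK kernels, and the names `rbf_kernel`, `krr_predict`, `gamma`, and `reg` are hypothetical.

```python
import numpy as np


def rbf_kernel(a, b, gamma=0.1):
    """Stand-in kernel (assumption); the repository uses NNGP/NTK kernels."""
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


def krr_predict(x_support, y_support, x_test, kernel=rbf_kernel, reg=1e-6):
    """Predict labels for x_test from a support set via kernel ridge regression."""
    k_ss = kernel(x_support, x_support)
    k_ts = kernel(x_test, x_support)
    # Solve (K_ss + reg * I) alpha = y_support instead of forming an inverse.
    alpha = np.linalg.solve(k_ss + reg * np.eye(len(x_support)), y_support)
    return k_ts @ alpha


# Toy support set: two points per class, one-hot labels.
x_support = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
y_support = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
x_test = np.array([[0.2, 0.5], [5.1, 5.4]])

# Predicted class is the index of the largest output per test point.
predicted_classes = np.argmax(krr_predict(x_support, y_support, x_test), axis=1)
```

In this picture the distilled images and labels play the role of `x_support` and `y_support`, and accuracy is measured by comparing `predicted_classes` with the true test labels.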

To evaluate an existing distilled set on mnist with a finite network trained with SGD:
```python3 eval_distilled_set.py --dataset mnist --save_path path/to/directory --run_finite --lr 1e-3 --weight_decay 1e-3 --label_scale 8 --centering```
`utils.py` contains the best hyperparameters for finite network training.

To use the empirical NNGP for inference on fashion-mnist:
```python3 run_network_parameter_analysis.py --dataset fashion --save_path path/to/directory```

To run the time profiling experiment for model counts of 1, 2, 4, 8 and coreset sizes of 1, 5, 10, 20, 50 samples per class:
```python3 run_time_profile_exp.py --dataset cifar10 --n_models 1 2 4 8 --samples_per_class 1 5 10 20 50```

To run corruption experiments on CelebA with corruption 0.8:
```python3 run_distillation.py --dataset celeba --save_path path/to/directory/ --samples_per_class 1 --platt --n_batches 1 --init_strategy noise --corruption 0.8```
To run CelebA experiments, make sure you are on the latest version of PyTorch, as older versions have a bug where the test/train splits are incorrect.

To evaluate with NNGP KRR on CelebA:
```python3 eval_distilled_set_batched.py --dataset celeba --save_path path/to/directory --run_krr```

Interpretability demos are in interpretability.ipynb, but they require an already-made distilled cifar10 dataset with 50 images per class.

We additionally include some distilled datasets for cifar10 with 1, 10, or 50 samples per class in ./distilled_images_final/cifar10, in the files named 'best.npz'.
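
The array names stored inside the 'best.npz' files are not documented here, so a reasonable first step is to list them. The helper below is a minimal sketch using NumPy's standard `.npz` loader; `summarize_npz` is a hypothetical name introduced for illustration and makes no assumption about which arrays the files contain.

```python
import numpy as np


def summarize_npz(path):
    """Map each array name stored in an .npz archive to its (shape, dtype)."""
    with np.load(path) as archive:
        return {
            name: (archive[name].shape, str(archive[name].dtype))
            for name in archive.files
        }
```

Calling `summarize_npz` on the path of one of the 'best.npz' files returns a dictionary that can be printed to see which arrays are available before loading them into an evaluation script.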