# Coreset Augmentation

A Generalization Theory of Data Augmentation for Neural Networks

### Default training script
With its default settings, the script trains on the full dataset and augments only coresets of size 10%. It can be run using
```
bash train.sh
```

### Datasets and Architectures
Different datasets can be specified using the `--dataset {CIFAR10,SVHN,CIFAR10-IMB}` option.

Different architectures can also be specified using the `--arch {resnet20,resnet32}` option; both are used in the paper.

### Augmentation settings
Augmentation pipelines can be chosen using `--augment_algo`. The ones used in the paper are `coreset-uniform`, `random`, `largest-ce-loss`, and `same` (only applicable when using `--subset_algo random`).

Augmentation set size can also be specified using `-as {size}`.
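
As a rough illustration of what a loss-based selection rule such as `largest-ce-loss` could look like, the sketch below picks the examples with the highest cross-entropy loss as candidates for augmentation. It is not taken from this repository: the helper names `cross_entropy` and `largest_loss_indices` and the toy probabilities are hypothetical, and the actual implementation may differ.

```python
import math


def cross_entropy(probs, label):
    """Cross-entropy loss of one example given its predicted class probabilities."""
    # Clamp to avoid log(0) for a class predicted with zero probability.
    return -math.log(max(probs[label], 1e-12))


def largest_loss_indices(all_probs, labels, k):
    """Return the (sorted) indices of the k examples with the largest loss."""
    losses = [cross_entropy(p, y) for p, y in zip(all_probs, labels)]
    order = sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)
    return sorted(order[:k])


# Toy predictions for four examples and two classes (illustrative values only).
probs = [[0.9, 0.1], [0.4, 0.6], [0.2, 0.8], [0.7, 0.3]]
labels = [0, 0, 1, 1]

# Examples 1 and 3 assign the lowest probability to their true class.
picked = largest_loss_indices(probs, labels, k=2)
print(picked)
```

Here the augmentation budget `k` plays the role of the augmentation set size; how `-as` encodes that size is not stated in this README.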

### Subset settings
We used `--subset_algo all` for training on the full dataset, and `--subset_algo {random,coreset-weighted}` for training on subsets.

Subset size (when not training on the full dataset) can also be specified using `-s {size}`.
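
The options above can be combined into a single invocation. The sketch below assembles such a command line and rejects combinations this README does not describe. The helper `build_command` is hypothetical and not part of the repository. It assumes that the "only applicable" restriction applies to `same` alone, and it passes size values through verbatim because their format is not stated here.

```python
import shlex

# Option values listed in this README.
DATASETS = {"CIFAR10", "SVHN", "CIFAR10-IMB"}
ARCHS = {"resnet20", "resnet32"}
SUBSET_ALGOS = {"all", "random", "coreset-weighted"}
AUGMENT_ALGOS = {"coreset-uniform", "random", "largest-ce-loss", "same"}


def build_command(dataset, arch, subset_algo, augment_algo,
                  subset_size=None, augment_size=None):
    """Build the argv list for train_resnet.py from documented options."""
    for value, allowed, flag in [
        (dataset, DATASETS, "--dataset"),
        (arch, ARCHS, "--arch"),
        (subset_algo, SUBSET_ALGOS, "--subset_algo"),
        (augment_algo, AUGMENT_ALGOS, "--augment_algo"),
    ]:
        if value not in allowed:
            raise ValueError(f"{flag} {value!r} is not listed in the README")
    # Assumption: `same` is only valid together with `--subset_algo random`.
    if augment_algo == "same" and subset_algo != "random":
        raise ValueError("--augment_algo same requires --subset_algo random")

    cmd = ["python", "train_resnet.py",
           "--dataset", dataset, "--arch", arch,
           "--subset_algo", subset_algo, "--augment_algo", augment_algo]
    # The subset size only matters when not training on the full dataset.
    if subset_size is not None and subset_algo != "all":
        cmd += ["-s", str(subset_size)]
    if augment_size is not None:
        cmd += ["-as", str(augment_size)]
    return cmd


# Full-dataset training with coreset augmentation; sizes left at their defaults.
cmd = build_command("CIFAR10", "resnet20", "all", "coreset-uniform")
print(shlex.join(cmd))
```

The printed line can be pasted into a shell, or the list can be handed to `subprocess.run` from the repository root.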

### Other training settings
Other training settings can be viewed using
`python train_resnet.py -h`

Note that only the settings mentioned in this README and those used in the training script `train.sh` are tested and verified. Other settings are either experimental or untested.
