# Zero Generalization Error Theorem for Random Interpolators via Algebraic Geometry

This repository provides the experimental code for the paper "Zero Generalization Error Theorem for Random Interpolators via Algebraic Geometry".

## Installation
We assume that `<port1>` on the local machine is forwarded to `<port2>` on the remote server.

On the remote server, run the following command:
```bash
docker run --gpus all -it --rm -p <port2>:8888 -v ~/random_interpolator:/workspace --name random_interpolator nvcr.io/nvidia/pytorch:22.01-py3
```

## Easy experiment

### Sample random interpolators
We sample random interpolators with the Guess & Check algorithm.
Random interpolators for DLNN are sampled by the following command.
```bash
source easy_experiment_scripts/gc_linear.sh
```
Random interpolators for FCDNN are sampled by the following command.
```bash
source easy_experiment_scripts/gc.sh
```
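The idea behind Guess & Check can be sketched in a few lines: draw parameters at random and keep the first draw that fits every training point. The sketch below is a minimal illustration on a toy linear classifier, not the code run by the scripts above; the function name `guess_and_check`, the standard normal sampling distribution, and the toy data are all assumptions made for this example.

```python
import numpy as np

def guess_and_check(X, y, max_tries=100_000, seed=0):
    """Return the first randomly drawn weight vector that fits all training labels.

    Hypothetical helper for illustration only; the model is a toy linear
    classifier that predicts the label `x @ w > 0`.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=bool)
    for n_tries in range(1, max_tries + 1):
        # Guess: sample parameters from a standard normal (an assumption).
        w = rng.standard_normal(X.shape[1])
        # Check: accept only if every training point is classified correctly.
        if np.array_equal(X @ w > 0, y):
            return w, n_tries
    raise RuntimeError("no interpolator found within max_tries")

# Toy teacher-student data: labels come from a fixed teacher vector.
data_rng = np.random.default_rng(1)
X_train = data_rng.standard_normal((5, 2))
y_train = X_train @ np.array([1.0, -0.5]) > 0

w_hat, n_tries = guess_and_check(X_train, y_train)
print(f"found an interpolator after {n_tries} guesses")
```

Because each guess is drawn independently of the data, the accepted parameters are a sample from the sampling distribution restricted to the set of interpolators. The number of guesses needed grows quickly with the number of training points.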

### Calculate the mean and standard deviation
The mean and standard deviation of the test losses of random interpolators are calculated for each training-set size by the following commands.

For DLNN, run the following.
```bash
source easy_experiment_scripts/summarize_linear.sh
```
For FCDNN, run the following.
```bash
source easy_experiment_scripts/summarize.sh
```
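The summary step amounts to grouping test losses by training-set size and computing two statistics per group. The following is a minimal sketch of that computation, assuming the losses are available as `(n_train, test_loss)` pairs; the function name `summarize_losses` and the input format are hypothetical and may differ from what the scripts actually read and write.

```python
from collections import defaultdict

import numpy as np

def summarize_losses(records):
    """Map each training-set size to (mean, std) of its test losses.

    `records` is an iterable of (n_train, test_loss) pairs (assumed format).
    The standard deviation is the population one (NumPy default, ddof=0).
    """
    grouped = defaultdict(list)
    for n_train, loss in records:
        grouped[n_train].append(loss)
    return {
        n: (float(np.mean(losses)), float(np.std(losses)))
        for n, losses in sorted(grouped.items())
    }

# Made-up numbers, for illustration only.
summary = summarize_losses([(10, 0.5), (10, 0.3), (20, 0.1)])
for n_train, (mean, std) in summary.items():
    print(n_train, mean, std)
```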

### Visualize the result
You can visualize the results with `plot_result.ipynb`.
Details are written in the notebook.

## Large experiment

### Sample random interpolators
We sample random interpolators with Adam.
Random interpolators are sampled by the following command.
Running it automatically downloads the MNIST dataset to `large_experiment_code/data`.
```bash
source large_experiment_scripts/sgd.sh
```

### Estimate the dimension of TES
We estimate the dimension of TES with the scikit-dimension library.
To install it, run the following command.
```bash
pip install scikit-dimension
```
The estimation runs on the CPU and requires a large amount of memory (about 130 GB).
To run the estimation, use the following command.
```bash
source large_experiment_scripts/estimate_dimension.sh
```
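As a rough illustration of what an intrinsic dimension estimator computes, the sketch below implements the two-nearest-neighbor (TwoNN) maximum-likelihood idea in plain NumPy. It does not use the scikit-dimension API and is not necessarily the estimator used by `estimate_dimension.sh`; the function name `two_nn_dimension` and the toy data are hypothetical.

```python
import numpy as np

def two_nn_dimension(points):
    """Estimate intrinsic dimension from ratios of nearest-neighbor distances.

    For each point, mu = r2 / r1 is the ratio of the distances to its second
    and first nearest neighbors. Under a locally uniform density, mu follows
    a Pareto law with shape d, whose maximum-likelihood estimate is
    N / sum(log(mu)).
    """
    points = np.asarray(points, dtype=float)
    # All pairwise distances; memory grows as O(N^2 * D), so keep N small here.
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    np.fill_diagonal(dists, np.inf)  # exclude each point's distance to itself
    # The two smallest distances per row, in ascending order.
    nearest = np.partition(dists, 1, axis=1)[:, :2]
    mu = nearest[:, 1] / nearest[:, 0]
    return len(mu) / np.sum(np.log(mu))

# Toy check: a 2-dimensional plane linearly embedded in 5 dimensions.
rng = np.random.default_rng(0)
latent = rng.uniform(size=(500, 2))
embedded = latent @ rng.standard_normal((2, 5))
estimate = two_nn_dimension(embedded)
print(f"estimated dimension: {estimate:.2f}")
```

The estimate depends only on local distance ratios, so the ambient dimension (5 here) does not enter directly.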

### Calculate the mean and standard deviation
The mean and standard deviation of the test losses of random interpolators are calculated for each training-set size by the following command.
```bash
source large_experiment_scripts/summarize.sh
```

### Remove ambiguous data
To reproduce the teacher–student setting on MNIST, you can remove ambiguous data from the MNIST dataset with `remove_ambiguous_data.ipynb`, which uses the cleanlab library.
Running this notebook saves the indices of the ambiguous MNIST test data to be removed in `large_experiment_code/mnist_clean_test`.

### Visualize the result
You can visualize the results with `plot_result.ipynb`.
Details are written in the notebook.