# The Many Faces of Optimal Weak-to-Strong Learning

This repository is the official implementation of *The Many Faces of Optimal Weak-to-Strong Learning*.

## Requirements

Running the experiments requires a working Python installation. Our results were produced with Python 3.11.1 and pip 24.0, but other versions may also work.

To install the required Python packages:

```setup
pip install -r requirements.txt
```

## Datasets
We used 5 different datasets in our experiments.

| Dataset | How to use it |
| ------------------ |---------------- |
| [Higgs](https://doi.org/10.24432/C5V312) | It should be placed in the `training_data` folder and named `higgs.csv`. |
| [Boone](https://doi.org/10.24432/C5QC87) | It should be placed in the `training_data` folder and named `boone.txt`. |
| [Forest Cover](https://doi.org/10.24432/C50K5N) | This is downloaded automatically using `sklearn.datasets`. |
| [Diabetes](https://www.kaggle.com/datasets/mathchi/diabetes-data-set) | It should be placed in the `training_data` folder and named `diabetes.csv`. |
| Adversarial | We generate this dataset ourselves. See `DataReadFunctions.py` for the exact implementation. Parameters for this dataset are specified in `params.py`. |
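
Before starting a run, it can help to confirm that the manually downloaded files are where the table says they should be. The following is a minimal sketch, not part of the repository: the helper `missing_datasets` and the dictionary `EXPECTED_FILES` are hypothetical names, and only the folder and file names come from the table above.

```python
from pathlib import Path

# File names are taken from the table above. The dictionary and the helper
# below are hypothetical and not part of the repository.
EXPECTED_FILES = {
    "higgs": "higgs.csv",
    "boone": "boone.txt",
    "diabetes": "diabetes.csv",
}


def missing_datasets(data_dir="training_data"):
    """Return the names of datasets whose file is not present in data_dir."""
    root = Path(data_dir)
    return [
        name
        for name, file_name in EXPECTED_FILES.items()
        if not (root / file_name).is_file()
    ]


if __name__ == "__main__":
    missing = missing_datasets()
    if missing:
        print("Missing dataset files for:", ", ".join(missing))
    else:
        print("All manually downloaded datasets are in place.")
```

Forest Cover and Adversarial are left out on purpose, since the former is downloaded automatically and the latter is generated.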

## Running the Experiments

The main file to run experiments from is `experiment.py`. To run an experiment, specify which dataset to use:
```
python experiment.py <dataset>
```
Here \<dataset\> is one of `boone`, `diabetes`, `forest_cover`, `higgs`, or `adversarial`.
The results are saved in the `results` folder (which also contains our results from running the experiments). By default, the experiment is repeated 5 times, once for each of the random seeds 1-5. To run the experiment only once for a specific seed, pass `--seed`; then only that seed is run. Example:
```example
python experiment.py diabetes --seed 31
```
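To run several specific seeds one after another, the command above can be scripted. The sketch below is an illustration under stated assumptions: `build_command` and `DATASETS` are hypothetical names, and only the dataset names and the `--seed` flag come from this README. It only prints the commands it would run.

```python
import sys

# Dataset names as listed in this README.
DATASETS = ["boone", "diabetes", "forest_cover", "higgs", "adversarial"]


def build_command(dataset, seed=None):
    """Build the argument list for one call to experiment.py (hypothetical helper)."""
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset: {dataset!r}")
    cmd = [sys.executable, "experiment.py", dataset]
    if seed is not None:
        # Passing --seed restricts the run to that single seed.
        cmd += ["--seed", str(seed)]
    return cmd


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for seed in (31, 32, 33):
        print(" ".join(build_command("diabetes", seed)))
```

To actually execute a command, pass the returned list to `subprocess.run(cmd, check=True)` from the repository root.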

## Plotting results

To plot the data for an experiment, use the script `plot.py` as follows:
```
python plot.py <dataset>
```
It will then look in the `results` folder and find the JSON file associated with the name specified in \<dataset\>. Example:
```
python plot.py diabetes31
```
If you have run the experiments for seeds 1-5, you can also plot an average by specifying `-a` in the command. It will find the JSON files with the specified name ending in 1-5 and take the average of these results. Example:
```
python plot.py -a diabetes
```
This will take the average of `diabetes1`, `diabetes2`, ..., `diabetes5`.
The average is saved in `results/diabetes_avg.json`, which is the JSON file then used for plotting.
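
The averaging step can be pictured roughly as follows. This is a minimal sketch and not the code in `plot.py`: it assumes, purely for illustration, that each result file is named `<name><seed>.json` and holds a flat JSON object mapping each key to a list of numbers of equal length across seeds. The actual layout of the result files may differ, and `average_results` is a hypothetical name.

```python
import json
from pathlib import Path


def average_results(name, seeds=range(1, 6), results_dir="results"):
    """Average per-seed result files element-wise (hypothetical helper).

    Assumes each file is a flat JSON object of the form {key: [numbers]};
    this layout is an assumption, not something the repository guarantees.
    """
    root = Path(results_dir)
    runs = [json.loads((root / f"{name}{seed}.json").read_text()) for seed in seeds]

    averaged = {}
    for key in runs[0]:
        # Group the i-th value of every run together, then take the mean.
        columns = zip(*(run[key] for run in runs))
        averaged[key] = [sum(column) / len(runs) for column in columns]

    # Mirror the naming described above: results/<name>_avg.json.
    (root / f"{name}_avg.json").write_text(json.dumps(averaged))
    return averaged
```

Under these assumptions, `average_results("diabetes")` would read `diabetes1.json` through `diabetes5.json` and write `diabetes_avg.json` in the `results` folder.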

## Results
Below are the plots we obtained by averaging the results of 5 runs on each dataset.
### Higgs
![Higgs](plots/Higgs.png)
### Boone
![Boone](plots/Boone.png)
### Forest Cover
![Forest Cover](plots/Forest_Cover.png)
### Diabetes
![Diabetes](plots/Diabetes.png)
### Adversarial
![Adversarial](plots/Adversarial.png)


## License
Creative Commons Attribution 4.0