Instructions to reproduce results.

All commands are run from the home directory.

The first step is to set up an environment with all the required libraries
installed. This can be done using the command:

    conda create --name afa_env --file requirements.txt

The environment can then be activated with conda activate afa_env.


Next, the datasets must be created. This can be done with the following
commands:

    python -m datasets.create_bank
    python -m datasets.create_california_housing
    python -m datasets.create_cube
    python -m datasets.create_fashion_mnist
    python -m datasets.create_invase
    python -m datasets.create_metabric
    python -m datasets.create_miniboone
    python -m datasets.create_mnist
    python -m datasets.create_tcga

Please note that for Bank Marketing, METABRIC, MiniBooNE and TCGA the data
must first be downloaded manually. TCGA requires credentialed access; the
other three can be downloaded from:

- Bank Marketing: https://archive.ics.uci.edu/dataset/222/bank+marketing
- METABRIC: https://www.kaggle.com/datasets/raghadalharbi/breast-cancer-gene-expression-profiles-metabric
- MiniBooNE: https://archive.ics.uci.edu/dataset/199/miniboone+particle+identification
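
For convenience, the dataset-creation commands above can be wrapped in a shell
loop. The sketch below only prints each command rather than running it (remove
the echo to execute them), and assumes the module names listed above:

```shell
# Print each dataset-creation command; remove 'echo' to actually run them.
for d in bank california_housing cube fashion_mnist invase metabric miniboone mnist tcga; do
    echo "python -m datasets.create_$d"
done
```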


NOTE: the final hyperparameters are already included in this code, so the
sweep step can be skipped.
Once the datasets have been created, hyperparameter sweeps can be run with:

    python -m experiments.launch_sweeps --dataset <dataset> --model <model> --configs <configs>

The dataset argument is one of the datasets above; the valid names are:

- bank
- california_housing
- cube
- fashion_mnist
- invase_4
- invase_5
- invase_6
- metabric
- miniboone
- mnist
- TCGA

Here invase_4, invase_5 and invase_6 refer to Synthetic 1, 2 and 3,
respectively. The model argument selects which model to use; the names are:

- dime
- eddi
- fixed_mlp
- gdfs
- opportunistic
- ours
- vae

The configs argument selects which configurations to try; these come in
groups of three given by:

- 123
- 456
- 789
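
Putting these together, a full sweep over every model and config group for a
single dataset (cube is used here as an example) can be scripted. This sketch
only prints the commands; remove the echo to launch them:

```shell
# Print every sweep command for the cube dataset; remove 'echo' to launch them.
for model in dime eddi fixed_mlp gdfs opportunistic ours vae; do
    for cfg in 123 456 789; do
        echo "python -m experiments.launch_sweeps --dataset cube --model $model --configs $cfg"
    done
done
```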


After running sweeps, the results can be found in tuning/results. Each sweep
writes its results to a text file; the best result can be copied into the
experiments/hyperparameters_dict.py file. Following this, the main models can
be trained with the command:

    python -m experiments.launch_runs --dataset <dataset> --model <model>
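
As with the sweeps, training every model on one dataset can be scripted. The
sketch below prints the commands for the cube dataset as an example; remove the
echo to run them:

```shell
# Print the training command for each model on the cube dataset.
for model in dime eddi fixed_mlp gdfs opportunistic ours vae; do
    echo "python -m experiments.launch_runs --dataset cube --model $model"
done
```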


After all models are trained on a dataset, the main results can be extracted 
with the command:

    python -m experiments.extract_main_results --dataset <dataset>

This will construct the main results as a dictionary for the given dataset and
save it to experiments/results/<dataset>.pt
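
The saved file can then be inspected with torch.load. The snippet below is a
sketch only: it saves and reloads a stand-in dictionary to illustrate the
pattern, since the actual keys produced by extract_main_results are not
documented here.

```python
import os
import tempfile

import torch

# Stand-in results dictionary; the real keys come from extract_main_results
# and are not specified here, so a placeholder is used.
path = os.path.join(tempfile.mkdtemp(), "cube.pt")
torch.save({"placeholder_metric": 0.0}, path)

# Loading mirrors how experiments/results/<dataset>.pt would be read back.
results = torch.load(path)
print(sorted(results))
```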



To run ablations, use the command:

    python -m experiments.launch_ablations --dataset <dataset> --ablation <ablation>

The ablation argument selects which type of ablation to run; the options are:

- ib
- train_sample

Note that the one-acquisition-step and probability-weighting ablations require
changing the code manually; they are not handled directly by the ablation
program. This will be fixed in the camera-ready version of the code.
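
Both supported ablations can be launched in one loop. The sketch below prints
the commands for a single dataset (cube as an example); remove the echo to run
them:

```shell
# Print the ablation commands for the cube dataset; remove 'echo' to run them.
for abl in ib train_sample; do
    echo "python -m experiments.launch_ablations --dataset cube --ablation $abl"
done
```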

Finally, to extract the ablation results run:

    python -m experiments.extract_ablation_results --dataset <dataset>

This will create a dictionary with the ablation results and save it to
experiments/results/ablations/<dataset>.pt