###
This file is intended to help you reproduce the plots in our paper by explaining how to run the scripts provided in the root folder "code".
###

All Python code was tested with Python 3.12.5. The bash scripts were run on a Linux-based operating system.

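Required Python modules:

Part of the Python standard library (no installation needed):
- os
- sys
- random
- time
- math

Third-party packages (to be installed in your environment):
- numpy
- pandas
- sklearn (installed under the package name scikit-learn)
- scipy
- matplotlib
- seaborn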
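
Before launching long runs, it can help to confirm that every module listed above can be located in the active environment. The snippet below is a minimal sketch of such a check; the helper name find_missing is ours and is not part of the provided scripts.

```python
import importlib.util

# Import names taken from the list above; the first five ship with Python itself.
REQUIRED_MODULES = [
    "os", "sys", "random", "time", "math",
    "numpy", "pandas", "sklearn", "scipy", "matplotlib", "seaborn",
]


def find_missing(modules):
    """Return the names in `modules` that cannot be located in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules were found.")
```

If "sklearn" is reported as missing, note that the package to install is named scikit-learn.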

In the root folder, we provide 3 Python scripts:

- cv_num_exper.py: main script. It computes and saves the quantities of interest from the CV runs that are needed to generate Figure 1 and the KDE plots for a given algorithm and the associated algorithm comparison. It also computes and saves the quantities needed to form the MC estimates of sigma^2, loss stability, and relative loss stability, and hence the rates plots of the paper. Inputs: the algorithm, the sample size, the number of folds, the replication index, the path to the folder where the results should be saved, a flag indicating whether beta is fully dense, and a flag indicating whether to use the configuration for Figure 1 of our paper.

- combine_results.py: combines the results of cv_num_exper.py across all replications. Inputs: the algorithm (or the pair of algorithms being compared) whose results should be combined, the sample size, the number of replications, and the path to the folder where the results are saved.

- plot_results.py: retrieves the combined results for an algorithm and the associated algorithm comparison for all sample sizes, and outputs the plots provided in our paper. Inputs: the algorithm, the path to the folder where the results were saved and where the plots should be saved, a flag indicating whether beta is fully dense, and a flag indicating whether to plot Figure 1 of our paper.

To anonymize our code, we had to remove the bash scripts we used to run cv_num_exper.py and combine_results.py on a cluster. In their place, we provide 2 bash scripts (runExper.sh and combine.sh) that run these Python scripts sequentially. If you have access to a cluster, we strongly suggest writing your own bash scripts to run everything there, since running everything sequentially takes much longer.
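
As a starting point for cluster scripts, the sketch below generates one command line per (sample size, replication) pair, so that each line can be submitted as a separate job. It rests on assumptions not confirmed by the provided files: that cv_num_exper.py reads its inputs as positional arguments in the order listed above, and that replication indices start at 0. The function name build_commands and the example values (sample sizes, number of folds, output folder) are hypothetical.

```python
from itertools import product


def build_commands(algorithm, sample_sizes, n_folds, n_reps, out_dir,
                   dense_beta=0, fig1=0):
    """Build one argument list per (sample size, replication) pair.

    ASSUMPTION: cv_num_exper.py takes positional arguments in the order
    algorithm, sample size, number of folds, replication index, output path,
    dense-beta flag, Figure-1 flag, and replication indices start at 0.
    """
    commands = []
    for n, rep in product(sample_sizes, range(n_reps)):
        commands.append([
            "python", "cv_num_exper.py",
            algorithm, str(n), str(n_folds), str(rep), out_dir,
            str(dense_beta), str(fig1),
        ])
    return commands


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    for cmd in build_commands("Ridge", [100, 1000], n_folds=10, n_reps=2,
                              out_dir="results/"):
        print(" ".join(cmd))
```

Check the argument handling in cv_num_exper.py and runExper.sh before relying on this ordering.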

With the files provided, you can reproduce the plots from our main paper by running the following commands, in order, in a new terminal (reminder: if you have access to a cluster, we strongly suggest replacing the current bash scripts with ones that use it):

```
cd <path_to_folder_containing_our_scripts>
module load <my_Anaconda_installation> # note: this might not be required if it is already loaded by default when you open a new terminal; an easy way to check is to try running "conda info --envs"
conda activate <my_environment> # where you installed the required packages listed earlier
./runExper.sh Lasso 500 <path_to_res_1> 0 1 # run to generate results for Figure 1 of our paper
./runExper.sh ST 500 <path_to_res_2> 0 0
./runExper.sh Lasso 5000 <path_to_res_3> 0 0
./runExper.sh Ridge 500 <path_to_res_4> 0 0
./runExper.sh ST 500 <path_to_res_5> 1 0 # run ST with fully dense beta
./combine.sh Lasso 500 <path_to_res_1>
./combine.sh ST 500 <path_to_res_2>
./combine.sh Lasso 5000 <path_to_res_3>
./combine.sh Ridge 500 <path_to_res_4>
./combine.sh ST 500 <path_to_res_5>
python plot_results.py Lasso <path_to_res_1> 0 1
python plot_results.py ST <path_to_res_2> 0 0
python plot_results.py Lasso <path_to_res_3> 0 0
python plot_results.py Ridge <path_to_res_4> 0 0
python plot_results.py ST <path_to_res_5> 1 0
```

Note: Make sure every path you pass as input ends with "/".
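
If you build paths programmatically (for example in your own cluster scripts), a small helper can enforce this convention. This is a minimal sketch; the function name ensure_trailing_slash is ours and does not appear in the provided scripts.

```python
def ensure_trailing_slash(path):
    """Return `path` unchanged if it already ends with "/", else append one."""
    return path if path.endswith("/") else path + "/"


if __name__ == "__main__":
    # Hypothetical folder names, for illustration only.
    print(ensure_trailing_slash("results/lasso_500"))
    print(ensure_trailing_slash("results/lasso_500/"))
```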

You will find the figures from the main paper in the "figures" folder inside the directory that <path_to_res_i> points to (for i = 1, 2, 3, 4, 5).

Running plot_results.py should only take a few seconds.

For experiments that do not use a cross-validated lambda, each run of cv_num_exper.py takes several minutes: roughly 4-5 minutes in typical cases, less for sample size 100, and up to 30 minutes for sample size 100000. Run sequentially, 500 replications would therefore take more than a day, and covering all sample sizes would take several days. The simulations for Lasso with cross-validated lambda take much longer: even when split into 5000 replications, each replication still takes 10-15 hours. This is why running on a cluster is crucial.
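
The arithmetic behind these estimates can be reproduced as follows. This is only a back-of-the-envelope sketch using the per-run times quoted above; the function name sequential_hours is ours, and actual timings depend on your hardware.

```python
def sequential_hours(n_replications, minutes_per_run):
    """Total wall-clock hours if all replications run one after another."""
    return n_replications * minutes_per_run / 60


if __name__ == "__main__":
    # 500 replications at 4-5 minutes each (no cross-validated lambda).
    low = sequential_hours(500, 4)
    high = sequential_hours(500, 5)
    print(f"Without cross-validated lambda: {low:.1f} to {high:.1f} hours")

    # 5000 replications at 10-15 hours each (Lasso with cross-validated lambda).
    low_cv = sequential_hours(5000, 10 * 60)
    high_cv = sequential_hours(5000, 15 * 60)
    print(f"Lasso with cross-validated lambda: {low_cv:.0f} to {high_cv:.0f} hours")
```

The first range (about 33 to 42 hours) matches the "more than a day" figure for a single sample size; the second shows why the cross-validated Lasso runs are impractical without parallel jobs.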

Running combine_results.py can also take up to several minutes, but this script is called far fewer times. We nevertheless recommend running it on a cluster as well.

Note: the time-consuming operations are the repeated fits of the algorithms in cv_num_exper.py, especially the selection of lambda by inner cross-validation in the simulations run with Lasso. By comparison, combining the results in combine_results.py, even for many replications, and plotting in plot_results.py take very little time (a few minutes overall for the entirety of the experiments).

You will only need a few GB of disk space to store the results. 16 GB of RAM is more than enough to run each job.
