The MLP model classes are defined in mlp_models_multilayer.py
The Transformer model classes are defined in the Transformer training script: transformer_train_get_data_r2_heatmap_attn=1_top-k_layer_all.py

Example MLP training arguments for train_mlp_multilayer.py: 0.001 0.0001 59 59 adam 2500 50 random_random 512 one_embed 128 1 1
Example Transformer training arguments for transformer_train_get_data_r2_heatmap_attn=1_top-k_layer_all.py: 0.001 0.00075 59 59 adam 3000 50 random_random 10 1 3072 0.0 0.0 1 1
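Assuming the scripts are launched directly with python and take the arguments positionally in the order listed above (not confirmed from the source), the two example runs would look like:

```shell
# Hypothetical invocations -- adjust the interpreter and working directory
# to match your setup before running.
python train_mlp_multilayer.py 0.001 0.0001 59 59 adam 2500 50 random_random 512 one_embed 128 1 1
python "transformer_train_get_data_r2_heatmap_attn=1_top-k_layer_all.py" 0.001 0.00075 59 59 adam 3000 50 random_random 10 1 3072 0.0 0.0 1 1
```

The Transformer filename contains `=` and `-`, so quoting it avoids surprises in some shells.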

requirements-cuda-12.txt lists every required pip package; install them with: pip install -r requirements-cuda-12.txt

Code was run on RTX 8000 GPUs with the following modules loaded:
Modules: 1) openmpi/4.0.4   2) cudatoolkit/12.2.2   3) libffi/3.2.1   4) libreadline/7.0   5) python/3.10
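On a cluster using an environment-modules/Lmod setup (an assumption; your site's module names may differ), the list above can be loaded in one command:

```shell
# Load the module versions the code was run with.
module load openmpi/4.0.4 cudatoolkit/12.2.2 libffi/3.2.1 libreadline/7.0 python/3.10
```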

After getting the code to run, set the output paths appropriately and run all of the hypertuning scripts to generate the data that will be plotted.
WARNING: regenerating all of the data needed to remake every plot in the paper takes around a terabyte of storage, as over 250k neural networks are trained across all plots in the paper.
Suggestion: update all paths in the training files to save data on a network drive.
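One way to keep the path changes in one place is a small helper that resolves every save directory from a single environment variable. This is a sketch, not code from the repo: resolve_save_dir, DATA_DIR, and the default root are all hypothetical names you would adapt to the paths used in the training files.

```python
import os

def resolve_save_dir(run_name, default_root="/scratch/results"):
    """Resolve (and create) the output directory for a training run.

    DATA_DIR is a hypothetical environment variable: point it at a
    network drive so all hypertuning scripts write there instead of
    to local disk.
    """
    root = os.environ.get("DATA_DIR", default_root)
    path = os.path.join(root, run_name)
    os.makedirs(path, exist_ok=True)  # safe if the directory already exists
    return path
```

Each training script would then call resolve_save_dir("some_run_name") instead of hard-coding an absolute path, so redirecting a terabyte of output means setting one variable rather than editing every file.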