How did I decide on the pyramid loss downscale factor and loss scale?

Goal: find the best loss downscale factor and loss scale, where "best" means the highest top-1 validation accuracy while keeping the nesim loss low.

Steps:
1. Generate nesim configs covering the following parameter values: `python3 generate_nesim_configs.py`
    
    a. loss downscale factors = `[2,3,5]`
    
    b. loss scales = `[1, 100, 150, 200]`
    
    c. apply every n steps = `[1, 20, 30, 40, 50]`
2. Generate the SLURM commands: `python3 run_all_possible_trainings.py --slurm`
3. Run the training jobs.
4. Download the results CSV from wandb.
5. Find the best-performing combination.
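
If every combination is trained, the grid in step 1 has 3 × 4 × 5 = 60 runs. The sketch below enumerates that grid and emits one SLURM job per combination. It is an illustration under assumptions, not what `generate_nesim_configs.py` or `run_all_possible_trainings.py` actually do: the config dict keys, the run-name scheme, `train.py`, and its `--config` flag are hypothetical placeholders. Only `sbatch --job-name` and `--wrap` are real SLURM options.

```python
import itertools

# Parameter values from step 1.
LOSS_DOWNSCALE_FACTORS = [2, 3, 5]
LOSS_SCALES = [1, 100, 150, 200]
APPLY_EVERY_N_STEPS = [1, 20, 30, 40, 50]


def generate_grid():
    """Return one config dict per combination of the three parameters."""
    return [
        {
            "loss_downscale_factor": d,
            "loss_scale": s,
            "apply_every_n_steps": n,
        }
        for d, s, n in itertools.product(
            LOSS_DOWNSCALE_FACTORS, LOSS_SCALES, APPLY_EVERY_N_STEPS
        )
    ]


def run_name(config):
    """Hypothetical naming scheme so each run is easy to find in wandb."""
    return "ds{loss_downscale_factor}_scale{loss_scale}_every{apply_every_n_steps}".format(
        **config
    )


def slurm_command(config):
    """Wrap a (hypothetical) training command in a single sbatch call."""
    # `train.py` and `--config` are placeholders, not the repo's real entry point.
    inner = f"python3 train.py --config configs/{run_name(config)}.json"
    return f'sbatch --job-name={run_name(config)} --wrap="{inner}"'


if __name__ == "__main__":
    grid = generate_grid()
    print(len(grid), "runs")
    for config in grid:
        print(slurm_command(config))
```

Giving every run a unique, parameter-encoding name makes step 5 easier, because the wandb export can then be joined back to the grid by run name.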

How do we define the best combination mathematically? Two factors matter:
1. top-1 val accuracy (higher is better)
2. nesim loss (lower is better)
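
One possible definition, offered as a sketch rather than the method used here: keep only Pareto-optimal runs (no other run has accuracy at least as high and nesim loss at least as low, while being strictly better on one of the two). Then rank the survivors by score = acc_norm - w * loss_norm, where both metrics are min-max normalized across all runs and w (default 1) is a hypothetical trade-off weight. The CSV column names are assumptions about the wandb export.

```python
import csv
import io

# Assumed column names in the wandb CSV export (hypothetical; rename to match).
ACC_COL = "val_acc_top1"
LOSS_COL = "nesim_loss"
PARAM_COLS = ("loss_downscale_factor", "loss_scale", "apply_every_n_steps")


def load_runs(csv_text):
    """Parse the exported CSV text into one dict per run."""
    runs = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # All grid values in step 1 are integers.
        run = {col: int(float(row[col])) for col in PARAM_COLS}
        run[ACC_COL] = float(row[ACC_COL])
        run[LOSS_COL] = float(row[LOSS_COL])
        runs.append(run)
    return runs


def dominates(a, b):
    """True if run a is no worse than b on both metrics and strictly better on one."""
    no_worse = a[ACC_COL] >= b[ACC_COL] and a[LOSS_COL] <= b[LOSS_COL]
    strictly_better = a[ACC_COL] > b[ACC_COL] or a[LOSS_COL] < b[LOSS_COL]
    return no_worse and strictly_better


def pareto_front(runs):
    """Indices of runs that no other run dominates."""
    return [i for i, r in enumerate(runs) if not any(dominates(o, r) for o in runs)]


def min_max(values):
    """Scale values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def best_run(runs, loss_weight=1.0):
    """Pareto-optimal run with the highest acc_norm - loss_weight * loss_norm."""
    acc_norm = min_max([r[ACC_COL] for r in runs])
    loss_norm = min_max([r[LOSS_COL] for r in runs])
    scores = [a - loss_weight * l_n for a, l_n in zip(acc_norm, loss_norm)]
    return runs[max(pareto_front(runs), key=lambda i: scores[i])]
```

With the exported CSV read into a string `text`, `best_run(load_runs(text))` returns the chosen run's row. Raising `loss_weight` favors lower nesim loss; setting it to 0 reduces to "highest accuracy". The Pareto front is also worth inspecting on its own, since it lists every run that represents a defensible trade-off.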

Make a heatmap of each.
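
A sketch of those heatmaps, assuming the runs are already loaded as dicts with the hypothetical keys used below (`loss_downscale_factor`, `loss_scale`, `apply_every_n_steps`, plus a metric column). Because the grid has three parameters, each heatmap puts loss downscale factor on the rows and loss scale on the columns, and either fixes `apply_every_n_steps` or collapses it: `max` suits accuracy, `min` suits nesim loss. matplotlib is imported only inside the plotting function.

```python
import math


def heatmap_matrix(runs, value_col, n_steps=None, reduce=max):
    """Rows = loss downscale factors, columns = loss scales.

    If n_steps is given, keep only runs with that apply_every_n_steps;
    otherwise collapse that axis with `reduce` (use reduce=min for nesim loss).
    Missing cells are NaN.
    """
    # Axes come from all runs so every heatmap shares the same layout.
    factors = sorted({r["loss_downscale_factor"] for r in runs})
    scales = sorted({r["loss_scale"] for r in runs})
    cells = {}
    for r in runs:
        if n_steps is not None and r["apply_every_n_steps"] != n_steps:
            continue
        key = (r["loss_downscale_factor"], r["loss_scale"])
        cells.setdefault(key, []).append(r[value_col])
    matrix = [
        [reduce(cells[(f, s)]) if (f, s) in cells else math.nan for s in scales]
        for f in factors
    ]
    return factors, scales, matrix


def plot_heatmap(factors, scales, matrix, title, path):
    """Save an annotated heatmap of `matrix` to `path`."""
    import matplotlib

    matplotlib.use("Agg")  # render to a file; no display needed on a cluster
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    im = ax.imshow(matrix, cmap="viridis")
    ax.set_xticks(range(len(scales)))
    ax.set_xticklabels([str(s) for s in scales])
    ax.set_yticks(range(len(factors)))
    ax.set_yticklabels([str(f) for f in factors])
    ax.set_xlabel("loss scale")
    ax.set_ylabel("loss downscale factor")
    ax.set_title(title)
    # Write each cell's value on top of its color.
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if not math.isnan(value):
                ax.text(j, i, f"{value:.3f}", ha="center", va="center", color="w")
    fig.colorbar(im, ax=ax)
    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)
```

For example, `plot_heatmap(*heatmap_matrix(runs, "val_acc_top1"), "best top-1 val acc", "val_acc.png")` draws the accuracy map, and passing `"nesim_loss"` with `reduce=min` draws the loss map. Cells that are bright on the first and dark on the second are the candidates worth a closer look.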