folder;label_text;label_latex;description
random_shooting_n100_h10;RS(n=100, h=10); $\text{RS}(n=100, h=10)$;random shooting with n=100 rollouts and horizon of h=10
random_shooting_n100000_h20;RS(n=100K, h=20); $\text{RS}(n=100\text{K}, h=20)$;random shooting with n=100K rollouts and horizon of h=20
DQN;DQN;DQN;DQN agent
DQN_planning_n100_h10;DQN planning(n=100, h=10); DQN planning$(n=100, h=10)$;trained DQN agent used to guide planning with n=100 rollouts and h=10 horizon
PPO;PPO;PPO;PPO agent
PPO_planning_n100_h10;PPO planning(n=100, h=10); PPO planning$(n=100, h=10)$;trained PPO agent used to guide planning with n=100 rollouts and h=10 horizon
map_elites_nn;MAP-Elites (ME);QD(alg=ME, contr=NN, h=10, N=100); dyna-style QD agent based on MapElites with NN controllers
map_elites_as;QD(alg=ME, contr=AS);QD(alg=ME, contr=AS); dyna-style QD agent based on MapElites with action sequence controllers
nslc_nn;QD(alg=NSLC, contr=NN);QD(alg=NSLC, contr=NN); dyna-style QD agent based on NSLC with NN controllers
nslc_as;QD(alg=NSLC, contr=AS);QD(alg=NSLC, contr=AS); dyna-style QD agent based on NSLC with action sequence controllers
map_elites_nn_n100;QD(alg=ME, contr=NN, N=100);QD(alg=ME, contr=NN, N=100);dyna-style QD agent based on MapElites with NN controllers. N=100
random_shooting;Random Shooting (RS); $\text{RS}(n=100, h=10)$;random shooting with n=500 rollouts and horizon of h=30
map_elites_nn_h10;QD(alg=ME, contr=NN, N=100, h=10);QD(alg=ME, contr=NN, N=100, h=10);dyna-style QD agent based on MapElites with NN controllers.
map_elites_as_h10;QD(alg=ME, contr=AS, N=100, h=10);QD(alg=ME, contr=AS, N=100, h=10);dyna-style QD agent based on MapElites with AS controllers.
map_elites_as_h30;QD(alg=ME, contr=AS, N=100, h=30);QD(alg=ME, contr=AS, N=100, h=30);dyna-style QD agent based on MapElites with AS controllers.
map_elites_nn_h30;QD(alg=ME, contr=NN, N=100, h=30);QD(alg=ME, contr=NN, N=100, h=30); dyna-style QD agent based on MapElites with NN controllers
me_DQN;QD(alg=ME, contr=NN, DQN);QD(alg=ME, contr=NN, DQN);dyna-style QD agent based on MapElites with NN controllers. N=100
map_elites_nn_dqn_exp;QD(alg=ME, contr=NN, DQN EXP);QD(alg=ME, contr=NN, DQN EXP);dyna-style QD agent based on MapElites with NN controllers. N=100
map_elites_nn_ppo;QD(alg=ME, contr=NN, PPO);QD(alg=ME, contr=NN, PPO);dyna-style QD agent based on MapElites with NN controllers. N=100
SAC_planning;SAC planning;SAC planning;SAC agent used to guide planning
SAC;SAC;SAC;SAC agent
me_SAC;QD(alg=ME, contr=NN, SAC);QD(alg=ME, contr=NN, SAC);dyna-style QD agent based on MapElites with NN controllers. N=100
me_SAC_EXP;QD(alg=ME, contr=NN, SAC_EXP);QD(alg=ME, contr=NN, SAC_EXP);dyna-style QD agent based on MapElites with NN controllers. N=100
me_SAC_EXP_all_traces;QD(alg=ME, contr=NN, SAC_EXP with all traces);QD(alg=ME, contr=NN, SAC_EXP with all traces);dyna-style QD agent based on MapElites with NN controllers. N=100
me_SAC_EXP_no_sim;QD(alg=ME, contr=NN, SAC_EXP no sim);QD(alg=ME, contr=NN, SAC_EXP no sim);dyna-style QD agent based on MapElites with NN controllers. N=100
me_SAC_bootstrap_min;QD(alg=ME, contr=NN, SAC bt min);QD(alg=ME, contr=NN, SAC bt min);dyna-style QD agent based on MapElites with NN controllers. N=100
me_SAC_bootstrap_max;QD(alg=ME, contr=NN, SAC bt max);QD(alg=ME, contr=NN, SAC bt max);dyna-style QD agent based on MapElites with NN controllers. N=100
random_shooting_safe;Safe Random Shooting (S-RS); $\text{RS SAFE}(n=100, h=10)$;safe random shooting with n=100 rollouts and horizon of h=10
random_shooting_for_real_system;RS(n=100, h=10); $\text{RS}(n=100, h=10)$;random shooting with n=100 rollouts and horizon of h=10
cem;Cross Entropy Method (CEM); $\text{CEM}$;Cross-entropy method
rcem;Robust Cross Entropy Method (RCEM); $\text{RCEM}$;Robust cross-entropy method
map_elites_nn_safe_pareto;Pareto Safe MAP-Elites (PS-ME);QD(alg=ME, contr=NN, h=10, N=100); dyna-style QD agent based on MapElites with NN controllers
map_elites_nn_safe;Safe MAP-Elites (S-ME);Safe MAP-Elites (S-ME); dyna-style QD agent based on MapElites with NN controllers
cpo;CPO; $\text{CPO}$;Constrained Policy Optimization
ppo;PPO; $\text{PPO}$;Proximal Policy Optimization
trpo;TRPO; $\text{TRPO}$;Trust Region Policy Optimization
ppo_lagrangian;PPO with Lagrangian; $\text{PPO with Lagrangian}$;Proximal Policy Optimization with Lagrangian constraints
trpo_lagrangian;TRPO with Lagrangian; $\text{TRPO with Lagrangian}$;Trust Region Policy Optimization with Lagrangian constraints
cem_parallel;Cross Entropy Method (CEM); $\text{CEM}$;Cross-entropy method
cem_n3000_h10_elites20;Cross Entropy Method (CEM); $\text{CEM}$;Cross-entropy method
random_shooting_n3000_h12;RS(n=3000, h=12); $\text{RS}(n=3000, h=12)$;random shooting with n=3000 rollouts and horizon of h=12
rcem_n3000_h10_elites20;Robust Cross Entropy Method (RCEM); $\text{RCEM}$;Robust cross-entropy method
rcem_n3000_h30;Robust Cross Entropy Method (RCEM); $\text{RCEM}$;Robust cross-entropy method
cem_n3000_h30_elites20;Cross Entropy Method (CEM); $\text{CEM}$;Cross-entropy method
random_shooting_safe_n3000_h30;Safe Random Shooting (S-RS); $\text{RS SAFE}(n=3000, h=30)$;safe random shooting with n=3000 rollouts and horizon of h=30
