# Gridworld-Rethinking-Deep-Policy-Gradients-via-State-Wise-Policy-Improvement
Training command (default AE version):
python {TRAINING_FILE} --seed {SEEDS} --classifier {CLASSIFIER} --value-method Qvalue --max-episode 40000 --learning-rate 1e-4 --actor-delay 1 --margin 0.1 --log-interval 10 --envi 4x4

Training command arguments:
TRAINING_FILE = {tabular_policy.py, onehot_test_gridworld.py}
SEEDS = {123, 456, 222, 789, 234, 246, 666, 369, 861, 829}
CLASSIFIER = {HPO-AM, HPO-AM-log, HPO-AM-root, HPO-AM-sub, HPO-AM-square}
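
The placeholders above expand to 10 seeds × 5 classifiers, so 50 runs per training file. As a minimal sketch of how that sweep could be enumerated, the snippet below builds one argument list per run. The helper names `build_command` and `build_sweep` are hypothetical and not part of this repository; the flag values are copied from the default AE command above.

```python
import itertools
import shlex

# Values copied from the "Training command arguments" list in this README.
SEEDS = [123, 456, 222, 789, 234, 246, 666, 369, 861, 829]
CLASSIFIERS = ["HPO-AM", "HPO-AM-log", "HPO-AM-root", "HPO-AM-sub", "HPO-AM-square"]


def build_command(training_file, seed, classifier):
    """Return the default AE training command as an argument list (hypothetical helper)."""
    return [
        "python", training_file,
        "--seed", str(seed),
        "--classifier", classifier,
        "--value-method", "Qvalue",
        "--max-episode", "40000",
        "--learning-rate", "1e-4",
        "--actor-delay", "1",
        "--margin", "0.1",
        "--log-interval", "10",
        "--envi", "4x4",
    ]


def build_sweep(training_file):
    """Return one command per (seed, classifier) pair (hypothetical helper)."""
    return [
        build_command(training_file, seed, classifier)
        for seed, classifier in itertools.product(SEEDS, CLASSIFIERS)
    ]


if __name__ == "__main__":
    # Dry run: print the commands instead of launching them.
    for cmd in build_sweep("tabular_policy.py"):
        print(shlex.join(cmd))
```

To launch the runs rather than print them, each argument list could be passed to `subprocess.run(cmd, check=True)` from the standard library.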

### For tabular policy with true value table
Training command (default AE version):
python tabular_policy.py --vTable True --seed {SEEDS} --classifier {CLASSIFIER} --value-method Qvalue --max-episode 40000 --learning-rate 1e-4 --actor-delay 1 --margin 0.1 --log-interval 1 --envi 4x4

Training command arguments:
SEEDS = {123, 456, 222, 789, 234, 246, 666, 369, 861, 829}
CLASSIFIER = {HPO-AM, HPO-AM-log, HPO-AM-root, HPO-AM-sub, HPO-AM-square}

For WAE scenario
Run 4x4 environment:
python tabular_policy.py --vTable True --classifier {CLASSIFIER} --value-method Qvalue --envi 4x4 --weight --max-episode 40000 --learning-rate 1e-4 --actor-delay 1 --margin 0.1 --log-interval 1 

For AE scenario
Run 4x4 environment:
python tabular_policy.py --vTable True --classifier {CLASSIFIER} --value-method Qvalue --envi 4x4 --max-episode 40000 --learning-rate 1e-4 --actor-delay 1 --margin 0.1 --log-interval 1 

For WCE scenario
Run 4x4 environment:
python tabular_policy.py --vTable True --classifier {CLASSIFIER} --value-method Qvalue --envi 4x4 --weight --fixedEps fixedEps --max-episode 40000 --learning-rate 1e-4 --actor-delay 1 --margin 0.1 --log-interval 1 

For CE scenario
Run 4x4 environment:
python tabular_policy.py --vTable True --classifier {CLASSIFIER} --value-method Qvalue --envi 4x4 --fixedEps fixedEps --max-episode 40000 --learning-rate 1e-4 --actor-delay 1 --margin 0.1 --log-interval 1 
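
The four scenario commands above differ only in whether `--weight` and `--fixedEps fixedEps` are present. The following sketch makes that mapping explicit. The names `SCENARIO_FLAGS` and `scenario_command` are hypothetical and not part of this repository, and the `script`, `true_value_table`, and `max_episode` parameters are assumptions added so the same helper can cover the other training script.

```python
import shlex

# Extra flags per scenario, read off the commands in this README.
SCENARIO_FLAGS = {
    "AE": [],
    "WAE": ["--weight"],
    "CE": ["--fixedEps", "fixedEps"],
    "WCE": ["--weight", "--fixedEps", "fixedEps"],
}


def scenario_command(scenario, classifier, script="tabular_policy.py",
                     true_value_table=True, max_episode=40000):
    """Build the 4x4 command for one scenario (hypothetical helper)."""
    if scenario not in SCENARIO_FLAGS:
        raise ValueError(f"unknown scenario: {scenario!r}")
    cmd = ["python", script]
    if true_value_table:
        # Only the tabular commands in this README pass --vTable True.
        cmd += ["--vTable", "True"]
    cmd += ["--classifier", classifier, "--value-method", "Qvalue", "--envi", "4x4"]
    cmd += SCENARIO_FLAGS[scenario]
    cmd += [
        "--max-episode", str(max_episode),
        "--learning-rate", "1e-4",
        "--actor-delay", "1",
        "--margin", "0.1",
        "--log-interval", "1",
    ]
    return cmd


if __name__ == "__main__":
    for name in SCENARIO_FLAGS:
        print(name, "->", shlex.join(scenario_command(name, "HPO-AM")))
```

Keeping the scenario flags in a single table means a typo in one variant is less likely to go unnoticed than when four near-identical command lines are edited by hand.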

### For neural network policy
Training command (default AE version):
python onehot_test_gridworld.py --seed {SEEDS} --classifier {CLASSIFIER} --value-method Qvalue --max-episode 40000 --learning-rate 1e-4 --actor-delay 1 --margin 0.1 --log-interval 10 --envi 4x4

Training command arguments:
SEEDS = {123, 456, 222, 789, 234, 246, 666, 369, 861, 829}
CLASSIFIER = {HPO-AM, HPO-AM-log, HPO-AM-root, HPO-AM-sub, HPO-AM-square}

For WAE scenario
Run 4x4 environment:
python onehot_test_gridworld.py --classifier {CLASSIFIER} --value-method Qvalue --envi 4x4 --weight --max-episode 100000 --learning-rate 1e-4 --actor-delay 1 --margin 0.1 --log-interval 1 

For AE scenario
Run 4x4 environment:
python onehot_test_gridworld.py --classifier {CLASSIFIER} --value-method Qvalue --envi 4x4 --max-episode 100000 --learning-rate 1e-4 --actor-delay 1 --margin 0.1 --log-interval 1 

For WCE scenario
Run 4x4 environment:
python onehot_test_gridworld.py --classifier {CLASSIFIER} --value-method Qvalue --envi 4x4 --weight --fixedEps fixedEps --max-episode 100000 --learning-rate 1e-4 --actor-delay 1 --margin 0.1 --log-interval 1 

For CE scenario
Run 4x4 environment:
python onehot_test_gridworld.py --classifier {CLASSIFIER} --value-method Qvalue --envi 4x4 --fixedEps fixedEps --max-episode 100000 --learning-rate 1e-4 --actor-delay 1 --margin 0.1 --log-interval 1 
