<h3> Codebase Structure

The appendix is in appendix.pdf.

Trained models are saved in ./model. Pre-trained models are not included here because of the file-size limit.
All source code is in ./src.

*** f3_1d.png: Two curves in the original Figure 3 of the paper, uncertainty sampling and uniform random, were incorrect.
The corrected plot is attached as f3_1d.png.
Although both corrected curves show higher accuracy,
they do not change the discussion or conclusions in the Experiment section. ***

<h3> Package Dependencies and PYTHONPATH Setup

To run our code, install PyTorch 1.5 and OpenAI Gym.

To set up PYTHONPATH, run:

```export PYTHONPATH="${PYTHONPATH}:<path to>/robustlal"```
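
As a quick sanity check after the export, the sketch below tests whether a directory appears among the PYTHONPATH entries. The helper name `on_pythonpath` and the placeholder directory `robustlal` are illustrative assumptions and are not part of this repository.

```python
import os


def on_pythonpath(repo_dir, env=None):
    """Return True if repo_dir is one of the entries of PYTHONPATH."""
    env = os.environ if env is None else env
    entries = env.get("PYTHONPATH", "").split(os.pathsep)
    target = os.path.abspath(repo_dir)
    # Skip empty entries, which appear when PYTHONPATH was previously unset.
    return any(os.path.abspath(e) == target for e in entries if e)


if __name__ == "__main__":
    # "robustlal" is a placeholder; pass the actual checkout location.
    print(on_pythonpath("robustlal"))
```

If this prints `False` in a new shell, the export was most likely not added to the shell's startup file.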

<h3> Training

<h4> Init + Regret Training

```python minimax_policy.py game_mode model_id l_upper```

where game_mode is either 1d or q20, model_id is an integer, and
l_upper specifies the set to optimize over, {theta: C(X, Z, theta) < 2^l_upper}.

To replicate our experiment, for each game_mode, run with an arbitrary model_id and l_upper = 4.
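
To make the role of l_upper concrete, the following sketch prints the bound 2^l_upper for the l_upper values that appear in the 1d tables of this README. The function name `complexity_bound` is an illustrative assumption; the scripts in ./src may compute this differently.

```python
# l_upper values that appear in this README's 1d result tables.
L_UPPER_VALUES = [3, 3.5, 4, 4.5, 5]


def complexity_bound(l_upper):
    """Return 2^l_upper, the strict upper bound on C(X, Z, theta)."""
    return 2.0 ** l_upper


for l_upper in L_UPPER_VALUES:
    print(f"l_upper={l_upper}: C(X, Z, theta) < {complexity_bound(l_upper):.2f}")
```

For example, l_upper = 4 restricts the optimization to instances theta with C(X, Z, theta) < 16, and l_upper = 3.5 corresponds to a bound of roughly 11.31.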

<h4> Finetune Training

```python finetune.py game_mode model_id l_upper load_name```

where load_name is the path to the saved model dict to fine-tune from, e.g. "../model/minimax_policy_1d_1_4.dict"
or "../model/finetune_binary_1d_1_4.dict".

To replicate our experiment, for each game_mode (1d as an example below), run with
```
python finetune.py 1d 1 4 ../model/minimax_policy_1d_1_4.dict
# then run:
python finetune.py 1d 1 3.5 ../model/finetune_binary_1d_1_4.dict
python finetune.py 1d 1 4.5 ../model/finetune_binary_1d_1_4.dict
python finetune.py 1d 1 5 ../model/finetune_binary_1d_1_4.dict
```
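
The sequence above can also be generated programmatically for either game_mode. The sketch below only builds and prints the command lines. It assumes a file-naming pattern inferred from the example paths in this README (`<prefix>_<game_mode>_<model_id>_<l_upper>.dict`, with the decimal point in l_upper written as an underscore). The helper names `model_path` and `finetune_commands` are hypothetical.

```python
import shlex

MODEL_DIR = "../model"  # relative path used in this README's examples


def model_path(prefix, game_mode, model_id, l_upper):
    """Build a model path; the naming pattern is an assumption inferred
    from the README examples (e.g. 4.5 becomes "4_5")."""
    tag = f"{l_upper:g}".replace(".", "_")
    return f"{MODEL_DIR}/{prefix}_{game_mode}_{model_id}_{tag}.dict"


def finetune_commands(game_mode, model_id, base_l=4, other_ls=(3.5, 4.5, 5)):
    """Return the finetune.py invocations as argument lists."""
    start = model_path("minimax_policy", game_mode, model_id, base_l)
    tuned = model_path("finetune_binary", game_mode, model_id, base_l)
    # First fine-tune from the minimax policy at base_l ...
    cmds = [["python", "finetune.py", game_mode, str(model_id), f"{base_l:g}", start]]
    # ... then fine-tune from that result for the remaining l_upper values.
    for l_upper in other_ls:
        cmds.append(
            ["python", "finetune.py", game_mode, str(model_id), f"{l_upper:g}", tuned]
        )
    return cmds


for cmd in finetune_commands("1d", 1):
    print(shlex.join(cmd))
```

Each argument list can be passed to `subprocess.run(cmd, check=True)` from the directory that contains finetune.py. The commands should run in order, since the later ones load the model saved by the first.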

<h4> Training the overall policy (i.e., pi^*, the instance-dependent policy)

```python train_overall.py game_mode model_id l_upper model1 model2 ...```

where each of model1, model2, ... is a saved model dict.

To replicate our experiment, for each game_mode (1d as an example below), run with
```
python train_overall.py 1d 1 5 ../model/minimax_policy_1d_1_3_5.dict ../model/finetune_binary_1d_1_4.dict ../model/finetune_binary_1d_1_4_5.dict ../model/finetune_binary_1d_1_5.dict
```

<h3> Evaluation

<h4> Policy

```python test_minimax_policy.py game_mode load_name```

where load_name is the name of the model dict.

<h5> Here are the expected results:

For 1d (Figure 1 & "ours" in Figure 2):

|Policy|l_upper=3|l_upper=3.5|l_upper=4|l_upper=4.5|l_upper=5|
|---|---|---|---|---|---|
|../model/finetune_overall_1d_1_5.dict |86.17|78.38|70.03|56.50|47.22|
|../model/finetune_binary_1d_1_3_5.dict|87.80|79.57|67.30|56.18|42.65|
|../model/finetune_binary_1d_1_4.dict  |88.53|77.08|71.75|51.85|41.79|
|../model/finetune_binary_1d_1_4_5.dict|84.77|75.67|66.33|59.23|44.86|
|../model/finetune_binary_1d_1_5.dict  |85.40|72.99|65.26|61.89|47.38|

Figure 3:

|Policy|h=.5|h=.6|h=.7|h=.8|h=.9|h=1|
|---|---|---|---|---|---|---|
|../model/finetune_overall_1d_1_5.dict |61.85|67.81|73.08|82.73|90.54|100.0|

For q20 ("ours" in Figure 4):

|Policy|l_upper=3.5|l_upper=4|l_upper=4.5|
|---|---|---|---|
|../model/finetune_overall_q20_1_5.dict|35.39|26.18|13.38|

Figure 5:

|Policy|h=.5|h=.6|h=.7|h=.8|h=.9|h=1|
|---|---|---|---|---|---|---|
|../model/finetune_overall_q20_1_5.dict |6.64|5.2|7.8|20.91|54.63|96.74|

<h4> Uniform Random

```python uniform_random.py game_mode```

<h5> Here are the expected results:

For 1d (Figure 2):

|Policy|l_upper=3|l_upper=3.5|l_upper=4|l_upper=4.5|l_upper=5|
|---|---|---|---|---|---|
|Uniform|38.14|35.03|26.96| 22.82| 16.7|

Figure 3:

|Policy|h=.5|h=.6|h=.7|h=.8|h=.9|h=1|
|---|---|---|---|---|---|---|
|Uniform |32.58|31.46|35.04|42.81|49.42|55.65|

For q20 (Figure 4):

|Policy|l_upper=3.5|l_upper=4|l_upper=4.5|
|---|---|---|---|
|Uniform|9.92| 6.88| 4.94|

Figure 5:

|Policy|h=.5|h=.6|h=.7|h=.8|h=.9|h=1|
|---|---|---|---|---|---|---|
|Uniform |4.15| 2.52| 3.45| 7.71| 18.83| 43.9|

<h4> Uncertainty Sampling

```python uncertainty_sampling.py game_mode```

<h5> Here are the expected results:

For 1d (Figure 2):

|Policy|l_upper=3|l_upper=3.5|l_upper=4|l_upper=4.5|l_upper=5|
|---|---|---|---|---|---|
|Uncertainty|68.88| 59.76| 52.67| 43.81| 29.27|

Figure 3:

|Policy|h=.5|h=.6|h=.7|h=.8|h=.9|h=1|
|---|---|---|---|---|---|---|
|Uncertainty|45.5|47.35|56.04|71.38|87.85| 100.0|

For q20 (Figure 4):

|Policy|l_upper=3.5|l_upper=4|l_upper=4.5|
|---|---|---|---|
|Uncertainty|36.3| 24.65| 14.61|

Figure 5:

|Policy|h=.5|h=.6|h=.7|h=.8|h=.9|h=1|
|---|---|---|---|---|---|---|
|Uncertainty|4.54|3.67|5.29|11.93|35.85|81.95|

<h4> Soft-Decision General Binary Search

```python binary_search.py game_mode```

<h5> Here are the expected results:

For 1d (Figure 2):

|Policy|l_upper=3|l_upper=3.5|l_upper=4|l_upper=4.5|l_upper=5|
|---|---|---|---|---|---|
|SGBS|1.76| 1.81| 1.07| 1.1| 1.42|

Figure 3:

|Policy|h=.5|h=.6|h=.7|h=.8|h=.9|h=1|
|---|---|---|---|---|---|---|
|SGBS, beta=.01|35.77|40.81|50.04|62.15|79.5|100.0|
|SGBS, beta=.03|34.73|38.04|46.15|58.96|75.77|100.0|
|SGBS, beta=.1|37.62|39.65|50.81|62.65|79.42|100.0|
|SGBS, beta=.2|32.38|38.35|45.23|57.58|75.54|100.0|
|SGBS, beta=.3|34.19|41.92|49.85|60.35|80.46|100.0|
|SGBS, beta=.4|29.35|37.77|45.08|53.85|71.65|100.0|

For q20 (Figure 4):

|Policy|l_upper=3.5|l_upper=4|l_upper=4.5|
|---|---|---|---|
|SGBS|0| 0| 0|

Figure 5:

|Policy|h=.5|h=.6|h=.7|h=.8|h=.9|h=1|
|---|---|---|---|---|---|---|
|SGBS, beta=.01|5.33|6.87|10.5|28.54|70.42|99.89|
|SGBS, beta=.03|5.65|7.27|11.31|28.68|70.68|99.89|
|SGBS, beta=.1|5.69|7.0|11.24|29.42|71.63|99.86|
|SGBS, beta=.2|5.16|6.8|10.91|27.79|69.14|99.84|
|SGBS, beta=.3|4.08|5.24|8.13|21.0|57.83|99.93|
|SGBS, beta=.4|2.41|2.97|4.45|12.19|34.79|84.7|

<h4> Static Oracles

```python static_allocs.py game_mode oracle```

where oracle is either rho (for the rho^* oracle, listed as rho_star below) or approx (for the tilde{rho} oracle, listed as rho_tilde below).

<h5> Here are the expected results:

For 1d (Figure 2):

|Policy|l_upper=3|l_upper=3.5|l_upper=4|l_upper=4.5|l_upper=5|
|---|---|---|---|---|---|
|rho_star|77.73|68|49.93|39.34|31.67|
|rho_tilde|71.65|57.97|43.94|33.78|26.20|

Figure 3:

|Policy|h=.5|h=.6|h=.7|h=.8|h=.9|h=1|
|---|---|---|---|---|---|---|
|rho_star|29.62|32.23|40.31|58.62|77.88|99.42|
|rho_tilde|28.08|32.31|38.46|56.08|75.0|97.58|

For q20 (Figure 4):

|Policy|l_upper=3.5|l_upper=4|l_upper=4.5|
|---|---|---|---|
|rho_star|23.2|19.48|14.19|
|rho_tilde|17.36|12.74|9.92|

Figure 5:

|Policy|h=.5|h=.6|h=.7|h=.8|h=.9|h=1|
|---|---|---|---|---|---|---|
|rho_star|2.24|2.02|2.95|10.08|33.25|79.02|
|rho_tilde|2.84|2.01|3.08|9.43|27.04|62.91|
