0. Prerequisites:

The code runs on **Python 3** and depends on **NumPy** and **matplotlib**.

If NumPy or matplotlib is missing, you can install both with:

`pip3 install numpy matplotlib`
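To confirm that both packages are importable before running the scripts, a short check with the standard-library `importlib.util.find_spec` is enough. This is a minimal sketch; the helper name `check_dependencies` is hypothetical and not part of this repository.

```python
import importlib.util


def check_dependencies(packages=("numpy", "matplotlib")):
    """Return a dict mapping each package name to True if Python can find it."""
    # find_spec returns None when a top-level package is not installed.
    return {name: importlib.util.find_spec(name) is not None for name in packages}


if __name__ == "__main__":
    for name, found in check_dependencies().items():
        print(f"{name}: {'found' if found else 'missing, run: pip3 install ' + name}")
```

Running it with `python3` prints one line per package; any package reported as missing can be installed with the `pip3` command above.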

1. Generate an environment:

For example, to generate an environment named `Your_Environment_Name` with horizon 3 (`-H`), 10 states (`-S`), 100 actions (`-A`) and 5-dimensional features (`-d`), use the following command:

`python ./environment_generate.py -H 3 -S 10 -A 100 -d 5 -name Your_Environment_Name`

2. Run the algorithms:

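`main.py` runs all three algorithms: 'StepMix', 'EpsMix' and 'LSVI-UCB'. For example, to collect 10 trials (`-M 10`) on environment `001` (`-env 001`) with $\alpha = 0.3$ (`-alpha 0.3`), each running for 10000 epochs (`-N2 10000`) on top of 30 offline trajectories (`-N1 30`) generated by the policy with parameter $k = 20$ (`-k 20`), use the following command. The `-H`, `-S`, `-A` and `-d` values repeat those from step 1, and `-lbeta` and `-ubeta` are both set to 1.0: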

`python ./main.py -H 3 -S 10 -A 100 -d 5 -env 001 -k 20 -alpha 0.3 -lbeta 1.0 -ubeta 1.0 -N1 30 -N2 10000 -M 10`
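Because `draw_rewards.py` in step 3 below takes exactly the same flags as `main.py`, it can help to keep the settings in one place and build both commands from it. The following is a minimal sketch using only the standard library (Python 3.8+ for `shlex.join`); `CONFIG`, `build_command` and `run_experiment` are hypothetical helpers, not part of this repository, and the flag names and values are copied from the example commands in this README.

```python
import shlex
import subprocess

# Hypothetical settings, copied from the example commands in this README.
CONFIG = {
    "H": 3, "S": 10, "A": 100, "d": 5,
    "env": "001", "k": 20, "alpha": 0.3,
    "lbeta": 1.0, "ubeta": 1.0,
    "N1": 30, "N2": 10000, "M": 10,
}


def build_command(script, config):
    """Build the argv list for a script, emitting one '-KEY VALUE' pair per setting."""
    argv = ["python", script]
    for key, value in config.items():  # dicts keep insertion order
        argv += [f"-{key}", str(value)]
    return argv


def run_experiment(config, dry_run=True):
    """Build (and optionally run) main.py and draw_rewards.py with identical flags.

    With dry_run=True the commands are only printed, not executed.
    """
    commands = [build_command("./main.py", config),
                build_command("./draw_rewards.py", config)]
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)  # stop at the first failing script
    return commands
```

Calling `run_experiment(CONFIG, dry_run=False)` from the repository root would run both scripts in order; changing a value in `CONFIG` once keeps the experiment and its plots consistent.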

3. Draw figures:

To visualize the results, use `draw_rewards.py`. The following command, which takes the same arguments as the `main.py` command in step 2, plots 'total reward vs. epoch' and 'regret vs. epoch', averaged over the 10 trials from step 2:

`python ./draw_rewards.py -H 3 -S 10 -A 100 -d 5 -env 001 -k 20 -alpha 0.3 -lbeta 1.0 -ubeta 1.0 -N1 30 -N2 10000 -M 10`
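The averaging behind these figures can be illustrated with NumPy. This sketch assumes that one algorithm's per-epoch total rewards are held in an array of shape `(M, N2)` (trials by epochs) and that regret is the gap between a known optimal value and the obtained reward, accumulated over epochs. Both the layout and this regret definition are assumptions for illustration; the actual output format of `main.py` and the exact quantity plotted by `draw_rewards.py` may differ, and `average_curves` is a hypothetical helper.

```python
import numpy as np


def average_curves(rewards, optimal_value):
    """Average per-epoch total rewards over trials and compute cumulative regret.

    rewards: array of shape (M, N2), one row per trial, one column per epoch (assumed layout).
    optimal_value: total reward of the optimal policy (assumed known).
    Returns (mean_reward, mean_cumulative_regret), each of shape (N2,).
    """
    rewards = np.asarray(rewards, dtype=float)
    # Per-epoch regret in each trial, accumulated along the epoch axis.
    cumulative_regret = np.cumsum(optimal_value - rewards, axis=1)
    # Average each curve across trials (axis 0).
    return rewards.mean(axis=0), cumulative_regret.mean(axis=0)
```

For instance, two trials with rewards `[1, 2, 3]` and `[3, 2, 1]` and an optimal value of 3 give a mean reward curve of `[2, 2, 2]` and a mean cumulative regret of `[1, 2, 3]`. Each returned array can then be passed to `matplotlib.pyplot.plot` against `np.arange(1, N2 + 1)`.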