0. prerequisites:

**Python 3** is used to run this code. The required packages are **NumPy** and **matplotlib**.

If you don't have NumPy or matplotlib, you can install them with:

`pip3 install numpy matplotlib`
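To check whether both packages are already available before installing, a minimal sketch like the one below may help. The helper name `missing_packages` is hypothetical and not part of this repository; it relies only on the standard-library `importlib.util.find_spec`.

```python
import importlib.util


def missing_packages(names):
    """Return the subset of `names` that cannot be located for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # The two packages named in the prerequisites above.
    missing = missing_packages(["numpy", "matplotlib"])
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("numpy and matplotlib are available.")
```

If anything is reported as missing, the `pip3` command above should install it.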

1. generate environment:

For example, to generate an environment with a horizon of 3, 10 states, 100 actions, and 5-dimensional features, you can use the following command, where `Your_Environment_Name` is the name you choose for the environment:

`python ./environment_generate.py -H 3 -S 10 -A 100 -d 5 -name Your_Environment_Name`

2. run algorithms:

`main.py` tests the 'StepMix', 'StepNoMix', and 'EpsMix' algorithms. For example, to run 10 trials in the '001' environment with $\alpha$ set to 0.3 and $k$ set to 20, for 10000 epochs each, you can use the following command:

`python ./main.py -H 3 -S 10 -A 100 -d 5 -env 001 -k 20 -alpha 0.3 -beta 1.0 -N 10000 -M 10`
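If you want to repeat this run for several values of $\alpha$, the command can be assembled programmatically. The sketch below is only an illustration: the helper names `main_command` and `sweep_alpha` are hypothetical, the list of $\alpha$ values is made up, and it assumes `main.py` accepts other values for `-alpha` in the same way as the example above. By default it only prints the commands (a dry run) and does not execute them.

```python
import subprocess


def main_command(alpha, k, env="001", H=3, S=10, A=100, d=5, beta=1.0, N=10000, M=10):
    """Build the argument list for one `main.py` run, using the flags shown above."""
    return [
        "python", "./main.py",
        "-H", str(H), "-S", str(S), "-A", str(A), "-d", str(d),
        "-env", env, "-k", str(k),
        "-alpha", str(alpha), "-beta", str(beta),
        "-N", str(N), "-M", str(M),
    ]


def sweep_alpha(alphas, k=20, dry_run=True):
    """Print (and optionally execute) one `main.py` command per alpha value."""
    commands = [main_command(alpha, k) for alpha in alphas]
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError if the run fails.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Illustrative values only; they are not taken from this repository.
    sweep_alpha([0.1, 0.3, 0.5])
```

Passing `dry_run=False` would launch the runs one after another from the repository root.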

3. run LSVI-UCB:

`LSVI_UCB_main.py` tests the [LSVI-UCB method](http://proceedings.mlr.press/v125/jin20a.html). To test it in the '001' environment with 10000 epochs for 10 trials, you can use the following command:

`python ./LSVI_UCB_main.py -H 3 -S 10 -A 100 -d 5 -env 001 -N 10000 -M 10`

4. draw figures:

To visualize the results, use `draw.py`. The following command plots 'total reward vs. epoch' and 'regret vs. epoch', averaged over the 10 trials from steps 2 and 3:

`python ./draw.py -H 3 -S 10 -A 100 -d 5 -env 001 -k 20 -alpha 0.3 -beta 1.0 -N 10000 -M 10`
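As a rough illustration of what "regret vs. epoch, averaged over trials" means, the sketch below computes a cumulative regret curve per trial and then averages the curves pointwise. It is not the code in `draw.py`: it assumes regret is the running sum of the gap between a known optimal value and the reward obtained in each epoch, and the reward numbers and function names are made up for the example.

```python
from itertools import accumulate


def cumulative_regret(rewards, optimal_value):
    """Cumulative regret for one trial: running sum of (optimal_value - reward)."""
    return list(accumulate(optimal_value - r for r in rewards))


def average_over_trials(curves):
    """Pointwise mean of equally long curves, one curve per trial."""
    n_trials = len(curves)
    return [sum(values) / n_trials for values in zip(*curves)]


# Made-up rewards for 2 trials of 4 epochs each (assumption, for illustration).
trials = [[0.0, 1.0, 1.0, 2.0], [1.0, 1.0, 2.0, 2.0]]
optimal = 2.0  # assumed optimal total reward per epoch

regret_curves = [cumulative_regret(t, optimal) for t in trials]
mean_regret = average_over_trials(regret_curves)
print(mean_regret)  # [1.5, 2.5, 3.0, 3.0]
```

A list such as `mean_regret` could then be passed to matplotlib's `plt.plot` to obtain a curve against the epoch index.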