## Simulator for Homogeneous RMAB
### Setup
1. `N` agents, each with `2` actions
2. A given index policy $\pi$
3. `d` states
4. `alpha` activations
#### Utility Functions
1. `generate_Transition_Matrix` returns two `d x d` transition matrices.
2. `env.step` advances the environment to the next configuration.
3. `env.m_star` is computed by iteration.
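
To make the shape of the transition data concrete, here is a minimal sketch of a generator that returns two `d x d` row-stochastic matrices. The name `sketch_transition_matrices`, the `seed` argument, and the uniform random sampling are assumptions made for illustration; the actual `generate_Transition_Matrix` in this repo may sample differently.

```python
import random


def sketch_transition_matrices(d, seed=0):
    """Hypothetical stand-in: return two random d x d row-stochastic matrices."""
    rng = random.Random(seed)

    def random_stochastic_matrix():
        rows = []
        for _ in range(d):
            # Small offset keeps every weight strictly positive (assumption).
            weights = [rng.random() + 1e-12 for _ in range(d)]
            total = sum(weights)
            # Normalize so that each row sums to 1.
            rows.append([w / total for w in weights])
        return rows

    # One matrix per action (assumed convention: first passive, second active).
    return random_stochastic_matrix(), random_stochastic_matrix()


P0, P1 = sketch_transition_matrices(d=4)
```

Each row of `P0` and `P1` is a probability distribution over the `d` next states.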
### Agent
1. Learns with `sarsa`, using a resampling trick for implementation convenience.
2. Transition matrix of size `2d x 2d`
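
For reference, the textbook tabular SARSA update can be sketched as below. This is only an illustration of the standard on-policy TD rule on a `d x 2` value table; the function name `sarsa_update`, the learning rate `lr`, and the discount `gamma` are assumptions, and the resampling trick used in this repo is not reproduced here.

```python
def sarsa_update(Q, s, a, r, s_next, a_next, lr=0.1, gamma=0.9):
    """One on-policy TD update of Q[s][a] toward r + gamma * Q[s_next][a_next]."""
    td_target = r + gamma * Q[s_next][a_next]
    Q[s][a] += lr * (td_target - Q[s][a])
    return Q


d = 3
# Q-table with d states and 2 actions, initialized to zero.
Q = [[0.0, 0.0] for _ in range(d)]
Q[1][0] = 1.0

# Observed transition (s=0, a=1) -> reward 1.0 -> (s_next=1, a_next=0).
sarsa_update(Q, s=0, a=1, r=1.0, s_next=1, a_next=0)
```

The update uses the action `a_next` that was actually taken in `s_next`, which is what makes the rule on-policy and suitable for evaluating a fixed policy.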

### Objective

1. Test the $O(\sqrt N)$ scaling in policy evaluation.
2. Empirically estimate the Markov entanglement measure.
3. Test the $O(1/\sqrt N)$ decay of the relative decomposition error.
4. Track the decay of Markov entanglement and decomposition error for a given RMAB instance.
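
One common way to check a decay rate such as $O(1/\sqrt N)$ is to fit a line to the errors on a log-log scale and compare the slope with $-1/2$. The sketch below assumes this approach; the helper `loglog_slope` is hypothetical, and the `errors` list is synthetic placeholder data, not output from this simulator.

```python
import math


def loglog_slope(ns, errors):
    """Least-squares slope of log(error) against log(N)."""
    xs = [math.log(n) for n in ns]
    ys = [math.log(e) for e in errors]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den


ns = [100, 400, 1600, 6400]
# Synthetic placeholder errors that decay exactly as 1/sqrt(N).
errors = [2.0 / math.sqrt(n) for n in ns]

slope = loglog_slope(ns, errors)
print(f"fitted slope: {slope:.3f}")
```

With measured errors in place of the synthetic ones, a fitted slope close to $-0.5$ would be consistent with $O(1/\sqrt N)$ decay.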

### Usage Example

```
python run.py --Evaluation=True
```

