experiment_1.py runs our experiment on the CARMAB algorithm. Its flags
are:

- window: This is the parameter Delta in the paper that controls the size of the
  congestion window.
- k: The number of arms.
- horizon: This is the parameter T from the paper.
- delta: Confidence parameter from the paper.
- max_episodes: A parameter to control the maximum number of episodes. A value
  of 1000 is typically sufficient.
- print_action_error: When true, it prints out the error between the true base
  rewards and the learned rewards from the arms. Can be used to verify that
  learning was achieved properly.
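
The flag set above can be summarized as a small parser. This is only a sketch:
it uses argparse from the standard library as a stand-in, since the flag library
used by experiment_1.py is not shown here. The name build_parser, the flag
types, and every default except the max_episodes value of 1000 are assumptions
made for illustration.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags documented for experiment_1.py."""
    parser = argparse.ArgumentParser(description="CARMAB experiment (sketch)")
    # Types and defaults are illustrative assumptions, not the script's own.
    parser.add_argument("--window", type=int, default=5,
                        help="Delta in the paper: size of the congestion window")
    parser.add_argument("--k", type=int, default=4,
                        help="number of arms")
    parser.add_argument("--horizon", type=int, default=10000,
                        help="T in the paper")
    parser.add_argument("--delta", type=float, default=0.05,
                        help="confidence parameter")
    # 1000 follows the note above that this value is typically sufficient.
    parser.add_argument("--max_episodes", type=int, default=1000,
                        help="maximum number of episodes")
    parser.add_argument("--print_action_error", action="store_true",
                        help="print the error between true and learned rewards")
    return parser


# Parse an example command line; unspecified flags fall back to the defaults.
args = build_parser().parse_args(["--window", "3", "--k", "5",
                                  "--print_action_error"])
print(args.window, args.k, args.max_episodes, args.print_action_error)
```

Flags omitted on the command line take their defaults, so only the values that
differ from a standard run need to be passed.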

experiment_2.py runs our experiment on the CARCB algorithm. Its flags
are:

- window: This is the parameter Delta in the paper that controls the size of the
  congestion window.
- k: The number of arms.
- horizon: This is the parameter T from the paper.
- print_action_error: When true, it prints out the error between the true base
  rewards and the learned rewards from the arms. Can be used to verify that
  learning was achieved properly.
- print_theta_error: When true, it prints out the error between the learned
  theta and the correct parameter theta*. Can be used to verify that learning
  was achieved properly.
- compute_opt: When true, it prints the per-round rewards of the optimal policy
  in place of the algorithm's rewards.
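
As a rough illustration of the two diagnostic flags, the sketch below computes
one plausible version of each error. The exact metrics the scripts print are not
specified here, so the function names action_error and theta_error, the choice
of maximum absolute gap and Euclidean distance, and the sample values are all
assumptions.

```python
import math


def action_error(true_rewards, learned_rewards):
    """Largest absolute gap between true and learned base rewards (assumed metric)."""
    return max(abs(t - l) for t, l in zip(true_rewards, learned_rewards))


def theta_error(theta_star, theta_hat):
    """Euclidean distance between theta* and the learned theta (assumed metric)."""
    return math.sqrt(sum((s - h) ** 2 for s, h in zip(theta_star, theta_hat)))


# Made-up values, purely for illustration.
print(action_error([0.9, 0.5, 0.2], [0.85, 0.55, 0.2]))
print(theta_error([1.0, 0.0], [0.6, 0.3]))
```

Under either metric, a value that shrinks toward zero as the horizon grows
suggests that learning was achieved properly.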
