# Approximate information maximization for Bernoulli Bandits

## General information

This repository contains the code associated with the paper titled 'Approximate Information Maximization for bandit game'.
It provides a version of the Approximate Information Maximization (AIM) algorithm for Gaussian bandit games, specifically for games with more than two arms.
It also provides implementations of Thompson Sampling (TS), Thompson Sampling Plus, MED, KL-UCB, and KL-UCB++ with uniform priors, to allow direct comparisons with AIM.
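
As a point of reference for these comparisons, here is a minimal sketch of Thompson Sampling with a uniform prior. It assumes Bernoulli rewards, so the uniform prior is Beta(1, 1) and each arm's posterior stays a Beta distribution. The function 'thompson_sampling_bernoulli' and its arguments are hypothetical; this is not the repository's implementation, which lives in 'algo_bandit_general.py'.

```python
import numpy as np


def thompson_sampling_bernoulli(means, horizon, rng):
    """Thompson Sampling with uniform Beta(1, 1) priors on Bernoulli arms.

    Hypothetical helper for illustration; returns the index of the arm pulled at each step.
    """
    n_arms = len(means)
    successes = np.zeros(n_arms)  # number of rewards equal to 1, per arm
    failures = np.zeros(n_arms)   # number of rewards equal to 0, per arm
    pulls = np.empty(horizon, dtype=int)
    for t in range(horizon):
        # Posterior of each arm is Beta(1 + successes, 1 + failures); draw one sample per arm.
        samples = rng.beta(successes + 1.0, failures + 1.0)
        arm = int(np.argmax(samples))
        reward = float(rng.random() < means[arm])  # Bernoulli(means[arm]) reward
        successes[arm] += reward
        failures[arm] += 1.0 - reward
        pulls[t] = arm
    return pulls


rng = np.random.default_rng(0)
pulls = thompson_sampling_bernoulli([0.2, 0.5, 0.8], horizon=2000, rng=rng)
print(np.bincount(pulls, minlength=3))  # pull counts per arm
```

Sampling from the posterior, rather than using its mean, is what makes the algorithm explore: an arm with few observations has a wide posterior and occasionally produces the largest sample.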

## Requirements

The code has been developed for Python 3.9.12 and requires the following packages:
- numpy

Additionally, we suggest using the following packages to conduct additional experiments:
- matplotlib (if plotting is needed)
- multiprocessing (part of the Python standard library; used for parallelization to speed up the experiments)

## Installation

To install and use the code, unzip the folder 'AIM gaussian bandits'. 


## Usage

To run the experiments, one can use the launcher 'single_launcher.py' or 'parallel_launcher.py' in the folder 'launcher'.
To display the results, one can use the script 'compare_regret.py' in the folder 'display'.
The method names accepted to run the algorithms are 'aim', 'thompson', 'med', 'thompson_plus', 'klucb_plus_plus', and 'klucb'.
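
The algorithm is selected through argparse in 'tools/initiate_algo.py'. The sketch below only illustrates how the method names listed above can be validated with argparse 'choices'; the '--method' flag and the 'parse_method' function are hypothetical, so check 'initiate_algo.py' for the actual argument names.

```python
import argparse

# Method names listed in this README.
METHOD_NAMES = ("aim", "thompson", "med", "thompson_plus", "klucb_plus_plus", "klucb")


def parse_method(argv=None):
    """Parse a hypothetical '--method' flag; argparse rejects names not in METHOD_NAMES."""
    parser = argparse.ArgumentParser(description="Select a bandit algorithm (illustrative sketch).")
    parser.add_argument("--method", choices=METHOD_NAMES, default="aim",
                        help="name of the bandit algorithm to run")
    return parser.parse_args(argv).method


print(parse_method(["--method", "klucb_plus_plus"]))  # prints: klucb_plus_plus
```

With 'choices', a misspelled name such as 'kl_ucb' makes argparse exit with an error message listing the valid names instead of failing later inside the launcher.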

## Code structure

The code is organized as follows:

<pre>
Data/
Figures/
src/
├── bandits_main_classes/
│   ├── algo_aim.py
│   ├── algo_bandit_general.py
│   ├── arms.py
│   ├── bandit_game.py
│   ├── initiate_bandit.py
│   ├── method_bandit.py
│   └── regret.py
├── display/
│   └── compare_regret.py
├── launcher/
│   ├── parallel_launcher.py
│   └── single_launcher.py
└── tools/
    ├── initiate_algo.py
    ├── tools_to_save.py
    └── tools.py
</pre>

Details:

- Data: should contain the data generated by the code (if any).
- Figures: should contain the figures generated by the code (if any).
- src: contains the Python code.
    - bandits_main_classes: contains the main classes for the bandit game.
        - algo_aim.py: contains the AIM algorithm and the increment entropy class.
        - algo_bandit_general.py: contains the general class for the bandit game and Thompson Sampling.
        - arms.py: contains the arm class.
        - bandit_game.py: contains the class for the full bandit game, which includes the bandit algorithm, the arms and the regret.
        - initiate_bandit.py: contains the tools to initialize the bandit algorithm and its hyperparameters (if necessary).
        - method_bandit.py: contains the methods that initialize the hyperparameters needed by each bandit algorithm (not yet used for AIM and Thompson Sampling).
        - regret.py: contains the regret class.
    - display: used to compare regrets by loading the regrets saved in JSON files.
        - compare_regret.py: compares the regret (obtained after running the experiments) of different bandit algorithms (requires matplotlib).
    - launcher: contains the launchers used to run the experiments.
        - single_launcher.py: runs the experiments without parallelization and plots a single run.
        - parallel_launcher.py: runs the experiments in parallel and saves the average regret in a JSON file (uses multiprocessing.Pool).
    - tools: contains the tools used for the experiments.
        - initiate_algo.py: contains the tools to initialize the bandit algorithm and its hyperparameters (if necessary) using argparse.
        - tools_to_save.py: contains the tools to save the results of the experiments in a JSON file.
        - tools.py: contains some additional tools for plots.
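
The parallel launcher averages the regret over independent runs with multiprocessing and saves it to a JSON file. The sketch below illustrates that pattern under stated assumptions: the regret is taken to be the cumulative pseudo-regret (sum of the gaps between the best mean and the mean of the pulled arm), and 'run_one' simulates a uniformly random policy on hypothetical Bernoulli arm means as a stand-in for an actual bandit algorithm. None of these names ('run_one', 'average_regret', 'save_regret', 'ARM_MEANS') come from the repository.

```python
import json
from multiprocessing import Pool

import numpy as np

# Hypothetical toy problem used only for illustration.
ARM_MEANS = np.array([0.2, 0.5, 0.8])
HORIZON = 1000


def run_one(seed):
    """Cumulative pseudo-regret of a uniformly random policy (placeholder for a real bandit run)."""
    rng = np.random.default_rng(seed)
    arms = rng.integers(len(ARM_MEANS), size=HORIZON)
    gaps = ARM_MEANS.max() - ARM_MEANS[arms]
    return np.cumsum(gaps)


def average_regret(seeds, processes=None):
    """Average the cumulative regret over independent runs, in parallel unless processes == 1."""
    if processes == 1:
        runs = list(map(run_one, seeds))
    else:
        # Pool(None) starts one worker per CPU core.
        with Pool(processes) as pool:
            runs = pool.map(run_one, seeds)
    return np.mean(runs, axis=0)


def save_regret(path, regret):
    """Store the average regret as a JSON list."""
    with open(path, "w") as f:
        json.dump({"average_regret": regret.tolist()}, f)


if __name__ == "__main__":
    regret = average_regret(range(8))  # 8 independent runs with distinct seeds
    print(regret[-1])
```

Seeding each run separately keeps the runs independent and reproducible whichever worker process executes them; the '__main__' guard is required by multiprocessing on platforms that spawn new interpreters.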


## Pseudo code

Here we recall the pseudocode of the AIM algorithm for Gaussian bandit games with more than two arms.
For K > 2 Bernoulli arms:

1. Draw each arm once, observe its reward r_k, and initialize the statistics of each arm k in {1,...,K}:
    - theta(k) <- r_k
    - N(k) <- 1

2. For t = K+1 to T:

    a. Identify the best empirical arm: arm_max <- argmax_k theta(k).

    b. Evaluate m = argmax_{k ≠ arm_max} Δ_{arm_max,k} S_app, i.e. compute the increment of the approximate entropy S_app for each empirically worse arm k relative to the best empirical arm, and keep the arm with the largest increment.

    c. Compare the arms and choose the arm to pull:
    1. If Δ_{arm_max,m} S_app < 0, pull arm_max and observe its reward r.
    2. Otherwise, pull arm m and observe its reward r.

    d. Update the statistics of the pulled arm k:
    1. theta(k) <- (theta(k) * N(k) + r) / (N(k) + 1)
    2. N(k) <- N(k) + 1
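
The loop above can be sketched in Python as follows, assuming Bernoulli rewards. The increment Δ_{arm_max,k} S_app is not defined in this README, so the sketch takes it as a user-supplied function 'delta_s_app(theta, n, best, k)' (a hypothetical signature). The usage example plugs in 'ucb_stand_in', a UCB-style placeholder that is NOT the AIM entropy increment and only exercises the control flow; the actual increment is implemented in 'algo_aim.py'.

```python
import numpy as np


def aim_skeleton(means, horizon, delta_s_app, rng):
    """Control flow of the AIM pseudocode; delta_s_app supplies Δ_{best,k} S_app."""
    n_arms = len(means)
    theta = np.zeros(n_arms)  # empirical mean reward of each arm
    n = np.zeros(n_arms)      # number of pulls of each arm

    def pull(arm):
        reward = float(rng.random() < means[arm])  # Bernoulli reward (assumption)
        # Step d: incremental update of the empirical mean, then of the counter.
        theta[arm] = (theta[arm] * n[arm] + reward) / (n[arm] + 1)
        n[arm] += 1

    # Step 1: draw each arm once (theta(k) <- r_k, N(k) <- 1).
    for arm in range(n_arms):
        pull(arm)

    # Step 2: t = K+1, ..., T.
    for _ in range(n_arms, horizon):
        best = int(np.argmax(theta))                                    # step a
        others = [k for k in range(n_arms) if k != best]
        increments = [delta_s_app(theta, n, best, k) for k in others]  # step b
        m = others[int(np.argmax(increments))]
        pull(best if max(increments) < 0 else m)                        # step c
    return theta, n


def ucb_stand_in(theta, n, best, k):
    """NOT the AIM entropy increment: UCB(k) - UCB(best), used only for illustration."""
    bonus = np.sqrt(2.0 * np.log(n.sum()) / n)
    return (theta[k] + bonus[k]) - (theta[best] + bonus[best])


theta, n = aim_skeleton([0.2, 0.5, 0.8], horizon=3000, delta_s_app=ucb_stand_in,
                        rng=np.random.default_rng(0))
print(n)  # number of pulls per arm
```

Because theta(k) is updated incrementally, the empirical mean never requires storing past rewards; with N(k) = 0 the update reduces to theta(k) <- r, which matches the initialization in step 1.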
