# Approximate information maximization for Bernoulli Bandits

## General information

This repository contains the code associated with the paper titled 'Approximate Information Maximization for bandit game'.
It provides an implementation of the Approximate Information Maximization (AIM) algorithm for Bernoulli bandit games, specifically for more than two arms.
It also provides an implementation of Thompson Sampling (TS) with a uniform prior to allow direct comparisons with AIM.
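
For readers unfamiliar with the baseline, here is a minimal sketch of Bernoulli Thompson Sampling with a uniform Beta(1, 1) prior. It is an illustration only, not the repository's implementation: the function name `thompson_sampling`, its arguments, and the use of cumulative expected regret as the reported quantity are assumptions made for this example.

```python
import numpy as np


def thompson_sampling(true_means, horizon, seed=0):
    """Illustrative Bernoulli Thompson Sampling with a uniform Beta(1, 1) prior.

    Hypothetical helper, not taken from the repository.
    Returns the sequence of pulled arms and the cumulative expected regret.
    """
    rng = np.random.default_rng(seed)
    true_means = np.asarray(true_means, dtype=float)
    n_arms = true_means.size
    successes = np.zeros(n_arms)
    failures = np.zeros(n_arms)
    pulls = np.zeros(horizon, dtype=int)

    for t in range(horizon):
        # Draw one sample per arm from its Beta posterior;
        # Beta(1, 1) is the uniform prior before any observation.
        samples = rng.beta(successes + 1.0, failures + 1.0)
        arm = int(np.argmax(samples))

        # Simulate a Bernoulli reward for the chosen arm.
        reward = int(rng.random() < true_means[arm])
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[t] = arm

    # Cumulative expected regret: gap between the best mean and the pulled arm's mean.
    regret = np.cumsum(true_means.max() - true_means[pulls])
    return pulls, regret


if __name__ == "__main__":
    pulls, regret = thompson_sampling([0.2, 0.5, 0.8], horizon=1000)
    print("final cumulative expected regret:", regret[-1])
```

The same loop structure (choose an arm, observe a reward, update that arm's statistics) applies to AIM; only the arm-selection rule differs.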

## Requirements

The code was developed for Python 3.9.12 and requires the following package:
- numpy

Additionally, we suggest using the following if you want to conduct additional experiments:
- matplotlib (if plotting is needed)
- multiprocessing (part of the Python standard library; used for parallelization to speed up the experiments)

## Installation

To install and use the code, unzip the folder 'AIM bernoulli bandits'. 


## Usage

To run the experiments, one can use the launcher 'single_launcher.py' or 'parallel_launcher.py' in the folder 'launcher'.
To display the results, one can use the script 'compare_regret.py' in the folder 'display'.
The possible methodname values used to run the algorithms are 'aim' and 'thompson'.

## Code structure

The code is organized as follows:

<pre>
Data/
Figures/
src/
├── bandits_main_classes/
│   ├── algo_aim.py
│   ├── algo_bandit_general.py
│   ├── arms.py
│   ├── bandit_game.py
│   ├── initiate_bandit.py
│   ├── method_bandit.py
│   └── regret.py
├── display/
│   └── compare_regret.py
├── entropy/
│   └── analytic_entropy.py
├── launcher/
│   ├── parallel_launcher.py
│   └── single_launcher.py
└── tools/
    ├── initiate_algo.py
    ├── tools_entropy_object.py
    ├── tools_to_save.py
    └── tools.py
</pre>

Details:

- Data: should contain the data generated by the code (if any).
- Figures: should contain the figures generated by the code (if any).
- src: contains the Python code.
    - bandits_main_classes: contains the main classes for the bandit game.
        - algo_aim.py: contains the AIM algorithm and the increment entropy class.
        - algo_bandit_general.py: contains the general class for the bandit game and Thompson Sampling.
        - arms.py: contains the arm class.
        - bandit_game.py: contains the class for the full bandit game, which includes the bandit algorithm, the arms and the regret.
        - initiate_bandit.py: contains the tools to initiate the bandit algorithm and its hyperparameters (if any).
        - method_bandit.py: contains the method to initiate the hyperparameters needed for each bandit algorithm (not used for AIM and Thompson Sampling yet).
        - regret.py: contains the regret class.
    - display: used to compare regrets loaded from saved JSON files.
        - compare_regret.py: compares the regret (obtained after running experiments) of different bandit algorithms (requires the matplotlib package).
    - entropy: contains the entropy evaluation functions used for AIM.
        - analytic_entropy.py: contains the analytic functions needed to compute the entropy approximations used for the increment evaluation.
    - launcher: contains the launchers to run the experiments.
        - single_launcher.py: runs the experiments without parallelization and plots a single run.
        - parallel_launcher.py: runs the experiments with parallelization and saves the average regret in a JSON file (requires Pool from multiprocessing).
    - tools: contains the tools used for the experiments.
        - initiate_algo.py: contains the tools to initiate the bandit algorithm and its hyperparameters (if any) using argparse.
        - tools_entropy_object.py: contains the tools to compute the entropy increment along the different arms (the better empirical arm and the worse empirical arms).
        - tools_to_save.py: contains the tools to save the results of the experiments in a JSON file.
        - tools.py: contains some additional tools for plots.


## Pseudo code

Here we recall the pseudo code of the AIM algorithm for Bernoulli bandit games with more than two arms.
For K > 2 Bernoulli arms:

1. Draw each arm once; for each arm t in {1,...,K}, observe the reward r(t) and initialize its statistics:
    theta(t) <- (r(t) + 1) / 3
    N(t) <- 4

2. For t = K+1 to T:

    a. Select the better empirical arm: arm_max <- argmax_k mean(k)

    b. Evaluate Δ_arm_max S_app following the increment evaluation along the better empirical arm

    c. Evaluate arm_min <- argmax(Δ_i |S_app|, i ≠ arm_max), with Δ_i |S_app| following the increment evaluation along the worse empirical arms

    d. Compare arms and choose the arm to pull
    1. If Δ_arm_max S_app > Δ_arm_min |S_app|, then pull arm_max and observe r(arm_max)
    2. Else, pull arm_min and observe r(arm_min)

    e. Update statistics of pulled arm k
    1. theta(k) <- (theta(k) * (N(k) - 1) + r) / N(k)
    2. N(k) <- N(k) + 1
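
The statistics in steps 1 and 2e can be checked numerically. The sketch below is illustrative only: the helper names `init_stats` and `update_stats` are hypothetical and do not come from the repository. Under the update rule as written above, theta(k) appears to coincide with the posterior mean of a uniform Beta(1, 1) prior, (successes + 1) / (pulls + 2), and N(k) with pulls + 3, which would explain the initial value N = 4 after a single draw.

```python
def init_stats(first_reward):
    """Step 1: statistics of an arm after its first draw (hypothetical helper)."""
    theta = (first_reward + 1) / 3
    n = 4
    return theta, n


def update_stats(theta, n, reward):
    """Step 2e: fold one more reward into the running estimate (hypothetical helper)."""
    theta = (theta * (n - 1) + reward) / n
    return theta, n + 1


# Example reward sequence observed on a single arm.
rewards = [1, 0, 1, 1, 0]

theta, n = init_stats(rewards[0])
for pulls, r in enumerate(rewards[1:], start=2):
    theta, n = update_stats(theta, n, r)
    # Posterior mean of a Beta(1, 1) prior after `pulls` Bernoulli observations.
    posterior_mean = (sum(rewards[:pulls]) + 1) / (pulls + 2)
    assert abs(theta - posterior_mean) < 1e-12
    assert n == pulls + 3

print("theta:", theta, "N:", n)
```

The recursion only needs the current theta(k) and N(k), so no per-arm reward history has to be stored.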
