# Partial Information as Full: Reward Imputation with Sketching in Bandits

This repository is the official implementation of Partial Information as Full: Batched Bandits using Efficient Reward Imputation. 

## Requirements

To install requirements:

```setup
pip install -r requirements.txt
```


## Training

To train any model from the paper except DFM-S, run this command, replacing `Algorithm_name` with the name of the algorithm:

```train
python algo_main.py Algorithm_name
```
To train DFM-S, run this command:

```train
python algo_main2.py
```
We recommend tuning hyper-parameters with [NNI](https://github.com/microsoft/nni), which is what we used in our experiments.


## Evaluation

The code calculates the average reward in each episode, so the reward data can be exported with NNI once a run has finished.
To export the reward data, run:

```eval
nnictl experiment export [experiment_id] --filename [file_path] --type json --intermediate
```
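The Results section reports each metric as mean ± std. As a rough illustration of that summary step, the sketch below computes it from a list of per-run values. It is not taken from this repository: the function names `summarize` and `format_mean_std` are hypothetical, the sample values are made up, and the use of the sample standard deviation is an assumption. Extracting the per-run values from the exported JSON is left out, since the file layout depends on the NNI version.

```python
import statistics


def summarize(values):
    """Return (mean, sample standard deviation) of per-run metric values."""
    values = [float(v) for v in values]
    if len(values) < 2:
        raise ValueError("need at least two runs to compute a standard deviation")
    # Assumption: sample (n - 1) standard deviation; use statistics.pstdev
    # instead if the population standard deviation is wanted.
    return statistics.mean(values), statistics.stdev(values)


def format_mean_std(values, digits=4):
    """Format values as 'mean ± std' with a fixed number of decimals."""
    mean, std = summarize(values)
    return f"{mean:.{digits}f} ± {std:.{digits}f}"


# Hypothetical per-run average rewards, for illustration only.
runs = [0.5, 0.7]
print(format_mean_std(runs))
```

Swapping `statistics.stdev` for `statistics.pstdev` gives the population standard deviation instead.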

## Results

Our model achieves the following performance on a commercial dataset:


| Algorithm | CVR (mean ± std) | CTCVR (mean ± std) | Time (sec., mean ± std) |
| :-------- | :--------------- | :----------------- | ----------------------: |
| DFM-S     | 0.8656 ± 0.0473  | 0.3317 ± 0.0218    | 302.3140 ± 8.3045 |
| SBUCB     | 0.8569 ± 0.0037  | 0.4277 ± 0.0084    | 43.5435 ± 0.3659 |
| BEXP3     | 0.4846 ± 0.0205  | 0.2425 ± 0.0116    | 53.5001 ± 0.9220 |
| BEXP3-IPW | 0.4862 ± 0.0187  | 0.2436 ± 0.0113    | 56.0101 ± 1.4142 |
| BLTS-B    | 0.8663 ± 0.0178  | 0.4285 ± 0.0157    | 218.2109 ± 1.8198 |
| PUIR      | 0.8807 ± 0.0053  | 0.4411 ± 0.0029    | 184.3575 ± 2.2346 |
| SPUIR     | 0.8770 ± 0.0059  | 0.4397 ± 0.0032    | 81.5753 ± 1.5879 |
| PUIR-RS   | 0.8763 ± 0.0056  | 0.4389 ± 0.0030    | 180.4999 ± 1.7763 |
| SPUIR-RS  | 0.8758 ± 0.0058  | 0.4391 ± 0.0031    | 80.8003 ± 2.9030 |



