# Online Learning in the Repeated Mediated Newsvendor Problem

This repository contains the official implementation of the experiments from the paper *Online Learning in the Repeated Mediated Newsvendor Problem*. 

Our goal is twofold: to demonstrate the practical effectiveness of the algorithm proposed in our paper (Algorithm 1) across several representative environments of the repeated mediated newsvendor problem, and to illustrate the necessity of the assumptions stated in our paper (Assumptions 1.1-1.4). The repository is organized as follows:
- In `Traders.py`, we implement the supplier and retailer environments described in Section 6 of our paper. We include:
    - Eight combinations of suitable supplier cost distributions and retailer utility functions that satisfy Assumptions 1.1-1.4. 
    - Four additional supplier and retailer environments derived from the proof of Theorem 5.1 (see Appendix E), where each environment violates exactly one of the four assumptions. 
- In `Experiments.py`, we implement Algorithm 1 along with its regret calculation. We then run simulations across all environments defined in `Traders.py` and generate plots with the resulting regret curves.
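
As a rough illustration of what a supplier cost environment involves, the sketch below draws costs from one of the distributions listed in the Results section, `0.75 Beta(2,5) + 0.25 Beta(5,2)`. It assumes this notation denotes a mixture (a Beta(2,5) draw with probability 0.75, otherwise a Beta(5,2) draw). The function name `sample_mixture_cost` is hypothetical and is not taken from `Traders.py`, whose actual interface may differ.

```python
import random


def sample_mixture_cost(rng: random.Random) -> float:
    """Draw one supplier cost from the mixture 0.75 Beta(2,5) + 0.25 Beta(5,2).

    Assumption: the notation denotes a mixture distribution, not a
    weighted sum of two random variables.
    """
    if rng.random() < 0.75:
        return rng.betavariate(2, 5)
    return rng.betavariate(5, 2)


rng = random.Random(0)  # fixed seed so the draws are reproducible
costs = [sample_mixture_cost(rng) for _ in range(20_000)]

# Mean of Beta(a, b) is a / (a + b), so the mixture mean is a weighted average.
expected_mean = 0.75 * (2 / 7) + 0.25 * (5 / 7)
empirical_mean = sum(costs) / len(costs)
print(f"empirical mean = {empirical_mean:.4f}, expected mean = {expected_mean:.4f}")
```

Both Beta components are supported on $[0, 1]$, so every sampled cost lies in the unit interval.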

## Requirements

To install requirements, run the following command:

```setup
pip install -r requirements.txt
```

## Simulation

Our experiments involve no model training or testing. Instead, they empirically evaluate the regret of Algorithm 1 under various conditions.

To run the simulations as configured in the paper, run this command:

```simulate
python Experiments.py
```

This executes Algorithm 1 with:
- Time horizon: $T=7\cdot 10^5$
- Number of trials: $30$
- Discretization parameter: $K = \left\lceil{T^{1/3}}\right\rceil$
- Confidence parameter: $\delta = 1/T$
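
For reference, the default parameter values above can be computed as in the following sketch. The helper names (`ceil_cube_root`, `default_parameters`) are illustrative and are not taken from `Experiments.py`. The cube root is rounded up with an integer check, since floating-point cube roots can be slightly off at perfect cubes.

```python
def ceil_cube_root(n: int) -> int:
    """Smallest integer k with k**3 >= n, robust to floating-point error."""
    k = round(n ** (1 / 3))
    while k ** 3 < n:          # float estimate was too small
        k += 1
    while (k - 1) ** 3 >= n:   # float estimate was too large
        k -= 1
    return k


def default_parameters(T: int) -> tuple[int, float]:
    """Return (K, delta) with K = ceil(T^(1/3)) and delta = 1/T."""
    return ceil_cube_root(T), 1.0 / T


T = 700_000
K, delta = default_parameters(T)
print(f"T = {T}, K = {K}, delta = {delta:.3e}")
```

With the default horizon $T = 7 \cdot 10^5$ this gives $K = 89$, since $88^3 = 681472 < T \le 89^3 = 704969$.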

To use a custom time horizon and number of trials, run the following command:

```simulate
python Experiments.py --T <TIME_HORIZON> --n_trials <NUM_TRIALS>
```

**Note**: For large time horizons (such as the default), each plot may take several minutes to generate. Each plot is saved as a PNG file.
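
The `--T` and `--n_trials` flags above could be handled with Python's standard `argparse` module, as in the sketch below. This is a hypothetical illustration and not necessarily how `Experiments.py` parses its arguments. The defaults are the values listed in this section, and the `build_parser` name and help strings are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags and defaults."""
    parser = argparse.ArgumentParser(
        description="Run Algorithm 1 simulations (illustrative sketch)."
    )
    parser.add_argument("--T", type=int, default=700_000,
                        help="time horizon")
    parser.add_argument("--n_trials", type=int, default=30,
                        help="number of independent trials")
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(["--T", "10000", "--n_trials", "5"])
print(args.T, args.n_trials)

# With no flags, the documented defaults apply.
defaults = build_parser().parse_args([])
print(defaults.T, defaults.n_trials)
```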

## Results

We achieve the following results:

### Case 1: Environments where assumptions are satisfied

| Retailer Utility Function | Supplier Cost Distribution  | Behaviour of Regret |
|---------------------------|---------------- | -------------- |
| Capped-Linear             |     Standard Uniform        |     $\widetilde{O} \left( T^{2/3} \right)$, below the upper bound        |
| Capped-Linear             |     Beta(2,5)       |      $\widetilde{O} \left( T^{2/3} \right)$, below the upper bound       |
| Capped-Linear             |     Truncated Log-Normal(mu = -0.5, sigma = 1)       |      $\widetilde{O} \left( T^{2/3} \right)$, below the upper bound       |
| Capped-Linear             |     0.75 Beta(2,5) + 0.25 Beta(5,2)     |      $\widetilde{O} \left( T^{2/3} \right)$, below the upper bound       |
| Exponential               |     Standard Uniform        |     $\widetilde{O} \left( T^{2/3} \right)$, below the upper bound        |
| Exponential               |     Beta(2,5)       |      $\widetilde{O} \left( T^{2/3} \right)$, below the upper bound       |
| Exponential               |     Truncated Log-Normal(mu = -0.5, sigma = 1)       |      $\widetilde{O} \left( T^{2/3} \right)$, below the upper bound       |
| Exponential               |     0.75 Beta(2,5) + 0.25 Beta(5,2)     |      $\widetilde{O} \left( T^{2/3} \right)$, below the upper bound       |

As expected, the regret of Algorithm 1 across all eight cost-utility pairs grows as $\widetilde{O} \left( T^{2/3} \right)$ and stays below the upper bound. The simulations are therefore consistent with the theoretical guarantee established in Theorem 3.3.
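
One simple way to check a growth rate such as $T^{2/3}$ empirically is to fit a line to regret on a log-log scale and read off the slope. The sketch below does this on synthetic curves, not on output from this repository. The function name `loglog_slope` and the constants are illustrative assumptions. Because $\widetilde{O}$ hides logarithmic factors, a slope fitted to real regret curves may sit slightly above $2/3$.

```python
import math


def loglog_slope(ts, regrets):
    """Least-squares slope of log(regret) against log(t)."""
    xs = [math.log(t) for t in ts]
    ys = [math.log(r) for r in regrets]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


# Synthetic curves for illustration only (not simulation output).
horizons = [10 ** k for k in range(2, 7)]
sublinear_curve = [3.0 * t ** (2 / 3) for t in horizons]  # ~ T^(2/3)
linear_curve = [t / 24 for t in horizons]                 # ~ T

sublinear_slope = loglog_slope(horizons, sublinear_curve)
linear_slope = loglog_slope(horizons, linear_curve)
print(f"sublinear slope = {sublinear_slope:.3f}, linear slope = {linear_slope:.3f}")
```

A fitted slope near $2/3$ points to $T^{2/3}$-type growth, whereas a slope near $1$ indicates linear regret.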

### Case 2: Environments where assumptions are not satisfied

| Lifted Assumption        | Behaviour of Regret |
| ------------------ | -------------- |
| Assumption 1.1   |      $\ge T/24$       |
| Assumption 1.2   |      $\ge T/24$       |
| Assumption 1.3   |      $\ge T/10$       |
| Assumption 1.4   |      $\ge T/5000$       |

In every environment where exactly one assumption is lifted and the other three hold, the regret of Algorithm 1 grows linearly in $T$. This is consistent with Theorem 5.1, which establishes that Assumptions 1.1-1.4 are necessary.
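
To make the table concrete, the sketch below evaluates the linear lower bounds at the default horizon $T = 7 \cdot 10^5$. The divisors are taken from the table above. The dictionary and function names are illustrative and do not come from the repository.

```python
# Divisors c such that regret >= T / c when the given assumption is lifted
# (values copied from the Case 2 table).
LOWER_BOUND_DIVISORS = {
    "Assumption 1.1": 24,
    "Assumption 1.2": 24,
    "Assumption 1.3": 10,
    "Assumption 1.4": 5000,
}


def linear_lower_bounds(T: int) -> dict[str, float]:
    """Return the lower bound T / c for each lifted assumption."""
    return {name: T / c for name, c in LOWER_BOUND_DIVISORS.items()}


bounds = linear_lower_bounds(700_000)
for name, value in bounds.items():
    print(f"{name}: regret >= {value:.2f}")
```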