# SP-UCB-OLP: Saddle-Point UCB for Online Linear Programming

This repository contains the official implementation for the paper:

> **Online Configuration Selection with Switching Costs and Admission Control**
> ICML 2025

## Overview

SP-UCB-OLP solves online linear programs (OLPs) where the decision-maker must simultaneously:

1. **Select a configuration** (e.g., ML serving mode) from K options with unknown reward/cost distributions
2. **Decide admission** of arriving requests under budget constraints

The algorithm maintains a mixture distribution over configurations and a global dual price vector, updated via an optimistic saddle-point formulation:

$$\min_{\mathbf{p} \geq 0} \left\{ \langle \mathbf{p}, \mathbf{b}_{\mathrm{safe}} \rangle + \max_\theta \left[ \hat{g}_\theta(\mathbf{p}) + \beta_\theta(t) \right] \right\}$$

where $\mathbf{p}$ is the dual price vector, $\hat{g}_\theta(\mathbf{p})$ is the empirical surplus function, and $\beta_\theta(t)$ is a UCB confidence bonus. The algorithm achieves $\tilde{O}(\sqrt{T})$ regret relative to the switching-aware fluid oracle $V^{\mathrm{mix}}$.
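
To make the formulation concrete, here is a minimal, dependency-free sketch that evaluates the objective and minimizes it over $\mathbf{p} \geq 0$. Several things in it are assumptions, not taken from the package: $\theta$ is treated as a configuration index, the empirical surplus is taken to be the mean of $(r - \langle \mathbf{p}, \mathbf{c} \rangle)^+$ over observed arrivals, the bonuses are passed in as plain numbers, and the admission rule in `admit` is a guess at a natural price-based rule. It also swaps in projected subgradient descent for the SciPy L-BFGS-B solver that the package defaults to, purely to avoid dependencies. All function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[float, Sequence[float]]  # (reward, cost vector) of one arrival


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def empirical_surplus(p: Sequence[float], samples: Sequence[Sample]) -> float:
    """ASSUMED form of g_hat: mean of (r - <p, c>)^+ over the samples."""
    return sum(max(0.0, r - dot(p, c)) for r, c in samples) / len(samples)


def objective(p: Sequence[float], b_safe: Sequence[float],
              data: Dict[int, List[Sample]], bonus: Dict[int, float]) -> float:
    """<p, b_safe> + max over configurations of [g_hat(p) + bonus]."""
    inner = max(empirical_surplus(p, data[k]) + bonus[k] for k in data)
    return dot(p, b_safe) + inner


def subgradient(p, b_safe, data, bonus) -> List[float]:
    """One subgradient of the objective at p, taken at a maximizing configuration."""
    k_star = max(data, key=lambda k: empirical_surplus(p, data[k]) + bonus[k])
    grad = list(b_safe)
    n = len(data[k_star])
    for r, c in data[k_star]:
        if r - dot(p, c) > 0.0:  # only arrivals with positive surplus contribute
            for j, c_j in enumerate(c):
                grad[j] -= c_j / n
    return grad


def minimize_prices(b_safe, data, bonus, steps: int = 500, lr: float = 0.5):
    """Projected subgradient descent over p >= 0; returns the best iterate seen."""
    p = [0.0] * len(b_safe)
    best_p, best_val = list(p), objective(p, b_safe, data, bonus)
    for s in range(1, steps + 1):
        grad = subgradient(p, b_safe, data, bonus)
        step = lr / s ** 0.5  # diminishing step size
        p = [max(0.0, p_j - step * g_j) for p_j, g_j in zip(p, grad)]
        val = objective(p, b_safe, data, bonus)
        if val < best_val:
            best_p, best_val = list(p), val
    return best_p, best_val


def admit(p, reward: float, cost: Sequence[float], remaining: Sequence[float]) -> bool:
    """ASSUMED rule: accept when the priced surplus is positive and budget remains."""
    fits = all(c_j <= rem_j for c_j, rem_j in zip(cost, remaining))
    return fits and reward - dot(p, cost) > 0.0


# Toy instance: one resource, one configuration, two observed arrivals.
data = {0: [(1.0, [1.0]), (0.5, [1.0])]}
prices, value = minimize_prices(b_safe=[0.5], data=data, bonus={0: 0.1})
```

In the toy instance the learned price settles between the two observed rewards, so `admit` accepts the high-reward arrival and rejects the low-reward one.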

## Installation

```bash
# Clone the repository
git clone https://github.com/<your-username>/sp-ucb-olp.git
cd sp-ucb-olp

# Create virtual environment (Python 3.10+ required)
python -m venv venv
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Or install as editable package
pip install -e .
```

### Optional: Gurobi (faster LP solver)

By default, the saddle-point problem is solved with SciPy's L-BFGS-B. Installing [Gurobi](https://www.gurobi.com/) is optional and gives a faster solver:

```bash
pip install gurobipy
```

## Quick Start

```python
from sp_ucb_olp.data import get_loader
from sp_ucb_olp.algorithms import get_algorithm
from sp_ucb_olp.runner import run_single_experiment

# Load a synthetic scenario (S1: High-Variance Exploration, K=4, d=3)
loader = get_loader('S1', T=1000, seed=42)
B = loader.get_budget(rho=0.7)

# Create algorithm
alg = get_algorithm('SP-UCB-OLP', K=loader.K, d=loader.d, T=1000, B=B,
                     config={'alpha': 0.1})

# Run experiment
trajectory, stats = run_single_experiment(loader, alg, B=B, seed=42)
print(f"Total reward: {trajectory.total_reward:.2f}")
print(f"Acceptance rate: {stats['acceptance_rate']:.3f}")
```

## Reproducing Paper Results

Each experiment maps to a specific script. All scripts save results to `results/`.

### Table 1: Regret Scaling (Section 6.1)

```bash
python experiments/run_alpha15_regret.py
python experiments/plot_alpha15_regret.py
```

Runs SP-UCB-OLP with the theory-compliant setting `alpha=1.5` on scenario S0 (Gaussian, K=5, d=3; loader `s5_gaussian.py`) for horizons T in {100, ..., 2000}, with 50 seeds.
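
One way to check the $\tilde{O}(\sqrt{T})$ scaling from such a run is to fit the slope of log-regret against log-horizon; a slope near 0.5 is consistent with square-root growth, and logarithmic factors typically push the fitted value slightly above 0.5. The sketch below is a generic least-squares fit, not the analysis in `plot_alpha15_regret.py`. The function name is hypothetical, the intermediate horizons are placeholders (this README only gives the endpoints 100 and 2000), and the regret values are synthetic.

```python
import math
from typing import Sequence


def loglog_slope(horizons: Sequence[float], regrets: Sequence[float]) -> float:
    """Least-squares slope of log(regret) versus log(T)."""
    xs = [math.log(t) for t in horizons]
    ys = [math.log(r) for r in regrets]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Placeholder horizons and synthetic regret of the form 3 * sqrt(T).
horizons = [100, 200, 500, 1000, 2000]
synthetic_regret = [3.0 * math.sqrt(t) for t in horizons]
slope = loglog_slope(horizons, synthetic_regret)
```

With real results, `regrets` would be the mean regret over seeds at each horizon.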

### Figure 3: Alibaba Cluster Traces (Section 6.2)

```bash
python experiments/run_alibaba_experiments.py
python experiments/visualize_alibaba_boxplots.py
```

Runs three Alibaba scenario profiles (`quant_8bit`, `quant_4bit`, `batching`), sweeping `rho` with 10 seeds per condition.

### Table 2 & Figure 2: Synthetic Experiments (Section 6.3)

```bash
python experiments/run_all_experiments.py
python experiments/generate_paper_figures.py
```

The master script `run_all_experiments.py` runs:
- **E1**: Benchmark validation on S4 (complementarity gap, 250 runs)
- **E2**: Component ablations across S1-S4 (240 runs)
- **E3**: K-sweep in {2, 4, 8, 16} (200 runs)
- **E4**: Competitive ratio sweep on S1-S3 (750 runs)
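
The run counts are consistent with full factorial grids. For E4, for example, 750 runs would match 3 scenarios times 10 seeds times 25 values of `rho`. The figure of 25 is inferred from the arithmetic and is not stated in this README. The sketch below enumerates such a grid; the `rho` values are an evenly spaced placeholder, and `build_grid` is a hypothetical helper, not a function from `run_all_experiments.py`.

```python
from itertools import product
from typing import Dict, Iterable, List


def build_grid(scenarios: Iterable[str], rhos: Iterable[float],
               seeds: Iterable[int]) -> List[Dict[str, object]]:
    """Enumerate every (scenario, rho, seed) combination as one run spec."""
    return [
        {"scenario": s, "rho": r, "seed": k}
        for s, r, k in product(scenarios, rhos, seeds)
    ]


scenarios = ["S1", "S2", "S3"]
rhos = [round(0.04 * i, 2) for i in range(1, 26)]  # PLACEHOLDER 25-point grid
seeds = range(10)
grid = build_grid(scenarios, rhos, seeds)
```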

### Additional Experiments (Appendix)

```bash
# Alpha sensitivity sweep
python experiments/run_alpha_sweep.py

# K-sweep analysis
python experiments/run_k_sweep.py

# Extended regret analysis
python experiments/run_regret_analysis.py
```

## Paper-to-Code Mapping

| Paper Section | Experiment | Script | Data Loader |
|---|---|---|---|
| Section 6.1: Regret Scaling (S0) | T in {100,...,2000}, alpha=1.5, 50 seeds | `run_alpha15_regret.py` | `s5_gaussian.py` |
| Section 6.2: Alibaba Traces | 3 scenarios x rho sweep x 10 seeds | `run_alibaba_experiments.py` | `alibaba.py` |
| Section 6.3: Benchmark (S4) | Complementarity gap, 250 runs | `run_all_experiments.py` (E1) | `s4_complementarity.py` |
| Section 6.3: CR Sweep (S1-S3) | rho sweep x 10 seeds x 3 scenarios | `run_all_experiments.py` (E4) | `s1_*.py`, `s2_*.py`, `s3_*.py` |
| Section 6.3: Ablations | 5 variants x S1-S4 | `run_all_experiments.py` (E2) | All synthetic |
| Appendix: K-sweep | K in {2,4,8,16} | `run_k_sweep.py` | `s5_gaussian.py` |
| Appendix: Alpha sweep | alpha in {0.01,...,1.0} | `run_alpha_sweep.py` | Various |

## Repository Structure

```
sp-ucb-olp/
├── README.md                          Documentation
├── LICENSE                            MIT License
├── requirements.txt                   Python dependencies
├── pyproject.toml                     Package configuration
│
├── sp_ucb_olp/                        Core package
│   ├── __init__.py                    Package exports
│   ├── oracle.py                      V^mix and V* computation (scipy L-BFGS-B)
│   ├── runner.py                      Experiment orchestration
│   ├── storage.py                     Data structures and serialization
│   ├── visualization.py               Publication-quality plotting
│   │
│   ├── algorithms/
│   │   ├── __init__.py                Algorithm factory: get_algorithm()
│   │   ├── base.py                    BaseAlgorithm (abstract class)
│   │   ├── sp_ucb_olp.py              SP-UCB-OLP (main algorithm)
│   │   └── baselines.py               Greedy, OneHot, Oracle, Random, ablations
│   │
│   └── data/
│       ├── __init__.py                Data loader factory: get_loader()
│       ├── base_loader.py             BaseDataLoader (abstract class)
│       ├── s1_complementarity.py      S1: High-Variance Exploration (K=4, d=3)
│       ├── s2_noisy.py                S2: Deceptive Arms (K=4, d=3)
│       ├── s3_dominant.py             S3: Selective Admission (K=2, d=3)
│       ├── s4_complementarity.py      S4: Complementarity (K=2, d=2)
│       ├── s5_gaussian.py             S0: Gaussian / Regret Validation (K=5, d=3)
│       └── alibaba.py                 Alibaba Cluster Trace (K=3, d=2)
│
├── experiments/                       Experiment scripts
│   ├── run_all_experiments.py         Master runner (E1-E4 synthetic)
│   ├── run_alibaba_experiments.py     Alibaba trace experiments
│   ├── run_alpha15_regret.py          Regret scaling with alpha=1.5
│   ├── run_regret_analysis.py         Extended regret analysis
│   ├── run_k_sweep.py                 K-sweep experiments
│   ├── run_alpha_sweep.py             Alpha sensitivity
│   ├── run_parallel_synthetic.py      Parallel synthetic runner
│   ├── run_paper_experiments.py       Paper experiments
│   ├── generate_paper_figures.py      Figure generation
│   ├── generate_new_figures.py        Extended figures
│   ├── visualize_alibaba_boxplots.py  Alibaba boxplots
│   ├── plot_alpha15_regret.py         Regret scaling plots
│   └── regenerate_fig5.py             Regenerate Figure 5
│
└── results/                           Output directory (created by scripts)
```

## Algorithms

| Algorithm | Description | Key Parameters |
|---|---|---|
| **SP-UCB-OLP** | Saddle-point UCB with mixture sampling and global dual prices | `alpha` (exploration), `epsilon` (slack) |
| **Greedy** | SP-UCB-OLP with alpha=0 (no exploration bonus) | - |
| **OneHot** | Per-configuration UCB (selects single best arm, no mixture) | `alpha` |
| **Oracle** | True optimal mixture and prices from population distributions | - |
| **Random** | Uniform configuration selection, accepts all feasible arrivals | - |
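
To illustrate how `alpha` separates SP-UCB-OLP from Greedy, the sketch below uses a common UCB bonus of the form $\alpha \sqrt{\log t / n}$, where $n$ is the number of observations for a configuration. This form is an assumption; the exact $\beta_\theta(t)$ is defined in the paper and in `sp_ucb_olp/algorithms/sp_ucb_olp.py`, and `ucb_bonus` is a hypothetical name.

```python
import math


def ucb_bonus(alpha: float, t: int, n_obs: int) -> float:
    """ASSUMED bonus form: alpha * sqrt(log t / n_obs)."""
    if alpha == 0.0:
        return 0.0  # no exploration bonus, as in the Greedy row above
    if n_obs <= 0:
        return math.inf  # unobserved configurations are maximally optimistic
    # max(t, 2) keeps the bonus positive at t = 1, where log(1) = 0.
    return alpha * math.sqrt(math.log(max(t, 2)) / n_obs)
```

The bonus shrinks as a configuration accumulates observations, so quadrupling `n_obs` halves it.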

### Ablation Variants

| Variant | What it removes | Expected failure mode |
|---|---|---|
| **EnvelopeGreedy** | Mixture sampling (uses argmax instead) | No complementarity exploitation |
| **MixtureLocalPrice** | Global prices (uses per-config prices) | Fails on S4 (orthogonal resources) |
| **AcceptedOnly** | Unbiased sampling (only updates from accepted arrivals) | Selection bias on S3 |
| **NoSlack** | Budget slack (epsilon=0) | Budget overruns |
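
The difference between mixture sampling and the argmax used by EnvelopeGreedy can be sketched in a few lines. Both helper names are hypothetical and the weights are illustrative; the actual selection logic lives in `sp_ucb_olp/algorithms/`.

```python
import random
from typing import Sequence


def sample_from_mixture(weights: Sequence[float], rng: random.Random) -> int:
    """Draw a configuration index with probability proportional to its weight."""
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


def argmax_config(weights: Sequence[float]) -> int:
    """Always pick the single highest-weight configuration (no mixing)."""
    return max(range(len(weights)), key=lambda k: weights[k])


rng = random.Random(0)
mixture = [0.5, 0.5]
draws = [sample_from_mixture(mixture, rng) for _ in range(1000)]
```

Sampling visits every configuration with positive weight over time, whereas the argmax commits to one configuration and so cannot spread consumption across resources.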

## Synthetic Scenarios

| Scenario | K | d | Design Purpose |
|---|---|---|---|
| **S0** | 5 | 3 | Regret scaling validation (theory-compliant) |
| **S1** | 4 | 3 | Low-variance optimal among high-variance alternatives |
| **S2** | 4 | 3 | High reward does not imply high surplus |
| **S3** | 2 | 3 | Price learning under arrival variability |
| **S4** | 2 | 2 | Complementarity gap ($V^{\mathrm{mix}} / V^* \approx 2$) |
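
A toy calculation shows how a complementarity gap of 2 can arise when two configurations draw on orthogonal resources. The numbers below are illustrative and are not the parameters in `s4_complementarity.py`. The sketch also assumes that $V^*$ denotes the best single-configuration fluid value and that arrivals are deterministic; `fluid_value` is a hypothetical helper.

```python
from typing import Sequence


def fluid_value(w: Sequence[float], rewards: Sequence[float],
                costs: Sequence[float], budget: Sequence[float]) -> float:
    """Per-period fluid value of mixture w when configuration k only uses resource k.

    Configuration k is active a fraction w[k] of the time and can serve at most
    budget[k] / costs[k] requests per period before resource k runs out.
    """
    return sum(
        rewards[k] * min(w[k], budget[k] / costs[k])
        for k in range(len(w))
    )


rewards = [1.0, 1.0]  # reward per accepted request
costs = [1.0, 1.0]    # configuration k consumes costs[k] of resource k
budget = [0.5, 0.5]   # per-period budget of each resource

v_single = max(
    fluid_value([1.0, 0.0], rewards, costs, budget),
    fluid_value([0.0, 1.0], rewards, costs, budget),
)
v_mix = max(
    fluid_value([i / 100, 1 - i / 100], rewards, costs, budget)
    for i in range(101)
)
ratio = v_mix / v_single
```

Either configuration alone exhausts its own resource after serving half the requests, while an even mixture uses both budgets fully and serves everything.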

## Alibaba Trace Experiments

Three scenario profiles derived from Alibaba Cluster Trace 2018:

- **quant_8bit**: Moderate budgets, emphasis on 8-bit inference quality
- **quant_4bit**: Tight memory budget, favouring aggressive quantization
- **batching**: Looser budgets, highlighting batching policies

Each profile defines configuration-specific multipliers for reward, CPU, and memory consumption (see `sp_ucb_olp/data/alibaba.py` for exact values).
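
As a rough picture of what configuration-specific multipliers do, the sketch below scales a base request's reward, CPU, and memory. The multiplier values, the class, and the function are all hypothetical, and applying the multipliers by plain multiplication is an assumption; the real definitions are in `sp_ucb_olp/data/alibaba.py`.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ConfigMultipliers:
    """Scaling factors one configuration applies to a base request."""
    reward: float
    cpu: float
    memory: float


def apply_profile(base_reward: float, base_cpu: float, base_memory: float,
                  m: ConfigMultipliers) -> Tuple[float, Tuple[float, float]]:
    """Return (reward, (cpu, memory)) for the request under one configuration."""
    return base_reward * m.reward, (base_cpu * m.cpu, base_memory * m.memory)


# HYPOTHETICAL values: a configuration trading some reward for lower resource use.
example = ConfigMultipliers(reward=0.9, cpu=0.5, memory=0.25)
reward, (cpu, memory) = apply_profile(1.0, 2.0, 4.0, example)
```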

## Dependencies

**Required:**
- Python >= 3.10
- numpy >= 1.20.0
- scipy >= 1.7.0
- matplotlib >= 3.4.0

**Optional:**
- gurobipy >= 10.0 (faster LP solver)

## Citation

```bibtex
@inproceedings{sp-ucb-olp,
  title={Online Configuration Selection with Switching Costs and Admission Control},
  author={Anonymous},
  booktitle={Proceedings of the International Conference on Machine Learning (ICML)},
  year={2025}
}
```

## License

This project is licensed under the MIT License. See [LICENSE](LICENSE) for details.
