# Code for Valid Selection among Conformal Sets (NeurIPS 2026)

This folder contains the code to reproduce the experiments presented in the paper "Valid Selection among Conformal Sets". The experiments are divided into two main settings: batch and online.

## Directory Structure

The code is organized into the following main directories:

* `batch_setting/`: Contains scripts and utilities for running experiments in the batch conformal prediction setting.
    * `results/`: Stores pre-tuned hyperparameters for UCI dataset experiments.
    * `synthetic_results/` (will be created): Default output directory for synthetic experiments.
    * `uci_results/` (will be created): Default output directory for UCI regression experiments.
* `online_setting/`: Contains scripts and utilities for running experiments in the online conformal prediction setting.
    * `data/`: Contains data for online experiments (e.g., `electricity-normalized.csv`).
    * `experiment_results/` (will be created): Default base directory for online experiment outputs.
        * `detailed_logs/` (will be created): For detailed logs from online experiments.
        * `summary_logs/` (will be created): For summary logs from online experiments.
        * `visualizations/` (will be created): For plots generated from online experiment logs.
* `examples/`: Contains Jupyter notebooks and utility scripts for generating combined plots and illustrating concepts.


## Running Experiments

### Batch Setting Experiments

Navigate to the `batch_setting/` directory.

**1. Synthetic Regression Experiments:**

* **Script:** `synthetic_exp.py`
* **Purpose:** Runs experiments on synthetic sine wave data with varying numbers of models (M) and calibration set sizes (N\_cal). It evaluates different conformal selection/aggregation methods.
* **Key Command-Line Arguments:**
    * `--n_tr`: Training set size (default: 2000).
    * `--N_cal`: Total calibration set size per repetition (default: 400).
    * `--N_rep`: Number of repetitions/test points (default: 500).
    * `--alpha`: Target miscoverage rate, so the target coverage is 1 - alpha (default: 0.1).
    * `--num_partition`: Number of partitions for heterogeneous model training (-1 for homogeneous, default: 5).
    * `--M_values`: Comma-separated list of M values (number of base models) to test (e.g., '10,20,50'). Uses a default range if not provided.
    * `--num_seeds`: Number of random seeds to run (default: 40).
* **Example Usage:**
    ```bash
    python synthetic_exp.py --N_cal 300 --num_seeds 10 --M_values "10,30,50" --num_partition 5
    python synthetic_exp.py --N_cal 400 --num_seeds 10 --M_values "20,40" --num_partition -1 
    ```
* **Output:** CSV files containing coverage and length metrics will be saved in the `synthetic_results/` directory (created if it doesn't exist). File names include experiment parameters and a timestamp.
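
For reference, the coverage and length metrics can be illustrated with standard split conformal prediction for a single base model. This is a minimal sketch that assumes absolute-residual nonconformity scores and uses the true sine function as a stand-in for a fitted model. The score functions, base models, and selection/aggregation methods in `synthetic_exp.py` may differ. As with `--alpha`, `alpha` here is the miscoverage rate, so the target coverage is `1 - alpha`.

```python
import numpy as np

def split_conformal_interval(cal_pred, cal_y, test_pred, alpha=0.1):
    """Split conformal interval built from absolute-residual scores (illustrative sketch)."""
    scores = np.abs(cal_y - cal_pred)              # nonconformity scores on calibration data
    n = len(scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))        # rank of the conformal quantile
    # With too few calibration points the quantile is infinite (trivial interval).
    q = np.inf if k > n else np.sort(scores)[k - 1]
    return test_pred - q, test_pred + q

def coverage_and_length(lower, upper, y):
    """Empirical coverage and mean interval length on test points."""
    covered = (lower <= y) & (y <= upper)
    return covered.mean(), (upper - lower).mean()

rng = np.random.default_rng(0)
x_cal, x_test = rng.uniform(0, 2 * np.pi, 400), rng.uniform(0, 2 * np.pi, 500)
y_cal = np.sin(x_cal) + 0.1 * rng.standard_normal(400)
y_test = np.sin(x_test) + 0.1 * rng.standard_normal(500)
predict = np.sin  # hypothetical stand-in for a fitted base model
lo, hi = split_conformal_interval(predict(x_cal), y_cal, predict(x_test), alpha=0.1)
cov, length = coverage_and_length(lo, hi, y_test)
print(f"coverage={cov:.3f}, mean length={length:.3f}")
```

Coverage should land near `1 - alpha` on average over repetitions; any single run fluctuates around it.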

**2. UCI Regression Experiments:**

* **Script:** `uci_regression.py`
* **Purpose:** Runs experiments on various regression datasets, comparing individual model performance and aggregator performance.
* **Base Model Hyperparameter Tuning (Optional but Recommended):**
    * Before running `uci_regression.py`, you can tune the hyperparameters of the base regression models using `hyper_tune.py`.
    * This script uses `scikit-optimize` for Bayesian optimization and saves the best hyperparameters to `batch_setting/results/tuned_hyperparams_skopt_y_scaled.json`. This file is already provided for convenience.
    * `uci_regression.py` will automatically load these tuned parameters if the JSON file exists.
    * **Example for running `hyper_tune.py`:**
        ```bash
        python hyper_tune.py --datasets ABALONE BIKE_SHARING --n_iter 25
        ```
* **Key Command-Line Arguments for `uci_regression.py`:**
    * `--datasets`: Space-separated list of UCI datasets to use (default: `ABALONE BIKE_SHARING CALIFORNIA_HOUSING`).
    * `--num_seeds`: Number of random seeds (default: 10).
    * `--alpha`: Target miscoverage rate (default: 0.1).
    * `--train_ratio`, `--cal_ratio`, `--test_ratio`: Proportions for data splitting (must sum to 1, defaults: 0.8, 0.1, 0.1).
    * `--num_partitions`: Number of partitions for training models (-1 for homogeneous, default: -1).
    * `--output_dir`: Directory to save results (default: `uci_results`).
* **Example Usage:**
    ```bash
    python uci_regression.py --datasets ABALONE CALIFORNIA_HOUSING --num_seeds 5 --num_partitions 5
    python uci_regression.py --datasets BIKE_SHARING --train_ratio 0.7 --cal_ratio 0.15 --test_ratio 0.15
    ```
* **Output:** Two CSV files (one for individual model metrics, one for aggregator metrics) will be saved in the specified `--output_dir` (default: `uci_results/`). File names include dataset names, parameters, and a timestamp.
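
The `--train_ratio`, `--cal_ratio`, and `--test_ratio` arguments describe a three-way split of each dataset. A minimal sketch of such a split follows. `train_cal_test_split` is a hypothetical helper for illustration, not a function from `uci_regression.py`, and the script's own shuffling and rounding may differ.

```python
import numpy as np

def train_cal_test_split(n, train_ratio=0.8, cal_ratio=0.1, test_ratio=0.1, seed=0):
    """Shuffle indices 0..n-1 and cut them into train/calibration/test parts (hypothetical helper)."""
    if not np.isclose(train_ratio + cal_ratio + test_ratio, 1.0):
        raise ValueError("train_ratio + cal_ratio + test_ratio must sum to 1")
    perm = np.random.default_rng(seed).permutation(n)
    n_train = int(round(train_ratio * n))
    n_cal = int(round(cal_ratio * n))
    # Whatever remains after train and calibration goes to the test split.
    return perm[:n_train], perm[n_train:n_train + n_cal], perm[n_train + n_cal:]

train_idx, cal_idx, test_idx = train_cal_test_split(1000, 0.7, 0.15, 0.15, seed=0)
print(len(train_idx), len(cal_idx), len(test_idx))  # 700 150 150
```

Assigning the remainder to the test split guarantees that every index lands in exactly one part, even when `ratio * n` is not an integer.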


### Online Setting Experiments

Navigate to the `online_setting/` directory.


**Running All Online Experiments (Shell Script):**

* **Script:** `run_all_experiments.sh`
* **Purpose:** A shell script that automates running `run_experiment.py` for multiple datasets and seeds, then generates visualizations and aggregates summary statistics.
* **Configuration within the script:**
    * `DATASETS`: Array of datasets to run (e.g., `"elec" "aram"`).
    * `NUM_SEEDS`: Total number of seeds to run per dataset.
    * `START_SEED`: Initial seed value.
    * Output directories are also defined within the script.
* **Usage:**
    ```bash
    bash run_all_experiments.sh
    ```
* **Output:**
    * Detailed logs, summary logs, and visualizations will be populated in the respective subdirectories of `experiment_results/`.
    * An aggregated summary statistics CSV (`aggregated_summary_stats.csv`) will be created in `experiment_results/summary_logs/`.
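
`aggregate_summary_logs.py` combines the per-run summary CSVs into one table. A minimal sketch of that kind of aggregation, using only the standard library, follows. It assumes (hypothetically) that each summary CSV has a `method` column plus numeric `coverage` and `length` columns. The real log schema is not documented here, and the toy values in the demo are made up for illustration, not experimental results.

```python
import csv
import glob
import os
import statistics
import tempfile
from collections import defaultdict

def aggregate_summaries(summary_dir, metrics=("coverage", "length")):
    """Mean and std of each metric per method across all summary CSVs in summary_dir."""
    values = defaultdict(lambda: defaultdict(list))  # method -> metric -> list of values
    for path in sorted(glob.glob(os.path.join(summary_dir, "*.csv"))):
        if os.path.basename(path) == "aggregated_summary_stats.csv":
            continue  # skip a previously written aggregate so it is not double-counted
        with open(path, newline="") as f:
            for row in csv.DictReader(f):
                for m in metrics:
                    values[row["method"]][m].append(float(row[m]))
    return {
        method: {
            m: (statistics.mean(v), statistics.stdev(v) if len(v) > 1 else 0.0)
            for m, v in per_metric.items()
        }
        for method, per_metric in values.items()
    }

# Demo with two made-up per-seed files in a temporary directory.
with tempfile.TemporaryDirectory() as d:
    for seed, (cov, length) in enumerate([(0.90, 1.2), (0.92, 1.4)]):
        with open(os.path.join(d, f"summary_seed{seed}.csv"), "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["method", "coverage", "length"])
            writer.writerow(["method_a", cov, length])
    result = aggregate_summaries(d)
print(result)
```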

**Helper scripts in `online_setting/`:**
* `data_utils.py`: Functions for loading and preprocessing online datasets (e.g., `electricity-normalized.csv`).
* `model_utils.py`: Generates outputs from various online base learning models (e.g., SGD, RollingLM).
* `coma_utils.py`: Implements COMA utilities, including AdaHedge and Hedge algorithms.
* `selection.py`: Contains logic for running COMA and AdaCOMA aggregation methods.
* `assignments.py`: Functions to create assignment matrices for how forecasters pick base learners (e.g., `create_assignment_matrix_random_switch`).
* `utils.py`: Utilities for logging results from online experiments.
* `visualize_detailed_logs.py`: Generates plots from the detailed log files.
* `aggregate_summary_logs.py`: Aggregates results from multiple summary log CSVs.
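
`coma_utils.py` implements the Hedge and AdaHedge algorithms. For orientation, below is a textbook Hedge (exponential weights) sketch with a fixed learning rate. It is not the repo's implementation, and how COMA/AdaCOMA feed losses into it is not shown. The function name `hedge_weights` and the toy losses are hypothetical.

```python
import numpy as np

def hedge_weights(losses, eta):
    """Hedge weights over K experts.

    losses: array of shape (T, K); row t holds each expert's loss at round t.
    Returns shape (T, K); row t is the weight vector played *before*
    observing the losses of round t.
    """
    T, K = losses.shape
    cum_loss = np.zeros(K)
    weights = np.empty((T, K))
    for t in range(T):
        # Subtracting the minimum keeps exp() stable; it cancels after normalization.
        w = np.exp(-eta * (cum_loss - cum_loss.min()))
        weights[t] = w / w.sum()
        cum_loss += losses[t]
    return weights

rng = np.random.default_rng(0)
T, K = 200, 3
losses = rng.uniform(0, 1, size=(T, K))
losses[:, 1] *= 0.5                   # expert 1 has lower loss on average
eta = np.sqrt(8 * np.log(K) / T)      # textbook tuning for losses in [0, 1] with known horizon T
W = hedge_weights(losses, eta)
print(W[0], W[-1])
```

The weights start uniform and concentrate on the expert with the smallest cumulative loss. AdaHedge removes the need to know `T` in advance by adapting `eta` from the observed losses.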

### Generating the Example Plots

Navigate to the `examples/` directory.

* **Notebook:** `combined_plots.ipynb`
* **Purpose:** Generates a combined figure (`combined_plots.pdf`) that includes:
    * Two plots from the "coinflips" example (illustrating coverage for MinSE with different eta values).
    * Three plots from the "toy regression model" example (showing individual conformal intervals and the result of stable selection).
* **Usage:**
    1.  Run the `generate_results` function within the notebook (or ensure `experiment_results.npy` from a previous run exists in the `examples/` directory). This step can be time-consuming as it runs the coinflips simulation.
    2.  Run the `main()` function in the notebook.
* **Output:**
    * `experiment_results.npy` (if not already present, stores results from the coinflips simulation).
    * `combined_plots.pdf`: The combined figure as shown in the paper (figure label `fig:examples_synth_combined_figures`).
* **Helper script in `examples/`:**
    * `utils.py`: Contains JAX-based utility functions used in `combined_plots.ipynb` for MinSE (e.g., `solve_min_dot_simplex`, `jax_solve_linear_program`).
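
The exact signature and constraints of `solve_min_dot_simplex` are not documented here. Judging only by its name (an assumption), it targets a problem of the form "minimize `<c, w>` over `w` in the probability simplex". For that plain problem a closed form exists, because a linear objective over a polytope attains its minimum at a vertex. The NumPy sketch below uses a hypothetical `min_dot_over_simplex`, not the repo's JAX function, and checks the closed form against random simplex points. The repo's version may add constraints that call for a linear-program solver such as `jax_solve_linear_program`.

```python
import numpy as np

def min_dot_over_simplex(c):
    """Minimize <c, w> subject to w >= 0 and sum(w) = 1 (hypothetical helper).

    The minimum is attained at a vertex, so all mass goes on a smallest entry of c.
    """
    c = np.asarray(c, dtype=float)
    w = np.zeros_like(c)
    w[np.argmin(c)] = 1.0
    return w, c.min()

c = np.array([0.7, -0.2, 0.4])
w_star, val = min_dot_over_simplex(c)

# Sanity check: no random point of the simplex does better than the vertex solution.
samples = np.random.default_rng(0).dirichlet(np.ones(len(c)), size=10_000)
assert (samples @ c >= val - 1e-12).all()
print(w_star, val)  # [0. 1. 0.] -0.2
```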
